Simple Workload Failover Example

Document ID : KB000056323
Last Modified Date : 14/02/2018

Environment:

1 x TNG Server: running CORe, DSM, Enterprise Management - TNG Server in diagram
2 x TNG Enterprise Management (including the Workload Managers) - WLM-1, WLM-2 in diagram
2 x Workload Agent systems - WLA-1, WLA-2 in diagram

Both WLM-1 and WLM-2 have NT System agents reporting back to the main TNG server. On WLM-1, all the scheduling is set up as it would normally be done. On WLM-2, an exact copy of the scheduling database is set up, but the jobsets are not autoselected.

If a trap is received at the main TNG Server indicating that WLM-1 has gone down, a series of cautil commands is issued to WLM-2 to activate the jobsets, demand them in, load the triggers, and so on. As soon as a trap is received from WLM-1 reporting that it is active again, the process is reversed. (A similar setup is also done for WLM-2.)

The failover is driven by the Enterprise Management Event Manager using MsgRecords and MsgActions. Below the visual example is a sample of the MsgRecords and MsgActions. Please note that you will have to modify these items before using them on your own system.

Message Records and Actions for Simple Workload Failover Example:

define msgrec
msgid="fred is dead"
type="MSG"
msgnode="*"
desc="Message to say that the primary workload server is down- from TNG server"
cont='N'
msgact='Y'
wcsingle='?'
wcmany='*'
case="y"
regexp="n"
define msgact
name=(*,10)
action="HILITE"
attrib="DEFAULT"
color="GREEN"
condop="  "
evaluate='Y'
quiet='N'
status="ACTIVE"
sim='N'
text="Primary Workload server is down- Taking over its duties"
define msgact
name=(*,20)
action="COMMAND"
attrib="DEFAULT"
color="DEFAULT"
condop="  "
evaluate='Y'
quiet='N'
status="ACTIVE"
sim='N'
text="cautil alter jobset id=Graham autos=y"
define msgact
name=(*,30)
action="COMMAND"
attrib="DEFAULT"
color="DEFAULT"
condop="  "
evaluate='Y'
quiet='N'
status="ACTIVE"
sim='N'
text="cautil demand jobset id=Graham"



define msgrec
msgid="fred is alive"
type="MSG"
msgnode="*"
desc="message from the TNG server to say that the primary workload server is back"
cont='N'
msgact='Y'
wcsingle='?'
wcmany='*'
case="y"
regexp="n"
define msgact
name=(*,10)
action="HILITE"
attrib="DEFAULT"
color="GREEN"
condop="  "
evaluate='Y'
quiet='N'
status="ACTIVE"
sim='N'
text="fred has recovered and this workload server is returning to normal duties"
define msgact
name=(*,20)
action="COMMAND"
attrib="DEFAULT"
color="DEFAULT"
condop="  "
evaluate='Y'
quiet='N'
status="ACTIVE"
sim='N'
text="cautil alter jobset id=Graham autos=n"
define msgact
name=(*,30)
action="COMMAND"
attrib="DEFAULT"
color="DEFAULT"
condop="  "
evaluate='Y'
quiet='N'
status="ACTIVE"
sim='N'
text="cautil cancel tjobset id=Graham purge=yes"
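
The two message records above amount to a small dispatch table: an incoming message is compared with each msgid, and the actions of the matching record run in ascending sequence number (10, 20, 30). The Python sketch below only models that flow so it can be reasoned about or tested away from a live system. It is not part of the product and does not call cautil. The names RECORDS, actions_for and commands_for are hypothetical. It also assumes that the msgid must match the whole message text and that case="y" means the comparison is case-sensitive. Python's fnmatchcase honours the same '?' and '*' wildcards but additionally treats '[' as special, which the Event Manager may not.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory model of the two message records above.
# Outer keys are msgid patterns; inner keys are msgact sequence numbers.
RECORDS = {
    "fred is dead": {
        10: ("HILITE", "Primary Workload server is down- Taking over its duties"),
        20: ("COMMAND", "cautil alter jobset id=Graham autos=y"),
        30: ("COMMAND", "cautil demand jobset id=Graham"),
    },
    "fred is alive": {
        10: ("HILITE", "fred has recovered and this workload server is "
                       "returning to normal duties"),
        20: ("COMMAND", "cautil alter jobset id=Graham autos=n"),
        30: ("COMMAND", "cautil cancel tjobset id=Graham purge=yes"),
    },
}


def actions_for(message):
    """Return (action, text) pairs of the first matching record, in sequence order."""
    for pattern, actions in RECORDS.items():
        # Assumption: case-sensitive match of the whole message text.
        if fnmatchcase(message, pattern):
            return [actions[seq] for seq in sorted(actions)]
    return []


def commands_for(message):
    """Return only the COMMAND texts that the matching record would issue."""
    return [text for action, text in actions_for(message) if action == "COMMAND"]


if __name__ == "__main__":
    for line in commands_for("fred is dead"):
        print(line)
```

Calling commands_for("fred is dead") returns the alter and demand commands in that order, and a message that matches neither msgid returns an empty list. Renaming the jobset or adding a step means editing one table entry, which mirrors how the real records would be adapted to your own system.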