Tie-breaker Services Keep Crashing after an Unexpected Outage

Document ID : KB000032835
Last Modified Date : 14/02/2018

Issue:

We had an unexpected outage over the weekend that affected our entire AutoSys production environment. Once the servers rebooted, we were able to start all services on the primary, shadow and tie-breaker servers. However, when we run chk_auto_up, the tie-breaker server does not show up, and the services on that server keep crashing.

 

Environment:

Applies to all supported OS environments for CA Workload Automation AE r11.3.6.

 

Cause: 

More than likely, a database rollover has occurred and your environment is operating on a single event server (database) instead of dual event servers. To determine whether a rollover has occurred, open your config.$AUTOSERV file (or config.%AUTOSERV% file) and look at the EventServer parameters. If a rollover has occurred, one of the entries is prefixed with #AUTO-ROLLOVER#, and the entries look like the following:

EventServer_1=database1,1521,localhost.local.com

#AUTO-ROLLOVER#EventServer_2=database2,1521,remotehost.remote.com
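
The same check can be scripted when several configuration files need to be inspected. The following Python sketch is illustrative only and is not a CA-supplied utility: the function name find_rolled_over_entries is hypothetical, and it assumes the file contents have already been read into a string. It reports every entry that carries the #AUTO-ROLLOVER# prefix.

```python
# Illustrative sketch only, not a CA-supplied utility.
# Assumption: the contents of config.$AUTOSERV have been read into a string.

ROLLOVER_MARKER = "#AUTO-ROLLOVER#"


def find_rolled_over_entries(config_text):
    """Return the entries that are prefixed with the rollover marker."""
    entries = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(ROLLOVER_MARKER):
            # Keep only the entry itself, without the marker.
            entries.append(stripped[len(ROLLOVER_MARKER):].strip())
    return entries


# Sample text modeled on the entries shown above.
sample = (
    "EventServer_1=database1,1521,localhost.local.com\n"
    "#AUTO-ROLLOVER#EventServer_2=database2,1521,remotehost.remote.com\n"
)

rolled_over = find_rolled_over_entries(sample)
if rolled_over:
    print("Rollover detected; disabled entries:", rolled_over)
else:
    print("No rollover marker found.")
```

An empty result suggests that both event servers are still configured, in which case the cause of the crashes lies elsewhere.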

                                                                                                                                                        

Resolution:

The purpose of a tie-breaker scheduler is to update the database to confirm that it is still running and to log changes in the state of high availability. At all other times, the tie-breaker scheduler runs idle. The primary and shadow schedulers rely on database updates from the tie-breaker scheduler to determine the state of high availability.

The messages from the tie-breaker scheduler indicate that it knows AutoSys has been configured for dual event servers but cannot detect two databases. The tie-breaker services stay up only while both databases are available; with a single event server, the tie-breaker has no reason to run.

To get the tie-breaker service up and running again, you need to get both databases up and running.

1. Stop both the primary and shadow schedulers.

2. Remove the '#AUTO-ROLLOVER#' prefix from the disabled EventServer entry in both the primary and shadow config.$AUTOSERV files (or config.%AUTOSERV% files).

3. Start the primary scheduler and confirm that both databases are up by running the 'chk_auto_up' command.

4. Once you confirm there are two event servers running, start the shadow scheduler and then the tie-breaker scheduler.
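
The edit in step 2 can be sketched in Python for sites that prefer to script it. This is a minimal illustration under stated assumptions, not a supported tool: the function name remove_rollover_marker is hypothetical, the configuration text is assumed to be already loaded into a string, and backing up the file and writing the result back are left to the administrator. Run it only after the schedulers have been stopped as described in step 1.

```python
# Illustrative sketch only, not a supported tool.
# Assumption: the schedulers are stopped and the config file has been backed up.

ROLLOVER_MARKER = "#AUTO-ROLLOVER#"


def remove_rollover_marker(config_text):
    """Return config_text with the rollover marker stripped from any line that starts with it."""
    restored = []
    for line in config_text.splitlines(keepends=True):
        if line.lstrip().startswith(ROLLOVER_MARKER):
            # Remove only the first occurrence, i.e. the leading marker.
            line = line.replace(ROLLOVER_MARKER, "", 1)
        restored.append(line)
    return "".join(restored)


# Sample text modeled on the entries shown in the Cause section.
before = (
    "EventServer_1=database1,1521,localhost.local.com\n"
    "#AUTO-ROLLOVER#EventServer_2=database2,1521,remotehost.remote.com\n"
)

print(remove_rollover_marker(before))
```

Lines that do not begin with the marker pass through unchanged, so the function can be applied to the whole file without affecting other parameters.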

 

Additional Information:

Visit the CA Workload Automation AE & Workload Control Center Wiki for more information about configuring and monitoring WAAE HA environments.