Challenges when CA Workload Automation ESP system is down for many hours

Document ID : KB000016547
Last Modified Date : 14/02/2018
Introduction:

ESP will not be started on the DR site until 24 to 48 hours after the outage.

Question:

What pitfalls should be expected when ESP attempts to schedule 48 hours of missed processing?

Environment:
CA WA ESP 11.4 z/OS
Answer:

- Events that were scheduled, or triggered for a future time, between the disaster time and the next scan time will all be executed at once after ESP is up;

- Correct processing also depends on the application logic: a job that should run on Monday should be coded with RUN MONDAY rather than RUN ANY, and should depend on the event being scheduled on Monday;

- Active workload at the disaster time, i.e. submitted or executing, will be marked as SYSTEM ERROR; 

- A CPU spike is expected when ESP is brought up, and the event initiators will stay in use until the backlog of missed events has cleared, which might delay workload scheduled soon after ESP is up;

- Operational data sets might run short of space, or high utilization might be detected. This applies to CKPT, APPLFILE/TRAKFILE, COMMQ, HISTFILE and QUEUE (JOBINDEX and JOBSTATS might be affected as well, if new job names were introduced);

- TCELL and DSTCELL buffers might overflow, which can lead to loss of tracking or data set trigger data. Whether this happens depends on the amount of workload being tracked at the same time and on the TCELL and DSTRIG parameter settings in ESPPARM;

- Variables in active and new applications that are not based on ESPS variables can be set to wrong values;

- Variables and resources can be set to wrong values if the related commands are processed out of sequence;

- If a file (such as the checkpoint or queue file) fills up, ESP may abend and be unable to stay up, or the affected files may need to be reformatted before ESP can be started.
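
The points above about RUN MONDAY versus RUN ANY, and about variables that are not based on scheduled-time values, can be illustrated with a small toy model. This is only a sketch in Python, not ESP code and not a description of how ESP is implemented: the function names, the daily 06:00 event and the outage dates are all hypothetical. It assumes that every missed event is executed at restart time, and compares decisions based on the scheduled date with decisions based on the actual execution date.

```python
from datetime import datetime, timedelta


def missed_daily_events(outage_start, restart, hour=6):
    """Scheduled times of a hypothetical daily event that fell inside the outage."""
    first = outage_start.replace(hour=hour, minute=0, second=0, microsecond=0)
    if first <= outage_start:
        first += timedelta(days=1)
    events = []
    current = first
    while current <= restart:
        events.append(current)
        current += timedelta(days=1)
    return events


def catch_up(scheduled_times, restart):
    """Model the catch-up: every missed event is executed at restart time."""
    rows = []
    for sched in scheduled_times:
        rows.append({
            "scheduled": sched,
            # A "run any" job is selected for every missed event.
            "run_any": True,
            # A "run Monday" job is selected only if the event was scheduled on a Monday.
            "run_monday": sched.weekday() == 0,
            # A date derived from the scheduled time stays correct per event ...
            "date_from_scheduled": sched.strftime("%Y-%m-%d"),
            # ... while a date derived from the actual time is the restart date for all.
            "date_from_actual": restart.strftime("%Y-%m-%d"),
        })
    return rows


# Hypothetical outage: Saturday 22:00 until Monday 22:00 (48 hours).
outage_start = datetime(2018, 2, 10, 22, 0)
restart = datetime(2018, 2, 12, 22, 0)

rows = catch_up(missed_daily_events(outage_start, restart), restart)
for row in rows:
    print(row["scheduled"], row["run_any"], row["run_monday"],
          row["date_from_scheduled"], row["date_from_actual"])
```

In this model the Sunday and Monday events are both processed on Monday evening. A job selected by "run any" runs twice, a job tied to a Monday schedule runs once, and any value taken from the actual date is identical for both events.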