For out-of-space conditions, move the data to a larger file as soon as it is convenient. To do this, allocate a new file and use IDCAMS REPRO to copy the existing contents into it.
The JOBINDEX file should be the size of the TRAKFILE plus the APPLFILE, with a reasonable cushion. Ideally, when allocating new files, QUIESCE ESP, issue a TRACKING NOSTORE, and allow all current work to complete before shutting down to copy to the new file. Once the copy is complete, restart ESP and issue a TRACKING STORE.
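The sizing rule above is simple arithmetic, and a minimal sketch may make it concrete. The function name and the 20 percent default cushion are assumptions for illustration only; the guidance above says just "a reasonable cushion", and the unit (tracks, cylinders, records) is whatever you already use for the other two files.

```python
def jobindex_size(trakfile_units: int, applfile_units: int, cushion_percent: int = 20) -> int:
    """Suggest a JOBINDEX allocation: TRAKFILE + APPLFILE + cushion.

    All sizes are in the same unit. The default cushion of 20 percent is an
    assumed value, not a documented recommendation.
    """
    if trakfile_units < 0 or applfile_units < 0 or cushion_percent < 0:
        raise ValueError("sizes and cushion must be non-negative")
    base = trakfile_units + applfile_units
    # Integer arithmetic, rounding the cushion up to a whole unit.
    cushion = (base * cushion_percent + 99) // 100
    return base + cushion


if __name__ == "__main__":
    # Hypothetical sizes: TRAKFILE of 1000 units, APPLFILE of 500 units.
    print(jobindex_size(1000, 500))
```

Adjust the cushion to match how quickly your workload grows.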
Review the status of jobs on CSF prior to shutdown.
The CYBESUT1 utility is used to analyze and optionally reclaim slots in the APPLFILE and TRAKFILE. This may be useful when your TRAKFILE is filling up regularly, even though there are sufficient slots allocated. It should only be run against the APPLFILE if you are receiving error messages indicating a corruption error or an insufficient slot condition.
Run the CYBESUT1 utility in ANALYZE mode to determine the total number of slots on the file and the number of free slots available for use, together with information about any records in error.
When you run the CYBESUT1 utility in ANALYZE mode, it displays statistics concerning the status of all active records. An active record is one that has an active entry in the JOBINDEX file, that is, a record for a job execution that has not yet been marked complete. Each active record is reported with one of three possible statuses:
- passed validity check
- failed validity check
- I/O error
Records with a status of 'failed validity check' will each have a message displayed specifying the type of problem encountered. The significant portions of these messages are:
- STEP CHAIN FAILED VALIDITY CHECK
- INCORRECT PNODE
- JTR FAILED VALIDITY CHECK
- NOT ON TRAKFILE
A small number of jobs whose records are reported as NOT ON TRAKFILE is not unusual and can generally be ignored. The message means that there is an active record on the JOBINDEX file, but the TRAKFILE slot to which it points is not the start of a record for the same job name and job number combination. This can happen, for example, if a job is submitted by ESP but is not tracked for some reason. Possible reasons include an error in the JTDT, or a job that is NJE transmitted to another node for execution where the remote node does not send its tracking data back to the submitting node.
In the NJE case, a JTR and a JOBINDEX record are created by the job submission, but the JTR's slot(s) are freed when the job is purged from the submitting system, and eventually the file wraps and the job's slot(s) are reused. If the job still has a record on the JOBINDEX file at that point, CYBESUT1 issues the JOB NOT ON TRAKFILE message.
In the JTDT error case, a JTR and a JOBINDEX record are again created by the job submission, but no further data about the job (not even a job start record) is received. These slots are never freed by normal processing; once the JOBINDEX record ages sufficiently to be dropped, all reference to the slots is lost and they become 'orphaned'. If the circumstances that cause this persist, the number of 'orphan' slots increases over time, and they eventually need to be reclaimed by running CYBESUT1 in UPDATE mode.
You should be concerned about any 'failed validity check' conditions which produce messages other than 'JOB NOT ON TRAKFILE', and all 'I/O error' messages.
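The triage rule above can be sketched as a small function. This is an illustration only: it assumes you have already extracted a status and message text for each record by hand or by your own parsing, and it makes no claim about the actual layout of the CYBESUT1 report. The function and constant names are hypothetical.

```python
# Message fragment that the guidance above treats as generally safe to ignore.
IGNORABLE_FRAGMENT = "NOT ON TRAKFILE"


def needs_attention(status: str, message: str = "") -> bool:
    """Return True if a record's status/message warrants investigation.

    status  -- one of the three statuses listed above (case-insensitive)
    message -- message text shown for a 'failed validity check' record
    """
    normalized = status.strip().lower()
    if normalized == "passed validity check":
        return False
    if normalized == "i/o error":
        # All I/O errors are of concern.
        return True
    if normalized == "failed validity check":
        # Of concern unless it is the NOT ON TRAKFILE case.
        return IGNORABLE_FRAGMENT not in message.upper()
    raise ValueError(f"unrecognized status: {status!r}")


if __name__ == "__main__":
    print(needs_attention("failed validity check", "INCORRECT PNODE"))
    print(needs_attention("failed validity check", "JOB NOT ON TRAKFILE"))
```

A steadily growing count of NOT ON TRAKFILE records may still be worth reviewing, since it can indicate an accumulation of orphan slots.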
CYBESUT1 in UPDATE mode resynchronizes the JOBINDEX file with the TRAKFILE. The UPDATE keyword requests that the TRAKFILE be updated with the new bitmap and free record count. If this keyword is omitted, the analysis is performed but no update takes place. Note that an update reclaims not only any orphan slots, but also the slots for all genuine jobs that are no longer active. That is, the slots belonging to all completed jobs are flagged as free. The data in these slots is not modified in any way and remains accessible until the file 'cursor' wraps around to the appropriate location on the file, at which time the slots are reused.
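The effect of a slot reclaim can be pictured with a toy model. The sketch below is not the TRAKFILE format or the CYBESUT1 algorithm; it assumes a simple list of slots, a parallel in-use bitmap, and a set of slot numbers still referenced by active JOBINDEX records, all of which are hypothetical names chosen for illustration. It shows only the behavior described above: slots not owned by an active job are flagged free, and their contents are left untouched.

```python
def reclaim(in_use: list, active_slots: set) -> list:
    """Return a new in-use bitmap after a reclaim (toy model).

    A slot stays allocated only if it was allocated before and is still
    referenced by an active record. Orphan slots and slots of completed
    jobs are flagged free. Slot contents are not examined or changed.
    """
    return [used and (index in active_slots) for index, used in enumerate(in_use)]


if __name__ == "__main__":
    slots = ["JOBA data", "JOBB data", "JOBC data", "orphan data"]
    bitmap = [True, True, True, True]
    active = {1}  # only JOBB is still active in this example

    new_bitmap = reclaim(bitmap, active)
    print("free slots after reclaim:", new_bitmap.count(False))
    # The old data is still readable until the slot is reused.
    print("slot 0 still holds:", slots[0])
```

In this model the old records stay readable after the reclaim, which mirrors the point that data remains accessible until the file wraps and the slot is reused.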
Whether or not UPDATE is specified, statistics are displayed showing how many slots are currently in use and free, and how many would be free after slot reclaim. In addition, the numbers of active and valid jobs or applications are displayed, along with the average number of slots per job or application.
Note - When running CYBESUT1 in UPDATE mode against the TRAKFILE, ESPWM should be brought down. Alternatively, you could issue "TRACKING NOSTORE" to quiesce the tracking processor. When CYBESUT1 is complete, issue a "TRACKING STORE" to resume processing.
Note - If CYBESUT1 is run when ESPWM is down, any error recorded in the report should be treated as an actual error. It is not unusual, however, to see errors when running CYBESUT1 against an active file, because the data is transient. If you run CYBESUT1 twice within a 30-minute window and receive the same error(s) both times, treat these errors as genuine. If the number of errors creeps upwards during this same time frame, this may indicate more serious problems.
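The two-run comparison described in this note can be sketched as follows. The function name and the idea of representing each run as a set of error identifiers (for example, job name plus message text that you record yourself) are assumptions for illustration; CYBESUT1 does not produce these sets for you.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # comparison window from the note above


def compare_runs(first_time, first_errors, second_time, second_errors):
    """Compare two analysis runs against an active file.

    Returns (persistent, growing):
      persistent -- sorted list of errors seen in both runs (treat as genuine)
      growing    -- True if the second run reported more errors than the first
    Raises ValueError if the runs are out of order or more than 30 minutes apart.
    """
    gap = second_time - first_time
    if gap < timedelta(0) or gap > WINDOW:
        raise ValueError("runs must be in order and within the 30-minute window")
    persistent = sorted(set(first_errors) & set(second_errors))
    growing = len(set(second_errors)) > len(set(first_errors))
    return persistent, growing


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 10, 0)
    t1 = t0 + timedelta(minutes=20)
    persistent, growing = compare_runs(
        t0, {"JOBA: INCORRECT PNODE"},
        t1, {"JOBA: INCORRECT PNODE", "JOBB: JTR FAILED VALIDITY CHECK"},
    )
    print("genuine errors:", persistent)
    print("error count growing:", growing)
```

Errors that appear in only one of the two runs are the ones most likely to be transient.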
If running CYBESUT1 in UPDATE mode does not clean up all the problem slots, the TRAKFILE itself may be corrupt. Your response depends on the number of slots in error, the frequency and the impact of the problems encountered. The ultimate solution is to allocate a new TRAKFILE and then run CYBESUT4 to format the new file and copy the old data.
Note - When running CYBESUT1 in UPDATE mode against the APPLFILE, ESPWM must be brought down. When ESP is quiesced, no new work is submitted, but ESP continues to track jobs and respond to operator and user commands; a quiesce suspends only event execution.
This Frequently Asked Question applies to all supported releases of ESP Workload Manager.