Uproc aborts with error "Unable to connect"

Document ID : KB000109305
Last Modified Date : 02/08/2018
Issue:
Uprocs randomly end with status Aborted. The only message found in the Job Log is "Unable to connect". The usual Uproc header, which contains the Uproc, MU, variables and other information related to the execution, is not printed.

This issue is most likely to occur when many Uprocs are launched at the same moment.
Cause:
The program uxjobinit, which is called from u_batch, is momentarily unable to connect to the IO server before the time-out expires.
Resolution:
Workaround:
Increase Node Settings > Technical Settings > "Time-out for IO server (seconds)" to 60
This corresponds to the variable U_IO_TIMEOUT in values.xml.

Because this issue often occurs at a particular time when the load on the node peaks, you can also consider setting a maximum number of parallel jobs in DQM. This smooths out the load on the server and helps prevent incidents.
Go to Design > Environment > Batch Queues > Update Queue > set "Maximum Job Limit" to 30 (or 40 or 50 depending on the sizing of the node)
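
To help choose a value for "Maximum Job Limit", it can be useful to see how many Uprocs start in the same minute. The following is only a minimal sketch: it assumes you have already extracted launch timestamps into a list of strings in the form YYYY-MM-DD HH:MM:SS (for example from your own execution history export). The function names, the timestamp format and the sample data are hypothetical and are not part of the product.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format of the exported launch times (hypothetical).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def launches_per_minute(timestamps):
    """Count how many launches fall into each calendar minute."""
    counts = Counter()
    for ts in timestamps:
        started = datetime.strptime(ts, TS_FORMAT)
        # Truncate to the minute so launches in the same minute group together.
        counts[started.replace(second=0)] += 1
    return counts


def busiest_minutes(timestamps, limit):
    """Return (minute, count) pairs where the count exceeds limit, busiest first."""
    counts = launches_per_minute(timestamps)
    over = [(minute, n) for minute, n in counts.items() if n > limit]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data: three launches at 02:00 and one at 02:01.
sample = [
    "2018-01-15 02:00:01",
    "2018-01-15 02:00:05",
    "2018-01-15 02:00:40",
    "2018-01-15 02:01:10",
]

if __name__ == "__main__":
    for minute, n in busiest_minutes(sample, limit=2):
        print(minute.strftime("%Y-%m-%d %H:%M"), n)
```

Minutes reported by busiest_minutes are candidates for the peak periods described above. The count per minute is only a rough indicator of load, since it ignores how long each job runs.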