Jobs shown as "aborted" after launch while they are still running on the remote node.

Document ID : KB000086191
Last Modified Date : 14/04/2018
Issue:
Error Message :
In UVC : 
Jobs stay "pending" for a long time (30 to 120 seconds).
Jobs are shown as "aborted" shortly after launch while they are still running on the remote node.

In universe.log on the node handling the logical queue (example only; timestamps, node ID and node name will differ):
##################
| 2013-01-17 12:17:03 |ERROR|X|IO |pid=2200.4272| o_io_cache_data_provider_ | Error getting Node <NBK971BTE002>: 600
| 2013-01-17 12:17:03 |ERROR|X|DQM|pid=2828.4620| o_connect_auth | o_io_api_getserv error: unable to get service [SIO]/[X]
| 2013-01-17 12:17:03 |ERROR|X|IO |pid=2200.4272| k_trt_req_network | Network request [G] returns -1 [] error code [0] error msg []
| 2013-01-17 12:17:03 |ERROR|X|DQM|pid=2828.4620| u_io_callsrv_connect_r | Error connecting to target IO server: ()
| 2013-01-17 12:17:03 |ERROR|X|DQM|pid=2828.4620| o_ext_read_name | Unable to connect to IO X of nodeID [N000000014]
##################

Patch level detected: Dollar Universe 6.0.00
Product Version: Dollar.Universe 6.0.0

Description : This occurs when DQM is used with a logical queue linked to several remote physical queues.
If a remote node is unavailable, the DQM handling the logical queue keeps trying to contact that node indefinitely. Jobs stay pending for a long time before being sent to an available physical queue, and job monitoring becomes unreliable: jobs can appear as aborted while they are still running on the remote node.
Environment:
OS: All
OS Version: any
Cause:
Cause type:
Defect
Root Cause: N/A
Resolution:
Make sure that every remote node with a physical queue linked to the logical queue is available.
If a node is not available, remove its physical queue from the logical queue settings.
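
To find which remote nodes DQM cannot reach, the universe.log of the node handling the logical queue can be scanned for the errors quoted above. The following Python script is a minimal sketch and is not part of Dollar Universe. The function name and the two patterns are assumptions derived only from the sample log lines in this article, so the wording may need adjusting for other versions.

```python
import re
import sys
from collections import Counter

# Patterns derived from the two sample log lines in this article (assumption:
# other Dollar Universe versions may word these messages differently).
NODE_ID_RE = re.compile(r"Unable to connect to IO \S+ of nodeID \[(\w+)\]")
NODE_NAME_RE = re.compile(r"Error getting Node <([^>]+)>")


def unreachable_nodes(lines):
    """Count connection errors per node ID or node name in universe.log lines."""
    hits = Counter()
    for line in lines:
        # Only ERROR-level entries are of interest here.
        if "|ERROR|" not in line:
            continue
        for pattern in (NODE_ID_RE, NODE_NAME_RE):
            match = pattern.search(line)
            if match:
                hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    # Usage: python script.py <path to universe.log>
    with open(sys.argv[1], errors="replace") as log:
        for node, count in unreachable_nodes(log).most_common():
            print(f"{node}: {count} connection error(s)")
```

Nodes that are reported repeatedly are candidates for the check described above: either restore the node, or remove its physical queue from the logical queue settings.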

Fix Status: Released

Fix Version(s):
Component: Application.Server
Version: Dollar.Universe 6.0.0
Additional Information:
Workaround :
N/A