All jobs are in Time overrun, Event or Launch wait status

Document ID : KB000085942
Last Modified Date : 30/04/2018
Issue:
Affects Release version(s): 5

Error Message:
The universe.log file contains the following types of errors:

####################################
<< 2015-03-03 23:34:17 0023982/uxioserv /io_error /000000000 - /orsyp/universe/exp/data/u_fmcx50.dta
<< 2015-03-03 23:34:17 0023982/uxioserv /u_maj_fichier_th /000000010 - ERROR : cannot write in file /orsyp/universe/exp/data/u_fmcx50.dta
<< 2015-03-03 23:34:17 0023982/uxioserv /io_error /000000000 - /orsyp/universe/exp/data/u_fmhs50.dta
<< 2015-03-03 23:34:17 0023982/uxioserv /u_maj_fichier_th /000000010 - ERROR : cannot write in file /orsyp/universe/exp/data/u_fmhs50.dta
<< 2015-03-03 23:34:18 0023982/uxioserv /io_error /000000000 - /orsyp/universe/exp/data/u_fmph50.dta
<< 2015-03-03 23:34:18 0023982/uxioserv /u_maj_fichier_th /000000010 - ERROR : cannot write in file /orsyp/universe/exp/data/u_fmph50.dta
<< 2015-03-03 23:34:23 0023982/uxioserv /io_error /000000000 - /orsyp/universe/exp/data/u_fmfu50.dta
<< 2015-03-03 23:34:23 0023982/uxioserv /u_maj_fichier_th /000000010 - ERROR : cannot write in file /orsyp/universe/exp/data/u_fmfu50.dta

##############

The universe.log file also shows a segmentation fault:

<< 2015-03-03 23:34:58 0023982/uxioserv /UXOS_HdlTermProcess /000000000 - execution handler : SIGNAL = (11) PID = (23982) PPID = (1) GPID = (23982) 
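
To pull these two kinds of entries out of a large universe.log, a small script can help. The following is a minimal sketch, not a vendor tool: the function name scan_log is hypothetical, and the patterns are derived only from the log lines quoted above, so they may need adjusting for other releases.

```python
import re

# Patterns taken from the log excerpts in this article (assumed stable across entries).
WRITE_ERROR = re.compile(r"ERROR : cannot write in file (\S+)")
SIGNAL = re.compile(r"SIGNAL = \((\d+)\) PID = \((\d+)\)")

def scan_log(lines):
    """Collect unwritable data files and signal-handler entries from log lines."""
    unwritable = []  # file paths in first-seen order, without duplicates
    signals = []     # (signal number, pid) tuples
    for line in lines:
        write_match = WRITE_ERROR.search(line)
        if write_match and write_match.group(1) not in unwritable:
            unwritable.append(write_match.group(1))
        signal_match = SIGNAL.search(line)
        if signal_match:
            signals.append((int(signal_match.group(1)), int(signal_match.group(2))))
    return unwritable, signals
```

Pass it any iterable of lines, for example an open file handle on universe.log (the location of that file depends on the installation).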

Patch level detected: Dollar Universe 5.6
Product Version: Dollar.Universe 5.6.0 FX25010

Description: No jobs are running. All of them are in Event wait, Launch wait, or Time overrun status.
A newly submitted job goes directly into Launch wait status, and the log shows it waiting in the queue.

The IO engine is not running.
Environment:
OS: All Unix
Cause:
Cause type:
Configuration
Root Cause: The "ERROR : cannot write in file" message is usually caused by one of the following: incorrect permissions on the data files, no access to the file system, or a full disk/file system.
Resolution:
Check that the file system holding the data files (/orsyp/universe/exp/data in the log above) is not full and that the account running Dollar Universe has write access to the data files.
If the data files cannot be written to, the IO engine will stop.
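
Both checks can be scripted. The sketch below is illustrative only: the function name check_data_files, the 100 MB free-space threshold, and the assumption that the data files end in ".dta" (as in the log above) are not taken from the product documentation. Run it as the account that runs Dollar Universe, because os.access tests the permissions of the calling user.

```python
import os
import shutil

def check_data_files(data_dir, min_free_bytes=100 * 1024 * 1024):
    """Report free space and any .dta files the current user cannot write."""
    usage = shutil.disk_usage(data_dir)  # total, used, free (in bytes)
    not_writable = sorted(
        name
        for name in os.listdir(data_dir)
        if name.endswith(".dta")
        and not os.access(os.path.join(data_dir, name), os.W_OK)
    )
    return {
        "free_bytes": usage.free,
        "low_space": usage.free < min_free_bytes,  # threshold is an arbitrary assumption
        "dir_writable": os.access(data_dir, os.W_OK),
        "not_writable": not_writable,
    }
```

For the node in the log above, the call would be check_data_files("/orsyp/universe/exp/data"). A non-empty "not_writable" list or a true "low_space" flag points to the condition to correct before restarting.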

#####################

The "SIGNAL = (11)" entry indicates a segmentation fault, which should have generated a core file in the exec folder. The client can open this core file with the "gdb" command to further narrow down the root cause.
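
As an illustration of reading the core file, the sketch below builds a non-interactive gdb call that prints a backtrace. The helper names are hypothetical, and the paths to the uxioserv executable and to the core file are placeholders that must be supplied for the installation in question.

```python
import subprocess

def gdb_backtrace_command(executable, core_file):
    """Build a non-interactive gdb call that prints a backtrace from a core file."""
    # --batch makes gdb exit after running its commands; -ex bt runs gdb's "bt" command.
    return ["gdb", "--batch", "-ex", "bt", executable, core_file]

def print_backtrace(executable, core_file):
    """Run gdb and return its combined text output (requires gdb on the PATH)."""
    result = subprocess.run(
        gdb_backtrace_command(executable, core_file),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr
```

The backtrace is most useful when the executable passed to gdb is the exact binary that produced the core file.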

#####################

Once the user rights and the file system size have been verified, shut down the node, perform an offline reorganization, and restart the node.

Fix Status: No Fix
 
Additional Information:
Workaround:
N/A