Production down recovery: System not responding

Document ID : KB000089695
Last Modified Date : 14/04/2018
Issue:
Production down recovery: System not responding
Resolution:

Symptoms

  • Logging in is not possible.
  • Jobs are not running.
  • Jobs get stuck in the "Generating" status.
  • Working Processes are not doing anything.
  • Production processing cannot be resumed.


Investigation

Data needed for analysis:

  • Automation Engine log files from the time the error occurred (default location: \AutomationEngine\Temp).
  • A "DB = 2" trace of the Working Processes, activated while the error is occurring and left running for a couple of minutes.

The trace files of the Working Processes show the following problem:

It is always the same statement that causes the time-critical database calls:

SELECT MQWP.*,ROWID FROM MQWP WHERE MQWP_SchedTime<=? ORDER BY MQWP_Priority, MQWP_SchedTime, MQWP_PK FOR UPDATE SKIP LOCKED

U0003524 ===> Time critical DB call! OPC: 'READ' time: '3:311.795.000'

In this case, the database took over 3 seconds to respond to a simple SELECT statement that reads from the MQWP table.
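To check whether the whole trace shows the same pattern, the time-critical calls can be pulled out of the Working Process trace files with a short script. This is a minimal Python sketch, assuming that the time field uses the layout seconds:milliseconds.microseconds.nanoseconds (consistent with '3:311.795.000' meaning just over 3 seconds). The regular expression and the function names are illustrative assumptions, not part of the product.

```python
import re
from collections.abc import Iterable, Iterator

# ASSUMPTION: the time field of message U0003524 has the layout
# seconds:milliseconds.microseconds.nanoseconds, e.g. '3:311.795.000' = 3.311795 s.
TIME_CRITICAL = re.compile(
    r"U0003524\s+===>\s+Time critical DB call!\s+OPC:\s+'(?P<opc>\w+)'\s+"
    r"time:\s+'(?P<s>\d+):(?P<ms>\d{3})\.(?P<us>\d{3})\.(?P<ns>\d{3})'"
)


def to_seconds(s: str, ms: str, us: str, ns: str) -> float:
    """Combine the four time components into a duration in seconds."""
    return int(s) + int(ms) / 1e3 + int(us) / 1e6 + int(ns) / 1e9


def time_critical_calls(lines: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Yield (operation, duration in seconds) for every time-critical DB call."""
    for line in lines:
        match = TIME_CRITICAL.search(line)  # search() tolerates a timestamp prefix
        if match:
            yield match["opc"], to_seconds(
                match["s"], match["ms"], match["us"], match["ns"]
            )
```

Passing an open trace file, for example time_critical_calls(open(path, encoding="utf-8", errors="replace")), lists every slow call with its duration; the path and encoding are assumptions about your environment.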

Our reference value for a select/update/rollback cycle is 470 per second. These reference values are written at the beginning of every Working Process log file:

U0003533 Check of data source finished: No errors. Performance CPU/DB:
U0003544 Reference values tested with Windows 2003 on XEON 1500 MHz: CPU 813865, DB 470

In other words, a SELECT statement that should run about 470 times per second in an environment with optimal performance (roughly 2 ms per execution) suddenly takes more than 3 seconds to run once.
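The comparison above is simple arithmetic: "DB 470" means 470 calls per second, so one call is expected to take 1/470 s (about 2.1 ms), and 3.311795 s is therefore roughly 1,550 times the reference. A minimal Python sketch that reads the DB reference value from the U0003544 line and computes that ratio; the function name slowdown is hypothetical.

```python
import re

# Matches the DB reference value in the U0003544 line quoted above.
REFERENCE = re.compile(r"U0003544\b.*\bDB (?P<db>\d+)")


def slowdown(reference_line: str, observed_seconds: float) -> tuple[float, float]:
    """Return (expected seconds per DB call, observed / expected ratio)."""
    match = REFERENCE.search(reference_line)
    if match is None:
        raise ValueError("no U0003544 reference values found")
    expected = 1 / int(match["db"])  # DB 470 = 470 calls per second
    return expected, observed_seconds / expected
```

For the values in this article, slowdown(line_U0003544, 3.311795) gives an expected time of about 0.00213 s and a ratio of about 1,556.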

Because this statement runs against the MQ (message queue) tables and is executed over and over again, the delay slows down all processing.


Cause

  • The SELECT itself is fine (around 0.001 seconds), but fetching the first row is very slow.
  • Something is wrong with the database or its settings.
  • The most likely cause is a defect in the indexes of these tables.
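The first point above (a fast SELECT but a slow first fetch) can be checked by timing the two steps separately. This is a minimal sketch for any Python DB-API 2.0 driver; the function name is hypothetical, and the placeholder style in the SQL depends on the driver. Because the traced statement uses FOR UPDATE SKIP LOCKED, it locks the row it returns, so the sketch always rolls the transaction back; run it against production MQ tables only with the database administrator's agreement.

```python
import time
from typing import Any, Sequence


def time_execute_and_first_fetch(
    conn: Any, sql: str, params: Sequence[Any] = ()
) -> tuple[float, float]:
    """Return (seconds spent in execute(), seconds spent in the first fetchone())."""
    cur = conn.cursor()
    try:
        start = time.perf_counter()
        cur.execute(sql, params)  # send and execute the statement
        executed = time.perf_counter()
        cur.fetchone()  # fetch only the first row
        fetched = time.perf_counter()
    finally:
        cur.close()
        # Release any row locks taken by a SELECT ... FOR UPDATE.
        conn.rollback()
    return executed - start, fetched - executed
```

A small first value combined with a large second value matches the pattern described above. How the work is split between execute() and the first fetch varies from driver to driver, so compare measurements taken with the same driver.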


Resolution

The database responds very slowly when retrieving information from the MQ tables, which causes an enormous delay.

Please contact your database administrator or database vendor to analyze the root cause of the slow response and resolve these issues.

Investigate any changes that were made before the system started having these issues.

In some cases, these issues have been solved by:

  • Repairing or rebuilding the indexes (do not do this without the permission of the database administrator).
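If the database administrator agrees that the indexes should be checked, one possible starting point is sketched below. It assumes an Oracle back end (the ROWID pseudocolumn in the traced statement suggests one) and that the relevant tables are those whose names start with MQ. It only flags indexes that Oracle reports as UNUSABLE; an index can also perform badly while reported as VALID, and partitioned indexes (status N/A) are not covered. The constant and function names are hypothetical, and the function only builds ALTER INDEX ... REBUILD statements as text for review; it executes nothing.

```python
# ASSUMPTION: Oracle back end; read-only data dictionary query for the MQ tables.
INDEX_STATUS_SQL = (
    "SELECT index_name, table_name, status FROM user_indexes "
    "WHERE table_name LIKE 'MQ%' ORDER BY table_name, index_name"
)


def rebuild_candidates(rows: list[tuple[str, str, str]]) -> list[str]:
    """Build ALTER INDEX ... REBUILD statements for indexes reported as UNUSABLE.

    rows are (index_name, table_name, status) tuples returned by INDEX_STATUS_SQL.
    The statements are returned as text for the DBA to review; nothing is run.
    """
    return [
        f"ALTER INDEX {index_name} REBUILD"
        for index_name, _table_name, status in rows
        if status == "UNUSABLE"
    ]
```

Keeping the query read-only and the rebuild as reviewable text leaves the decision, and the maintenance window, with the database administrator.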