Occasionally, submitted tasks remain in progress and never complete.
This document describes how tasks can end up being stuck in-progress and how to manage them, taking into account different causes.
1. Restarting in-progress tasks:
Identity Manager provides the capability to restart tasks/events that have failed or are stuck in an in-progress state. The View Submitted Tasks tab can be configured for restarting tasks. For more information on this configuration, see the Administration Guide section Customize the View Submitted Tasks Tab, located in System Tasks -> The View Submitted Tasks Tab -> Customize the View Submitted Tasks Tab.
2. Identify specific in-progress tasks:
Every task state other than 'completed', 'cancelled' and 'rejected' is shown as either 'pending' or 'in progress'.
That is, "In Progress" or "Pending" corresponds to Task Persistence (TP) states 0, 1, 2, 4, 8, 16 and 64.
'Failed' tasks do not have a unique state because a task can fail during different events within the task. A failed task keeps the last state logged at the time of the failure, which could be any of the pending or in-progress states.
The 'locked' entries indicate that at some point during the task's processing a required object was in use (locked by another process, such as a parallel workflow process) and was temporarily unavailable.
Task Persistence state values (of these, 0, 1, 2, 4, 8, 16 and 64 are the in-progress or pending states):
BEGIN_STATE = 0
PRE_STATE = 1
INVALID_STATE = 2
PENDING_STATE = 4
EXECUTING_STATE = 8
APPROVED_STATE = 16
REJECTED_STATE = 32
POST_STATE = 64
COMPLETED_STATE = 128
CANCELLED_STATE = 256
MARK_FOR_DELETION_STATE = 512
UNKNOWN_STATE = 768
INITIAL_STATE = 1024
PRIMARY_PENDING_STATE = 2048 (waiting to be approved)
PRIMARY_COMPLETE_STATE = 4096
SECONDARY_PENDING_STATE = 8192
AUDIT_STATE = 16384
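When reading raw values from the state column, a small lookup can help translate them. The following is only a sketch in Python: the values and names are copied from the list above, while the names TP_STATES, IN_PROGRESS_STATES and describe_state are hypothetical helpers and not part of Identity Manager.

```python
# State values and names copied from the list in this article.
TP_STATES = {
    0: "BEGIN_STATE",
    1: "PRE_STATE",
    2: "INVALID_STATE",
    4: "PENDING_STATE",
    8: "EXECUTING_STATE",
    16: "APPROVED_STATE",
    32: "REJECTED_STATE",
    64: "POST_STATE",
    128: "COMPLETED_STATE",
    256: "CANCELLED_STATE",
    512: "MARK_FOR_DELETION_STATE",
    768: "UNKNOWN_STATE",
    1024: "INITIAL_STATE",
    2048: "PRIMARY_PENDING_STATE",
    4096: "PRIMARY_COMPLETE_STATE",
    8192: "SECONDARY_PENDING_STATE",
    16384: "AUDIT_STATE",
}

# Per this article, these values are displayed as "In Progress" or "Pending".
IN_PROGRESS_STATES = frozenset({0, 1, 2, 4, 8, 16, 64})


def describe_state(value):
    """Return (state name, whether the state counts as in progress)."""
    name = TP_STATES.get(value, "UNRECOGNIZED")
    return name, value in IN_PROGRESS_STATES
```

For example, describe_state(8) returns the name EXECUTING_STATE together with True, while describe_state(128) returns COMPLETED_STATE together with False.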
Use the following SQL statement to monitor the Task Persistence (TP) database for in-progress and failed tasks:
SELECT * FROM tasksession12_5
WHERE (state NOT IN (128, 512))
Check query performance and clean up old completed rows regularly (see the TP garbage collection cleanup stored procedure in the IDM tools directory).
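For routine monitoring, it can be more useful to count rows per in-progress state than to list every row. The following is a minimal sketch in Python, assuming a DB-API 2.0 connection to the Task Persistence database; the table name tasksession12_5 and the state column come from the statement above, while count_in_progress is a hypothetical helper. Adjust the table name to match your release.

```python
# In-progress/pending state values as listed in this article.
IN_PROGRESS_STATES = (0, 1, 2, 4, 8, 16, 64)


def count_in_progress(conn, table="tasksession12_5"):
    """Return {state: row_count} for rows whose state is an in-progress value.

    `conn` is assumed to be any DB-API 2.0 connection. The state values are
    fixed integers, so they are inlined rather than bound as parameters,
    which avoids differences in parameter style between database drivers.
    """
    state_list = ", ".join(str(s) for s in IN_PROGRESS_STATES)
    sql = (
        f"SELECT state, COUNT(*) FROM {table} "
        f"WHERE state IN ({state_list}) "
        "GROUP BY state ORDER BY state"
    )
    cur = conn.cursor()
    cur.execute(sql)
    return dict(cur.fetchall())
```

A count that keeps growing for one state between runs suggests tasks are accumulating at that point in processing.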
3. Monitor application servers/cluster for messaging errors.
For the JBoss Application Server, check the size of the HyperSonic DB (which handles the Messaging Queue by default in JBoss installations) by checking the size of the localDB.data file located under <JBOSS_HOME>\server\default\data\hypersonic for JBoss 5.1 based app servers. This step does not apply to JBoss 6.x based application servers, which use HornetQ instead.
This file should be no larger than 1024MB. A larger file indicates possible corruption in the messaging queue. To correct this, rename the existing localDB.data file and restart the JBoss Application Server. Note that using the HyperSonic DB as the Messaging Queue is not recommended for a production environment. For additional details, see technical document KB000011201.
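The size check described above for JBoss 5.1 based servers can be scripted. This is a minimal sketch in Python: the 1024MB threshold comes from this article, while queue_file_too_large and the path passed to it are hypothetical and must be adapted to your installation.

```python
import os

# 1024 MB, the upper bound given in this article for localDB.data.
LIMIT_BYTES = 1024 * 1024 * 1024


def queue_file_too_large(path, limit_bytes=LIMIT_BYTES):
    """Return True if the file at `path` is larger than `limit_bytes`."""
    return os.path.getsize(path) > limit_bytes
```

Call it with the full path to localDB.data under the hypersonic directory. A True result means the file exceeds the limit and the rename and restart procedure above may be needed.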
For WebSphere Application Servers, check the IMSBus. If you are running a WAS cluster, verify that the bus is configured on the cluster rather than on each server listed individually.
4. CA Directory (r12.x dxgrid) - It is possible that the etanotifydb file has reached its predetermined size limit. This size limit can be adjusted; see technical document TEC481861 for details. For information on how to clear the notification queue, review technical document TEC593173.
5. Workflow - If Workflow is enabled on the tasks that are stuck in progress, verify whether anything was changed on the tasks before they were approved, such as deleting the task itself or deleting a task's approver before the task was approved.
6. Custom BLTH - Verify that the logic in custom code does not interfere with the Event Listeners' normal processing.
7. System Wide - Check for power hits or network problems that coincide with the stuck tasks.
8. Cluster Configuration - Verify the failover, JMS queuing, and load balancing configurations.
Reasons for tasks becoming stuck in progress:
Identity Manager's Task Persistence functionality maintains the processing states of all tasks that are executed. Should a system failure occur, tasks will automatically continue processing following a restart of the application server.
The Java Messaging Service (JMS) will reprocess any message delivered to an MDB (Message Driven Bean) that has not sent back an acknowledgement that the message was delivered and processed. However, the SubscriberMessageBean (SMB) that handles messages to process an event state may or may not have sent back that acknowledgement, depending on when the IM server went down or experienced the problem.
Additionally, the processing of any message representing a particular event state is also responsible for posting a new message representing the next event state to process. If that does not occur, some events may remain stuck in the state they were in when the problem occurred.
Certain system problems outside the control of Identity Manager can occasionally leave some tasks' states stuck in an in-progress state. Typically, this manifests itself as PX policy actions that are external to IM, like REST Query, SQL Query and Email actions. If the server that IM leverages for the external action is hung, you will see this as the IM task stuck in progress. The best way to diagnose this is to look at View Submitted Tasks for a version of the stuck task that completed at an earlier time and compare the list of executed PX policies to the one in the stuck state. You'll see that there are additional PX policies in the successful task. The first PX that doesn't appear in the stuck task is the first PX to investigate.
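The comparison described above can be done by eye, but for long policy lists a short script helps. The following is a sketch in Python that assumes you have copied the executed PX policy names from View Submitted Tasks into two lists, in execution order; first_missing_policy is a hypothetical helper and not part of Identity Manager.

```python
def first_missing_policy(completed_policies, stuck_policies):
    """Return the first policy that ran in the completed task but not in the stuck one.

    Both arguments are lists of policy names in execution order.
    Returns None if every policy from the completed task also appears
    in the stuck task.
    """
    executed_in_stuck = set(stuck_policies)
    for name in completed_policies:
        if name not in executed_in_stuck:
            return name
    return None
```

The name returned is the first PX policy to investigate, because it is the earliest one that ran in the successful task but never appears in the stuck one.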
JMS problems can prevent task data from being updated in the Task Persistence database, leaving tasks without their latest status. Task Persistence database problems such as deadlocks, connection failures, or general database corruption can cause task state inconsistencies. Custom BLTHs, Event Listeners, or other custom code that inadvertently interferes with normal task processing can negatively affect task states. Incorrect application server cluster configuration (particularly around JMS), Workflow problems, and system-wide problems (power hits, network disconnects) can also leave tasks stuck in progress.