Processes that use GEL are getting stuck in PPM Clarity

Document ID : KB000106346
Last Modified Date : 18/01/2019
Issue:
Some processes that use GEL are not completing in PPM/Clarity. If you create a process with just a start and finish step, and nothing else, the process completes successfully, but those using GEL are not completing.

 
Environment:
Any PPM environment that uses processes with GEL scripts.
Cause:
This can happen if the GEL scripts use the util:sleep tag.

You should avoid using util:sleep in GEL scripts. This tag puts the thread executing the GEL script to sleep. There are 15 GEL threads allocated per Process Engine instance. If all 15 are sleeping, other GEL steps cannot execute and processes appear to hang. Having most, or even some, of your GEL threads sleeping can slow down your process engines.
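The effect of a fixed pool of 15 threads can be illustrated outside of PPM. The following is a minimal Python sketch, not PPM code: it assumes a plain thread pool behaves like the GEL thread allocation described above, and the names sleeping_step, quick_step and wait_for_quick_step are hypothetical stand-ins. It measures how long a quick step waits when 14 versus 15 pool threads are already sleeping.

```python
import time
from concurrent.futures import ThreadPoolExecutor

GEL_THREADS = 15  # pool size stated in this article


def sleeping_step(seconds):
    """Hypothetical stand-in for a GEL step that calls util:sleep."""
    time.sleep(seconds)
    return "slept"


def quick_step():
    """Hypothetical stand-in for a GEL step that returns immediately."""
    return "done"


def wait_for_quick_step(sleepers, sleep_seconds=0.5):
    """Return the seconds a quick step waits while `sleepers` steps are asleep."""
    with ThreadPoolExecutor(max_workers=GEL_THREADS) as pool:
        # Occupy pool threads with steps that do nothing but sleep.
        for _ in range(sleepers):
            pool.submit(sleeping_step, sleep_seconds)
        started = time.monotonic()
        # The quick step can only run once a pool thread is free.
        pool.submit(quick_step).result()
        return time.monotonic() - started


if __name__ == "__main__":
    print(f"14 sleepers: quick step waited {wait_for_quick_step(14):.2f}s")
    print(f"15 sleepers: quick step waited {wait_for_quick_step(15):.2f}s")
```

With 14 sleepers one thread is still free, so the quick step runs almost immediately. With 15 sleepers it has to wait until one of the sleeping steps wakes up, which is the behavior that shows up in PPM as processes that appear to hang.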
Resolution:
Short Term Solution:
  • Restart the bg services. Usually one restart suffices, but sometimes multiple restarts are required.
Restarting the bg service only clears the symptom, so the problem is likely to recur. The more GEL script processes you run using this tag, the more likely it is to recur.

Long Term Solution:
  • Rewrite process GEL scripts so that they do not use util:sleep.
Instead of using util:sleep, move the check to a post condition. For example, if your GEL script monitors step completion in another process instance, have that process set a flag or value on an object. That change can trigger an event that the post condition in your monitoring process detects before it moves on.

That may mean breaking your existing process and GEL script into multiple pieces rather than one bigger process. 

That may also mean rethinking your process logic to use a custom object, or a custom attribute on some other object, that can be used in a post condition in your process.

Additional Notes:
  • Because the post condition pipeline can iterate through many post conditions without clogging up its bandwidth, changing your processes to use post conditions instead of util:sleep ensures that this problem does not recur.
  • You could have 300+ process instances waiting on a post condition without affecting process engine performance, whereas even a few GEL scripts stuck sleeping or polling can have an adverse effect on system throughput and behavior.

STEPS TO TROUBLESHOOT:

1.  Run a process with a Start and Finish step and nothing else.  There should be no actions or GEL scripts in these steps.  If even this process does not complete, util:sleep is not the cause and the problem lies elsewhere.

2.  Find out if restarting the bg service (even if it takes more than one try) fixes the problem for a while.  If so, this suggests that sleeping GEL threads are the cause.

3.  Take a Java thread dump of the bg service while the problem is occurring.  Make sure you choose the actual bg service and not the wrapper when selecting the java process.   See the attached document for instructions on how to create a thread dump.  Save the output to a text file.

4.  Open the text file and search for the word "custom".  You will see 15 custom threads listed.  If you see the word sleeping under all or most of the custom threads, the process engine lockup is caused by use of the util:sleep tag.
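The check in step 4 can also be scripted. The following is a minimal Python sketch under stated assumptions: the dump is in the usual HotSpot text layout, where each thread entry starts with a line beginning with a double quote and entries are separated by blank lines, and the function name summarize_custom_threads is hypothetical. It only automates the manual search for "custom" and "sleeping" described above.

```python
import sys


def summarize_custom_threads(dump_text, name_marker="custom", state_marker="sleeping"):
    """Return (custom_threads, sleeping_custom_threads) found in a thread dump."""
    total = 0
    sleeping = 0
    in_custom = False  # True while reading the entry of a "custom" thread
    counted = False    # True once the current entry has been counted as sleeping
    for line in dump_text.splitlines():
        if line.startswith('"'):
            # Assumed layout: a thread entry starts with its quoted name.
            in_custom = name_marker in line.lower()
            counted = False
            if in_custom:
                total += 1
        elif not line.strip():
            # Assumed layout: a blank line ends the current thread entry.
            in_custom = False
        elif in_custom and not counted and state_marker in line.lower():
            sleeping += 1
            counted = True
    return total, sleeping


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_dump.py <thread dump text file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        found, asleep = summarize_custom_threads(handle.read())
    print(f"custom threads: {found}, sleeping: {asleep}")
```

If the sleeping count is equal or close to the number of custom threads, that matches the util:sleep lockup described in step 4. Thread names and state wording may differ between JVM versions, so confirm the result by reading the dump itself.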
File Attachments:
HOW TO TAKE A JAVA THREAD DUMP.docx