Long running CCB job stuck with thread pool workers

Document ID : KB000084419
Last Modified Date : 14/04/2018
Issue:
Error Message :
log4j:ERROR Failed to flush writer,
java.io.IOException: Bad file descriptor

Note: This article is meant for a CCB administrator only.

A CCB batch job gets stuck with the thread pool workers and cannot proceed. The job report shows the following messages:

Actual:
 -  2016-12-15 22:40:09,064 [pool-1-thread-4] INFO  (support.cluster.MemberLeftThread) Removing member id: 46 from cache because it left cluster
 -  2016-12-15 22:40:09,066 [pool-1-thread-4] INFO  (support.cluster.ClusteredNode) Removing member 46 from the cluster cache
 -  2016-12-15 22:41:58,998 [pool-1-thread-1] INFO  (support.cluster.MemberLeftThread) Removing member id: 106 from cache because it left cluster
 -  2016-12-15 22:41:59,000 [pool-1-thread-1] INFO  (support.cluster.ClusteredNode) Removing member 106 from the cluster cache

Expected:
 -  2016-12-15 00:02:33,153 [Main Thread] INFO  (api.batch.BatchRunStatusHelper) Ending BRT values - batch nbr: 225, rerun nbr: 0, status: 40
 -  2016-12-15 00:02:33,155 [Main Thread] INFO  (api.batch.BatchRunStatusHelper) Batch Number: 225
 -  2016-12-15 00:02:34,167 [Main Thread] INFO  (api.batch.SubmitBatch) Run ended successfully with exit code 0
Return Code 0


Investigation:
Check to ensure the job is running through the thread pool worker by following these steps: 

  1. From the job report, you'll see something similar to this:
submitjob.sh -t 1 -g NNNN -l ENG -u SYSUSER -d 2016-12-15 -c 1 -b CM1MITD -p TPW_CCBPROD -x METER_SIZE_CHAR_TYPE="MTR SIZE",ITEM_TYPE="MTR1IN",EXCLUSION_CHAR_TYPE="CM-EXMTR",DISTRICT_CHAR_TYPE="DISTRICT",TO_DO_TYPE="CM1MITD"
 
Be sure to run the command from the command line in a PuTTY session on the CCB batch server. This is critical because it shows whether the job works correctly outside of Automic.
  2.  Run the job as the 'cissys' user. First check whether the environment is already set:
echo $SPLENVIRON 
 
If the output is blank, the environment is not set. Go back to the UI and copy the following value:
 
CCB > CONFIG > VARA.CCB.SETTINGS > SPLENVIRON_SCRIPT > copy the value to notepad.
 
Example:
. /u01/ccbprod/middleware/spl/CCBPROD/bin/splenviron.sh -e CCBPROD -c /bin/true
 
This will source the SPLENVIRON line and set the environment.  You must have the environment set up before any CCB job will be able to run successfully.
 
Note: You may want to change the date passed with the -d parameter of the submitjob.sh command, if applicable.
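
The environment check described above can be sketched as a small shell function. This is a minimal sketch, not an Oracle or Automic script: the function name ensure_splenviron is hypothetical, and the script path and environment name are assumptions that must be taken from your own SPLENVIRON_SCRIPT value.

```shell
# Hypothetical helper: source splenviron.sh only when SPLENVIRON is not set.
# $1 = full path to splenviron.sh, $2 = environment name (for example CCBPROD).
ensure_splenviron() {
    if [ -n "${SPLENVIRON:-}" ]; then
        echo "SPLENVIRON already set: $SPLENVIRON"
        return 0
    fi
    if [ ! -r "$1" ]; then
        echo "Cannot read $1" >&2
        return 1
    fi
    # Source (do not execute) so the variables persist in the current shell.
    . "$1" -e "$2" -c /bin/true
}
```

For the example path shown above, the call would be: ensure_splenviron /u01/ccbprod/middleware/spl/CCBPROD/bin/splenviron.sh CCBPROD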
 
  3.  The CCB Admin can check the thread pool workers as follows:
Command to check: 
 
. /u01/ccbprod/middleware/spl/CCBPROD/bin/splenviron.sh -e CCBPROD -c /bin/true
 
This is a standard Oracle command. It has nothing to do with Automic.
 
cd $SPLOUTPUT
ls -lrt
 
If it takes a while to return a list of files, then there are too many (possibly old) CCB files.  Our recommendation (for the CCB admin) would be to perform clean-up on a regular basis.
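
To put a number on "too many files", something like the following could be used. It is only a sketch: the function name report_old_files and the 30-day default are assumptions, not Oracle recommendations, and the function only counts files without deleting anything.

```shell
# Hypothetical helper: count the files in a directory and those older than N days.
# $1 = directory (for example "$SPLOUTPUT"), $2 = age in days (assumed default: 30).
report_old_files() {
    dir=$1
    days=${2:-30}
    # Count every regular file under the directory.
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Count only files last modified more than $days days ago.
    old=$(find "$dir" -type f -mtime +"$days" | wc -l | tr -d ' ')
    echo "total=$total older_than_${days}_days=$old"
}
```

After sourcing the environment, run it as: report_old_files "$SPLOUTPUT" 30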
 
Note: Running the job from batch is different from running the job from the front-end. It is crucial to ensure that the job is able to run from batch. If a CCB developer believes there are no issues with the job, ensure that they are running the job from batch and not the front-end.
 
Objective:
You want to check each thread pool worker to ensure that it picked up that job.
 
Commands:
grep CM1MITD threadpoolworker.TPW_CCBPROD.2016121515*
grep -n CM1MITD threadpoolworker.TPW_CCBPROD.2016121515*
 
The thread pool worker needs to acknowledge and pick up the job.
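
When there are several thread pool worker logs, the grep check can be wrapped in a loop that reports each file separately. This is a sketch under assumptions: the function name check_pickup is hypothetical, and a match only shows that the batch code appears in the log, not that the job completed.

```shell
# Hypothetical helper: report, per log file, whether it mentions the batch code.
# $1 = batch code (for example CM1MITD); remaining arguments = log files.
check_pickup() {
    code=$1
    shift
    for log in "$@"; do
        # Skip names that are not files, such as an unmatched wildcard.
        [ -f "$log" ] || continue
        if grep -q -- "$code" "$log"; then
            echo "PICKED UP: $log"
        else
            echo "NO MATCH:  $log"
        fi
    done
}
```

From $SPLOUTPUT, run it as: check_pickup CM1MITD threadpoolworker.TPW_CCBPROD.2016121515*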
 
Conclusion:
If it does not work from batch, then the issue is external and has nothing to do with Automic.

Recommendation:
At the end of the batch day, Oracle recommends killing and restarting the thread pool workers.
 
From the CCB side:
  1.  Enable tracing by changing "-g NNNN" in the command.
    The CCB developer can set this to "-g YYYY", which provides additional information because it enables tracing for all parameters.
  2.  Validate the parameters.
  3. Kill the thread pool workers

To verify:
ps -ef | grep submit

This checks whether there are TPW processes running.
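
Note that the output of ps -ef | grep submit normally includes the grep command itself, which can look like a running process. A minimal sketch that filters it out is shown below; the function name filter_procs is hypothetical.

```shell
# Hypothetical helper: filter `ps -ef` output (read from stdin) for a pattern,
# dropping any line that belongs to a grep command.
# $1 = pattern to look for (for example submit).
filter_procs() {
    grep -- "$1" | grep -v 'grep'
}
```

Run it as: ps -ef | filter_procs submit
If nothing is printed, no matching process was found.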
 
Oracle Resources for Best Practices:
Batch Best Practices for Oracle Utilities Application Framework based products (Doc Id: 836362.1)
Production Environment Configuration Guidelines (Doc Id: 1068958.1)

Solution:
        Within Automic:  Kill the thread pool workers and restart them.
        Outside of Automic:  Submit the job from the CCB front end.

Environment:
OS: Unix
Cause:
Cause type:
Other
Root Cause: This is not an Automic issue. It is an issue with Oracle Utilities Customer Care and Billing.
Resolution:
N/A

Fix Status: No Fix

Additional Information:
Workaround :
N/A