Error Message:
Jobs that have already ended in SAP do not end in the Automation Engine (AE).
In most cases, manual intervention is necessary to keep the workflow running.
Without intervention, the jobs usually end hours later and thus delay downstream processes.
This behavior was first observed in version 11.2.0, and an update to version 11.2.4 was implemented as a solution.
However, the issue has re-appeared in AE version 11.2.6 using the SAP Agent version 11.2.5.
This error occurs during high-load periods, so a connection to the workload is very likely. Jobs that use R3_ACTIVATE_REPORT with child jobs are affected.
The child jobs appear to be the reason for the delay. The same behavior occurred in v11.2.0 and was resolved by the advised update to v11.2.4. Unfortunately, it still occurs with Automation Engine v11.2.6 and SAP Agent v11.2.5.
How the Agent works:
- The child jobs transfer millions of job log lines.
- The Agent spends most of its time waiting for a response to /SBB/UC4_JOB_READ_LOG calls.
One such call takes about 10-20 seconds.
- The Agent starts a new thread for each new job, but not for child jobs.
Child jobs run in the same thread as their parent, so the job logs of the children are fetched one after the other in that thread.
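To make the effect of this threading model concrete, the following minimal Python sketch estimates how long one Agent thread is busy fetching logs. It is a back-of-the-envelope model, not Agent code: the function name, the number of log-read calls per child job, and the 15-second default (the midpoint of the observed 10-20 seconds) are assumptions for illustration only.

```python
def sequential_fetch_seconds(child_jobs, calls_per_child, seconds_per_call=15.0):
    """Estimate the time one Agent thread spends fetching child job logs.

    Assumes, as described above, that the logs of all child jobs are
    fetched one after the other in the parent's thread, so the call
    durations simply add up.
    """
    return child_jobs * calls_per_child * seconds_per_call


# Hypothetical workload: 50 child jobs, 4 log-read calls per child job.
total = sequential_fetch_seconds(child_jobs=50, calls_per_child=4)
print(f"{total / 60:.0f} minutes")  # 50 * 4 * 15 s = 3000 s
```

Under these assumed numbers the parent job would stay active for roughly 50 minutes after the SAP side has finished; with more calls per child job the delay grows linearly.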
Root Cause: We can confirm that the root cause of the delay is huge job logs.
The completion is delayed by the number of job log lines being transferred.
By design, the Agent uses only one thread per job, and this thread also handles all child jobs.
When a very high number of job log lines is transmitted through this single thread, the job ends with a delay.
The following options are available for rectification or mitigation:
- Deactivate the reports, or write them only in case of an error. This completely removes the cause of the delay, but it is not an option if you need the reports for post-processing or audits.
- Use the standard BAPI instead of the Automic Interface. This can be changed in the Connection object.
-> These are the preferred options. For the sake of completeness, you may also consider the following:
- Use process chains. The log is then returned by RSPC_API_CHAIN_GET_LOG, which would probably be much faster because less information is transferred.
- Use additional agents. This does not cause any licensing costs if they run on the same host, but a single job would still not finish in time, because its child jobs cause the delay.
- If jobs can be redesigned, 5 jobs with 10 child jobs each would be much faster than one job with 50 child jobs.
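The redesign option can be illustrated with a small Python sketch that compares the two layouts. It is an idealized model under stated assumptions, not a measurement: the function name, the 4 log-read calls per child job, and the 15 seconds per call are hypothetical, and the model assumes the SAP system answers equally fast when several parent jobs are fetched in parallel.

```python
def completion_seconds(children_per_job, calls_per_child=4, seconds_per_call=15.0):
    """Estimate the time until the slowest parent job has fetched all logs.

    children_per_job: list with the number of child jobs of each parent job.
    Each parent job has its own Agent thread, so parent jobs are handled in
    parallel, while the child job logs within one parent are fetched
    sequentially.
    """
    per_job = [n * calls_per_child * seconds_per_call for n in children_per_job]
    return max(per_job)


one_big_job = completion_seconds([50])          # 1 job with 50 child jobs
five_small_jobs = completion_seconds([10] * 5)  # 5 jobs with 10 child jobs each
print(one_big_job, five_small_jobs)             # 3000.0 600.0
```

In this model, splitting the work across five parent jobs shortens the wait by a factor of five, because the sequential part per thread shrinks from 50 child jobs to 10. Real gains may be smaller during high-load periods.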
Fix Status: No Fix