When running a Full Page Monitor check from the On Premise page of ASM, the following error appears: "Checkpoint Error: 503 Service Unavailable. Sorry, rbm agent is too busy at the moment (too many checks pending) (-95)". This issue occurs even when the output of "cat /proc/loadavg" does not indicate excessive system load.
On Premise Monitoring Station (OPMS) version 10.x
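The OPMS Full Page monitor determines whether an instance is already running by reading the PID from agent-rbm.pid and checking whether a process with that PID exists. It does not verify which process is actually running under that PID. As a result, if agent-rbm.pid is stale and its PID now belongs to an unrelated process, the monitor still concludes that an instance is already running.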
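The difference between the two kinds of check can be illustrated with a minimal shell sketch. This is not the OPMS code; the function names pid_exists and pid_matches are hypothetical, and the sketch assumes a Linux host where /proc is mounted.

```shell
#!/bin/sh
# Illustrative only: these helpers are NOT part of OPMS.

# Existence-only check: succeeds if ANY process currently has this PID.
# This mirrors the behaviour described above.
pid_exists() {
    [ -d "/proc/$1" ]
}

# Stricter check: the PID must exist AND its command line must contain
# the expected string (for example "rbm.py").
pid_matches() {
    [ -r "/proc/$1/cmdline" ] || return 1
    # Arguments in /proc/<pid>/cmdline are NUL-separated; turn them into spaces.
    cmd=$(tr '\0' ' ' < "/proc/$1/cmdline")
    case "$cmd" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, pid_matches "$(cat /opt/asm/var/run/agent-rbm.pid)" "rbm.py" would fail for a stale PID file whose PID has been reused by another program, whereas pid_exists would still succeed.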
To correct this issue:
-Stop the smartpop processes with "monit -g smartpop stop"
-Run "monit summary" every few seconds until no process has "- stop pending" as part of its status
-Delete the stale PID file with "rm -f /opt/asm/var/run/agent-rbm.pid"
-Start the processes again with "monit -g smartpop start"
-Run "monit summary" every few seconds until all processes have a status of "Accessible", "Running" or "Status ok"
-Validate that the PID in agent-rbm.pid is correct with "ps --no-headers -o cmd -p $(cat /opt/asm/var/run/agent-rbm.pid)". The output should be "/usr/bin/python rbm.py"
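The steps above could be scripted roughly as follows. This is a sketch, not a supported tool: the helper wait_until_absent, the function restart_smartpop, the retry count and the polling interval are all hypothetical, and it assumes that transitional states in the "monit summary" output contain the word "pending". The final status and the ps output should still be checked by eye.

```shell
#!/bin/sh
# Sketch of the recovery procedure; run as a user allowed to control monit.
PIDFILE=/opt/asm/var/run/agent-rbm.pid   # path taken from the steps above

# Poll a command until its output no longer contains PATTERN.
# Gives up after TRIES attempts, sleeping INTERVAL seconds between attempts.
# Returns 0 once the pattern is absent, 1 on timeout.
# Note: a command that fails and prints nothing also counts as "absent".
# Usage: wait_until_absent PATTERN TRIES INTERVAL CMD [ARGS...]
wait_until_absent() {
    pattern=$1; tries=$2; interval=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        if ! "$@" 2>&1 | grep -q -- "$pattern"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}

restart_smartpop() {
    monit -g smartpop stop || return 1
    # Assumption: in-progress states show up as "... pending" in the summary.
    wait_until_absent "pending" 60 5 monit summary || return 1
    rm -f "$PIDFILE"
    monit -g smartpop start || return 1
    wait_until_absent "pending" 60 5 monit summary || return 1
    # Show the final state so the operator can confirm every process is
    # "Accessible", "Running" or "Status ok".
    monit summary
    # Expected output per the steps above: /usr/bin/python rbm.py
    ps --no-headers -o cmd -p "$(cat "$PIDFILE")"
}

# Only act when explicitly asked, e.g. "sh fix-rbm.sh run".
if [ "${1:-}" = "run" ]; then
    restart_smartpop
fi
```

Requiring the explicit "run" argument keeps the script from stopping the monitoring processes if it is sourced or started by accident.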