The AutoSys agent on a Windows server crashes after Cisco AMP anti-virus is newly installed and starts blocking it. The agent works fine once the anti-virus is uninstalled.
Mon Mar 19 08:18:08 2018: Warning: DB Lower disk threshold of 20983808 breached. Sending notice
Mon Mar 19 08:18:08 2018: setResourceWarningMode: invoking cybAgentDriver.setResourceWarningMode(boolean mode)
Mon Mar 19 08:18:09 2018: setResourceWarningMode: invoking cybAgentDriver.setResourceWarningMode(boolean mode)
Mon Mar 19 08:18:11 2018: Fatal: DB Lower disk threshold of 614400 breached. Shutting down
Mon Mar 19 08:18:11 2018: Initiating shutdown sequence...
Mon Mar 19 08:18:11 2018: Named pipe server is about to stop
Mon Mar 19 08:18:11 2018: NamedPipeServer exiting on CONTROL SHUTDOWN_INTERNAL
Mon Mar 19 08:18:11 2018: Named pipe server stopped
Mon Mar 19 08:18:11 2018: Transmitter is about to stop
On a hard crash, the agent sends no message to the scheduler because it has no opportunity to do so. When the agent restarts, whether after a crash or a normal shutdown, it reads the persisted queue files to determine which jobs were last running.
If they are internal jobs (such as file triggers), the agent restarts them. For file triggers, it also sends a message to the scheduler saying that the file trigger was restarted.
If they are external jobs (such as Windows or Unix command jobs), the agent checks whether the process is still running. If it is, the agent continues to wait for it to finish as if nothing had happened. If it is not (meaning the job ended while the agent was not running), the agent cannot tell whether the job ended in success or failure, so it sends a STATE FAILED with a status of "Lost control" to the scheduler.
If a persisted queue file is corrupted as part of the crash (very rare except when there is no disk space), the agent could lose track of one or more jobs that were running and be unable to update the scheduler. The scheduler would then continue to show those jobs in a Running state.
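The restart behavior described above can be summarized as a small decision function. This is only an illustrative sketch of the logic as described in this article, not the agent's actual code: the names PersistedJob, JobKind, recovery_action, and the is_process_running callback are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class JobKind(Enum):
    INTERNAL = "internal"  # e.g. file triggers
    EXTERNAL = "external"  # e.g. Windows or Unix command jobs


@dataclass
class PersistedJob:
    """Hypothetical record of a job found in the persisted queue files."""
    name: str
    kind: JobKind
    pid: Optional[int] = None  # only meaningful for external jobs


def recovery_action(job: PersistedJob,
                    is_process_running: Callable[[int], bool]) -> str:
    """Return what the agent is described as doing with a job on restart."""
    if job.kind is JobKind.INTERNAL:
        # Internal jobs are restarted; for file triggers the scheduler
        # is also told that the trigger was restarted.
        return "RESTART"
    if job.pid is not None and is_process_running(job.pid):
        # The process survived the agent outage: keep waiting for it.
        return "CONTINUE_WAITING"
    # The process ended while the agent was down, so the outcome is unknown.
    return "STATE FAILED (Lost control)"
```

A job whose persisted queue entry was lost to corruption never reaches this function at all, which is why the scheduler can be left showing it as Running.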
The log files may show that the machine ran out of free disk space on the drive where the agent is installed, which caused the agent to shut down to prevent corruption.
If the anti-virus software is responsible for the machine running low on disk space, excluding directories from the scan may or may not help.
It is possible that the anti-virus software is locking directory structures and preventing the agent from correctly reading free disk space. If that is the case, then excluding the agent directory from the scan may help.
On a Windows machine, the anti-virus software may be preventing disk information from being read for the entire drive. Excluding specific directories will not help in that case.
If this is just a side effect of the anti-virus software and it is known that there will never be any actual free disk space issues, then disabling the disk resource monitor in the agent is a possibility.
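Before disabling the monitor, it can help to check independently how much free space the operating system reports for the agent's drive and compare it with the thresholds in the log excerpt. The following is a minimal sketch, assuming the logged thresholds (20983808 and 614400) are in bytes and that a threshold counts as breached when free space falls below it. Neither assumption is confirmed by the log itself, and the function names are hypothetical.

```python
import shutil

# Values copied from the log excerpt; treating them as bytes is an assumption.
WARNING_THRESHOLD_BYTES = 20983808
FATAL_THRESHOLD_BYTES = 614400


def classify_free_space(free_bytes: int,
                        warning: int = WARNING_THRESHOLD_BYTES,
                        fatal: int = FATAL_THRESHOLD_BYTES) -> str:
    """Classify free space against the thresholds (assumed: breach = below)."""
    if free_bytes < fatal:
        return "fatal"
    if free_bytes < warning:
        return "warning"
    return "ok"


def check_drive(path: str) -> tuple:
    """Return (free bytes, classification) for the drive containing path."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes, classify_free_space(free_bytes)
```

Running check_drive against the agent installation directory while the anti-virus software is active shows what the operating system reports. If it reports ample free space while the agent log still records a breached threshold, that would be consistent with the anti-virus software interfering with the agent's reading rather than with a real shortage.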