SysEDGE suddenly stopped working on multiple servers

Document ID : KB000100312
Last Modified Date : 06/06/2018
Issue:
  • SystemEDGE agents on multiple servers suddenly stopped working and no longer respond to SNMP requests.
  • After a restart, the SystemEDGE agent hangs and does not initialize properly.
Environment:
  • RedHat 6
  • RedHat 7
Cause:
  • SystemEDGE agents run independently.
  • If multiple SystemEDGE agents are impacted simultaneously, the problem is caused by an environmental issue.
  • After restarting the SystemEDGE agent, the last entry in sysedge.log is:
add_monitor_entry(): Self monitor index 1000005 configured with variable OID 1.3.6.1.4.1.546.12.1.1.4.1 which is not yet available.
  • OID 1.3.6.1.4.1.546.12.1.1.4.1 is an instance of diskStatsUtilization:
diskStatsUtilization -> diskStatsEntry.4 (1.3.6.1.4.1.546.12.1.1.4)
Type: INTEGER Access: read-only
The utilization rate (percentage utilization) for this disk over the last measurement period. This could also be expressed as (disk busy time / elapsed time) * 100.
  • Disabling SystemEDGE disk-based polling in the sysedge.cf file allows the agent to fully initialize:
no_probe_disks 
no_discover_disks      
no_stat_nfs_filesystems
 
  • Running the df -h command also hangs.
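When many servers are affected, checking each sysedge.log by hand is tedious. The following is a minimal Python sketch, not part of SystemEDGE, that tests whether the last log entry is the add_monitor_entry() message quoted above and whether its OID falls under diskStatsEntry (1.3.6.1.4.1.546.12.1.1). The function name and the regular expression are illustrative assumptions based only on the single log line shown in this article.

```python
import re

# Pattern derived from the one log line quoted in this article (assumption:
# other agent versions may word the message differently).
STALL_RE = re.compile(
    r"add_monitor_entry\(\): Self monitor index (\d+) configured with "
    r"variable OID ([0-9.]+) which is not yet available"
)

# diskStatsEntry prefix as given in the article's MIB excerpt.
DISK_STATS_ENTRY = "1.3.6.1.4.1.546.12.1.1."


def stalled_on_disk_oid(log_text):
    """Return (monitor_index, oid) if the last non-empty log line is the
    'not yet available' message for a diskStatsEntry OID, else None."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    if not lines:
        return None
    match = STALL_RE.search(lines[-1])
    if match is None:
        return None
    index, oid = int(match.group(1)), match.group(2)
    if not oid.startswith(DISK_STATS_ENTRY):
        return None
    return index, oid
```

Passing the contents of sysedge.log to stalled_on_disk_oid() returns the monitor index and OID when the agent stopped at a disk statistics monitor, which suggests looking at the file systems rather than at the agent.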
Resolution:
  • A non-responsive file system or disk array is causing SystemEDGE to become unresponsive.
  • SystemEDGE is a single-threaded application, so something like a hung or stale NFS mount can cause a "blocking call" that stalls the entire agent.
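Because a call into a hung mount can block indefinitely, any check for one should run in a separate process with a timeout. The Python sketch below is one hypothetical way to find the unresponsive mount; it is not a SystemEDGE feature. It assumes a Linux host with Python 3 where /proc/mounts lists the mounted file systems. The function names and the 5-second default timeout are illustrative.

```python
import subprocess
import sys

# Child command: call statvfs() on the path given as argv[1]. Running it in a
# separate process keeps a blocked call from hanging this script.
_PROBE = "import os, sys; os.statvfs(sys.argv[1])"


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    # Limitation: /proc/mounts writes special characters in paths as octal
    # escapes; this sketch does not decode them.
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def is_responsive(mount_point, timeout=5.0):
    """Return True if statvfs() on mount_point finishes within timeout seconds.

    A nonzero exit (for example, a missing path) still counts as responsive:
    the call returned instead of blocking.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _PROBE, mount_point],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait for the child: a process blocked on
        # a dead mount may not exit until the mount recovers.
        proc.kill()
        return False


def find_hung_mounts(mounts_text, timeout=5.0):
    """Return the mount points that did not answer within the timeout."""
    return [
        mount_point
        for mount_point, _fs_type in parse_mounts(mounts_text)
        if not is_responsive(mount_point, timeout)
    ]
```

Calling find_hung_mounts() with the text of /proc/mounts returns the mount points that did not answer in time. These are the candidates to investigate before re-enabling disk polling in sysedge.cf.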
Additional Information:
How to verify SysEDGE is working correctly
https://comm.support.ca.com/kb/how-to-verify-sysedge-is-working-correctly/kb000032112