Monitor up/down status using both ICMP pings (ICMP itself has no port) and by handshaking with specific service ports, e.g., NimBUS port 48000 for the robot and 48002 for the hub.
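As a rough illustration of the port check, the sketch below attempts a plain TCP connection to the two NimBUS ports named above. It only confirms that something is listening; it does not speak the NimBUS protocol. The function names and the default timeout are assumptions made for this example and are not part of any probe.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the TCP three-way handshake only.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_nimbus_ports(host: str, ports=(48000, 48002), timeout: float = 3.0) -> dict:
    """Map each port to True (listening) or False (unreachable).

    The defaults are the robot (48000) and hub (48002) ports from the text.
    """
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, `check_nimbus_ports("hub01.example.com")` (a hypothetical host name) returns a dictionary with one entry per port, which can then be turned into an up/down alarm.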
cdm or rsp probe:
Monitor CPU, Disk, Memory, I/O
CPU / Memory for select processes, e.g., hub.exe, nas.exe, etc.
Monitor for max restarts entries in core probe logs such as:
hub, controller, UMP probes (dap, dashboard_engine and wasp)
Use logmon to parse the probe log(s) for "Max restarts reached for probe" or for any hub, nas, and data_engine errors.
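The kind of match logmon performs here can be sketched outside the probe as a simple phrase scan. This is only an illustration: the function name, the case-insensitive matching, and any log path you pass in are assumptions, and only the phrase "Max restarts reached for probe" comes from the text above.

```python
import re
from typing import Iterable, List, Tuple


def scan_log(
    lines: Iterable[str],
    phrases: Tuple[str, ...] = ("Max restarts reached for probe",),
) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line containing any of the phrases.

    Matching is case-insensitive and treats each phrase as literal text.
    """
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

A file object can be passed directly, e.g. `with open(log_path, errors="replace") as fh: hits = scan_log(fh)`, where `log_path` is whichever probe log you want to check. Additional phrases such as "exception" can be supplied through the `phrases` argument.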
Use the dirscan probe locally on each hub to monitor the size of the queue (q) files and alarm when a file is larger than <size_of_file>.
Optionally, you could deploy a remote nas and emailgtw on one of your remote hubs to send an email when a queue alarm is generated.
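The queue file size check can be pictured with the following minimal sketch, which lists files in a directory that exceed a byte threshold. The directory location and file name pattern are deliberately left as parameters because they depend on the hub installation; the function name is an assumption for this example.

```python
from pathlib import Path
from typing import Dict, Union


def oversized_queue_files(
    queue_dir: Union[str, Path],
    max_bytes: int,
    pattern: str = "*",
) -> Dict[str, int]:
    """Return {file name: size in bytes} for files larger than max_bytes.

    Only regular files directly inside queue_dir that match pattern are checked.
    """
    oversized = {}
    for path in Path(queue_dir).glob(pattern):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                oversized[path.name] = size
    return oversized
```

A non-empty result corresponds to the condition that should raise the queue alarm; an empty dictionary means every queue file is within the limit.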
Under the setup/hub section, set the hub and controller loglevel to at least 3 and the logsize to at least 8000 so that more detail is available if this happens again.
- use processes and monitor java.exe using the associated command line for discovery_server
- use logmon to monitor the log for "exception"
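Because several components run as java.exe, the process has to be identified by its command line rather than by name alone. The sketch below shows that matching logic on an already collected process list. The `(pid, name, cmdline)` tuple layout and the function name are assumptions for this example; the process list itself would come from whatever OS tooling is available.

```python
from typing import Iterable, List, Tuple


def find_java_processes(
    processes: Iterable[Tuple[int, str, str]],
    marker: str,
) -> List[int]:
    """Return PIDs of java processes whose command line contains marker.

    processes: iterable of (pid, process name, full command line).
    marker: text expected in the command line, e.g. "discovery_server".
    """
    wanted = marker.lower()
    return [
        pid
        for pid, name, cmdline in processes
        if name.lower() in ("java", "java.exe") and wanted in cmdline.lower()
    ]
```

An empty result for the marker `discovery_server` would mean that no matching java.exe instance is running, which is the condition to alarm on.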
Monitor for data_engine errors/exceptions and alarm on them
Use the appropriate probe for the type of database in use, e.g., sqlserver, oracle, mysql
Monitor size of database files
- use processes to monitor the emailgtw.exe process
- use logmon to look for each of these errors in the log:
"error on session"
"failed to start"
"FAILED to connect session"
Monitor key interfaces for discards/errors, e.g., hub/tunnel machines
ntservices / ntevl:
- used to monitor Windows services or event logs, e.g., the Application event log
Application crashes via ntevl
The dirscan probe can be used to monitor for the presence of core files
Instrument JMX on wasp by adding these startup arguments to the Extra Java VM arguments:
-Dcom.sun.management.jmxremote.port=27000 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
QoS can then be gathered via the jvm_monitor probe or a third-party app such as VisualVM.
- use processes and monitor java.exe using the associated command line for spectrumgtw
- use logmon to monitor the spectrumgtw.log for "exception"