Customers may need to monitor their databases/database instances when they become 'hung.'
There is no specific "out of the box" checkpoint(s) to monitor what might be considered a 'hung' database/database instance.
The term "hung state" when referring to a database instance is unfortunately too vague a statement to provide focused-specific help on. This could mean many different things, all with different possible solutions. For example,
- a single instance or application is not responding
- you can not log in remotely with RDP but all other functions are working
- All web services are down but other local services are working
- an instance appears to be available, but no transactions can be run
You first need to define exactly what constitutes a 'hung state' and then see what can be done to monitor for that condition, and alarm on it.
If it is an application 'not responding,' you could possibly monitor it with the e2e_appmon probe and create an e2e transaction to test access to the instance to run a basic statement/transaction that indicates the instance is hung but you would have to identify a means of doing so that you would trust as an indicator/proof.
If the application should be adding data to a database you could possibly create an SQL checkpoint to check for records added and alarm if there are not new ones.
If the application writes to a log file setup logmon to parse for an error/errors, e.g., transaction logs, or a remote dirscan probe to monitor the log files for activity.
Overall, I would recommend considering the use of:
- net_connect probe to monitor instance availability / service availability@port
- logmon to parse the transaction logs for specific errors
- logmon OR nexec to run a given command, e.g., telnet <database_server> port to test connectivity or run another useful command to test the access/run a stored procedure/transaction
- sqlserver/oracle custom checkpoint to run a statement or a stored procedure suggested by your DBA that can help determine the true state of the instance
- processes probe to test that the 'agent' is up and running
- e2e to run a synthetic scripted transaction against the database every 15 minutes to test if the user can log on and run a given low-impact transaction
- cdm probe to monitor CPU/Memory/Disk I/O on the server IF one or more of those symptoms are indicators for the hung state
For delivering a solution to your customer/client, we recommend working with their DBA to help identify the method that best fits the situation/problem.