How to monitor a hung database instance in MS SQL Server / Oracle

Document ID : KB000130195
Last Modified Date : 28/03/2019
Show Technical Document Details
Introduction:
Customers may need to monitor their databases/database instances when they become 'hung.'
Environment:
- UIM v8.5x or higher
Instructions:
There is no specific "out of the box" checkpoint(s) to monitor what might be considered a 'hung' database/database instance. 

The term "hung state" when referring to a database instance is unfortunately too vague a statement to provide focused-specific help on. This could mean many different things, all with different possible solutions. For example,

- a single instance or application is not responding 
- you can not log in remotely with RDP but all other functions are working
- All web services are down but other local services are working
- an instance appears to be available, but no transactions can be run

You first need to define exactly what constitutes a 'hung state' and then see what can be done to monitor for that condition, and alarm on it.

Monitoring options:

If it is an application 'not responding,' you could possibly monitor it with the e2e_appmon probe and create an e2e transaction to test access to the instance to run a basic statement/transaction that indicates the instance is hung but you would have to identify a means of doing so that you would trust as an indicator/proof.

If the application should be adding data to a database you could possibly create an SQL checkpoint to check for records added and alarm if there are not new ones. 

If the application writes to a log file setup logmon to parse for an error/errors, e.g., transaction logs, or a remote dirscan probe to monitor the log files for activity.

Overall, I would recommend considering the use of: 

- net_connect probe to monitor instance availability / service availability@port 
- logmon to parse the transaction logs for specific errors 
- logmon OR nexec to run a given command, e.g., telnet <database_server> port to test connectivity or run another useful command to test the access/run a stored procedure/transaction 
- sqlserver/oracle custom checkpoint to run a statement or a stored procedure suggested by your DBA that can help determine the true state of the instance 
- processes probe to test that the 'agent' is up and running 
- e2e to run a synthetic scripted transaction against the database every 15 minutes to test if the user can log on and run a given low-impact transaction 
- cdm probe to monitor CPU/Memory/Disk I/O on the server IF one or more of those symptoms are indicators for the hung state 

For delivering a solution to your customer/client, we recommend working with their DBA to help identify the method that best fits the situation/problem.
Additional Information:
Here are some helpful links:

Oracle:
http://www.dba-oracle.com/t_troubleshooting_hung_database.htm

https://docs.oracle.com/cd/E17781_01/server.112/e18804/monitoring.htm#ADMQS247

MS SQL Server:
https://troubleshootingsql.com/2012/07/04/sql-server-is-hung/

http://giladka8.blogspot.com/2012/08/sql-server-query-is-stuck.html

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/troubleshoot-connecting-to-the-sql-server-database-engine?view=sql-server-2017