Postgres process consuming 100% CPU

Document ID : KB000048198
Last Modified Date : 14/02/2018
Show Technical Document Details

Description:

The Postgres process has started consuming 100% of CPU resources

Restarting the MOM and Postgres db does not help.

TOP command shows:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18498 saadmin 25 0 266m 86m 11m R 100.0 2.2 6:36.78 postgres
Postgres log file for the PID 18498 contains the message:

select agent0_.id as id0_, agent0_.name as name0_, agent0_.process_name as process3_0_, agent0_.host_name as host4_0_, agent0_.fully_qualified_host_name as fully5_0_, agent0_.creation_date as creation6_0_, agent0_.update_date as update7_0_, agent0_.user_name as user8_0_ from apm_agent agent0_ where agent0_.id not in (select distinct vertex1_.agent_id from apm_vertex vertex1_

where vertex1_.id in (select distinct edge2_.head_vertex_id from apm_edge edge2_ where (edge2_.id in (select edge3_.id from apm_edge edge3_ where update_date<$1 or update_date>$2)) and (vertex1_.agent_id is not null)) or vertex1_.id in (select distinct edge4_.tail_vertex_id from apm_edge edge4_where (edge4_.id in (select edge5_.id from apm_edge edge5_ where update_date<$3 or update_date>$4)) and (vertex1_.agent_id is not null)))

Solution:

This is a result of an APM-Catalyst integration which uses Application Triage Map data (i.e. data in the APM Transactional Model) and large amounts of edges (from the apm_edge database table).

The query looks like one of the obsolete queries run once per hour (by default) in the background to identify obsolete data in the data model which is called by the Catalyst Connector.

Because this query runs every hour without success and the default timeout is 3 hours, CPU usage continues for a long time, causing 100% usage.

To prevent this,

set the statement_timeout property in the file postgres.conf to 5 minutes (5*60*1000=300000)

restart the APM cluster and Postgres Database.