Primary SpectroSERVER hanging with high CPU

Document ID : KB000008197
Last Modified Date : 14/02/2018
Issue:

The Primary SpectroSERVER is hanging and showing high CPU usage. Tomcat (OneClick) disconnects and will not fail over to the Secondary SpectroSERVER.

Cause:

If the Primary SpectroSERVER is busy processing trap-related events, CPU usage can climb to the point where the SpectroSERVER stops responding to client requests. This can cause OneClick to disconnect and prevent OneClick from failing over to the Secondary SpectroSERVER.
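
To confirm which process is consuming the CPU, the following is a minimal sketch for a Linux SpectroSERVER host (the process name "SpectroSERVER" and the use of pgrep/top are assumptions about your platform; on Windows, use Task Manager instead):

# Take a one-shot CPU snapshot of the SpectroSERVER process
top -b -n 1 -p "$(pgrep -f SpectroSERVER | head -1)"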

Resolution:

Check the Archive Manager DDM database for models that are logging a high number of events by doing the following:

1. Log in to the SpectroSERVER system as the user that owns the Spectrum installation.

2. If on Windows, start a bash shell by running "bash -login"

3. cd to the $SPECROOT/mysql/bin directory and enter the following command to list the top 50 model/event combinations by event count. Adjust the start and end dates to a 24 hour period in your environment in which you are seeing the issue occur:

./mysql --defaults-file=../my-spectrum.cnf -uroot -proot ddmdb -e "select hex( type ), hex( e.model_h ), m.model_name, count( * ) as cnt from event e, model m where e.model_h = m.model_h and utime > UNIX_TIMESTAMP('2017-09-07 00:00:00') and utime < UNIX_TIMESTAMP('2017-09-08 00:00:00') group by type, e.model_h order by cnt desc limit 50"

The output will look similar to the following. The first column is the event ID, the second column is the model handle, the third column is the model name, and the fourth column is the number of events logged in the time period specified:

+-------------+------------------+--------------------------------------------------------+------+
| hex( type ) | hex( e.model_h ) | model_name                                             | cnt  |
+-------------+------------------+--------------------------------------------------------+------+
| 10F91       | 200000E          | SSPerformance                                          | 1440 |
| 4820002     | 200000E          | SSPerformance                                          | 1440 |
| 1022F       | 200000E          | SSPerformance                                          | 1440 |
| 1001D       | 2000278          | Sim30123:nslabcn501a.geico.net                         |   74 |
| 1001D       | 200006E          | Sim30123:nslabcn501a.geico.net                         |   29 |
| 10219       | 200006B          | Andy                                                   |   18 |
| 1021A       | 200006B          | Andy                                                   |   18 |
| 1001D       | 2000660          | Sim30017:vanor-nor-idf-1-01.mgmt.internal.das          |   12 |

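If one model dominates the output, the following is a hedged follow-up query that breaks its event counts down by hour across the same 24 hour window (it assumes the same ddmdb schema used above; the model handle 0x200000E and the dates are placeholders to replace with your own values):

./mysql --defaults-file=../my-spectrum.cnf -uroot -proot ddmdb -e "select hex( type ), FROM_UNIXTIME( utime - MOD( utime, 3600 ) ) as hr, count( * ) as cnt from event where model_h = 0x200000E and utime > UNIX_TIMESTAMP('2017-09-07 00:00:00') and utime < UNIX_TIMESTAMP('2017-09-08 00:00:00') group by type, hr order by hr, cnt desc"

A steady count in every hour usually points at scheduled or internal events, while large spikes in specific hours are more typical of trap storms from a device.
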
4. You can then launch the Event Configuration editor, filter for the event ID, and display the "Trap Event" column in the Navigation panel to see if the top event IDs are trap events:

[Screenshot: Event Configuration editor with the "Trap Event" column displayed in the Navigation panel]
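
As a hedged command-line alternative to the editor, you can search the vendor trap mapping files for the event code. This sketch assumes a default installation layout where trap-to-event mappings live in AlertMap files under $SPECROOT/SS/CsVendor; verify the paths against your installation:

# List AlertMap files that map an SNMP trap to event 0x10f91
# (search without the 0x prefix to tolerate leading zeros in the mapping)
find $SPECROOT/SS/CsVendor -name AlertMap | xargs grep -il "10f91"

If the event code appears in an AlertMap file, it is generated from an SNMP trap.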

If the high event counts are caused by traps from a few devices, determine why those devices are sending this volume of traps and address the issue at the device(s).
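
Once an offending model handle has been identified, a hedged sketch using the vnmsh command line interface can retrieve the device's IP address so the device owners can investigate. The attribute ID 0x12d7f (Network_Address) is the commonly documented value, but verify it, and the vnmsh path, against your installation; the model handle 0x2000278 below is a placeholder from the sample output above:

cd $SPECROOT/vnmsh
./connect
./show attributes attr=0x12d7f mh=0x2000278
./disconnect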