Baseline_engine.QOS_MESSAGE queue is piling up and is yellow in the hub queue.

Document ID : KB000004425
Last Modified Date : 04/10/2018
Issue:
  • The baseline_engine.QOS_MESSAGE queue has a large number of unprocessed items and appears to have stopped processing.
  • Memory usage for the baseline_engine probe is at 100% of its configured maximum heap (Xmx).
  • Processor usage may sit at the equivalent of one full CPU core.
  • Restarting the probe may allow it to process messages for a few minutes before coming to a halt again.
Environment:
baseline_engine 2.70 or greater
Cause:

This probe uses log4j for probe logging. There is a known issue with this SDK: if the thread that handles logging cannot keep up with the volume of information being logged, it blocks the other threads that do the real work of the probe.

Resolution:
  1. Navigate to the nimsoft\probes\SLM\baseline_engine directory
  2. Edit the log4j2.xml file
  3. Modify the AppenderRef lines to have ref="console" as shown below

    <Loggers>
        <Root level="info" additivity="false">
            <AppenderRef ref="console"/>
        </Root>
        <Logger name="PERFORMANCE" level="off" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
        <Logger name="MESSAGELIMIT" level="off" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
        <Logger name="com.ca.analytics.dmc.receive" level="info" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
        <Logger name="org.quartz" level="warn" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
        <Logger name="always" level="trace" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
        <Logger name="org.springframework" level="warn" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
        <Logger name="com.nimsoft.threshold.cmd" level="error" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
        <Logger name="com.nimsoft.derivedmetrics.performancestats" level="warn" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
        <Logger name="com.nimsoft.derivedmetrics.scriptlink" level="warn" additivity="false">
            <AppenderRef ref="console"/>
        </Logger>
    </Loggers>
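
Each ref="console" value must match the name of an appender declared in the Appenders section of the same log4j2.xml file. The shipped file most likely already declares one, so check before changing anything. If it does not, a minimal console appender in standard Log4j 2 syntax might look like the sketch below. The layout pattern is an illustrative assumption and is not taken from the probe package. Merge the Console element into the existing Appenders section rather than adding a second one.

    <Appenders>
        <!-- Hypothetical sketch: the name must equal the ref used by the AppenderRef lines -->
        <Console name="console" target="SYSTEM_OUT">
            <!-- Illustrative pattern only; keep the pattern from your existing file if it has one -->
            <PatternLayout pattern="%d{ISO8601} [%t] %-5level %logger{36} - %msg%n"/>
        </Console>
    </Appenders>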

 

-------------------------------------------------------------------------------------------------------

Attached is a UIM probe package that updates this file. Download it, import it into your archive, and deploy it to the baseline servers. The package is unsupported and is provided free of charge to speed up deployment, so test it in your test system before deploying to production. It also sets the baseline_engine loglevel to 0.

 

 log4j2_1.2.zip

Additional Information:

This problem can affect other Java-based probes that use the log4j SDK for logging, such as discovery_server and prediction_engine.

If the steps above do not resolve the issue, try the following:
1- Delete the baseline_engine.QOS_MESSAGE queue from the hub.
2- Restart the hub.
3- In the Admin Console, enable 'publish baseline' on any metric under that same hub.
4- A 'BASELINE_CONFIG' message should then be sent, at which point baseline_engine recreates the QOS_MESSAGE queue and starts reading from it again.

File Attachments:
TEC1000777.zip