When SMM is configured to monitor more than one web server instance, or when the Policy Server and at least one Web Agent run on the same machine, the SMM locking around the shared memory segment does not work correctly. This is because the APR global mutex does not appear to work across independent processes.
The shared memory segment is critical for sharing metric data between Siteminder and Siteminder Manager (SMM). Siteminder Manager uses a global mutex from the Apache Portable Runtime (APR) library to serialize access to this shared memory segment.
In a typical SMM environment, the Web Agent or Policy Server loads the Introscope API. Metrics are written into the shared memory segment through Introscope API calls, which in turn use APR.
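To make the locking requirement concrete, here is a minimal Python sketch of several processes updating one metric table in a shared memory segment. It is not the Introscope or APR implementation: the table layout, the hypothetical `LockedMetricTable` class, and the use of `flock()` on a lock file that every process opens by a common path are all assumptions chosen for illustration (a swapped-in locking technique, not APR's global mutex). What it shows is the invariant the mutex exists to protect: every read-modify-write of the segment happens while exactly one process holds the lock.

```python
import fcntl
import os
import struct
from multiprocessing import shared_memory

# Hypothetical layout for illustration only (NOT the Introscope format):
# a 4-byte entry count, then fixed-size slots of (32-byte name, int64 value).
HEADER = struct.Struct("<I")
SLOT = struct.Struct("<32sq")
MAX_SLOTS = 64
SEGMENT_SIZE = HEADER.size + SLOT.size * MAX_SLOTS


class LockedMetricTable:
    """Metric table in a named shared memory segment.

    Every read and write holds an exclusive flock() on a lock file whose
    path all participating processes agree on, so processes that attach
    independently (no common parent) are still serialized.
    """

    def __init__(self, shm_name, lock_path, create=False):
        if create:
            self.shm = shared_memory.SharedMemory(shm_name, create=True, size=SEGMENT_SIZE)
            HEADER.pack_into(self.shm.buf, 0, 0)  # start with an empty table
        else:
            self.shm = shared_memory.SharedMemory(shm_name)  # attach to existing
        self.lock_fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)

    def add(self, name, delta=1):
        """Add `delta` to metric `name`, creating its slot on first use."""
        key = name.encode("ascii")
        if len(key) > 32:
            raise ValueError("metric name longer than 32 bytes")
        fcntl.flock(self.lock_fd, fcntl.LOCK_EX)  # wait for exclusive access
        try:
            (count,) = HEADER.unpack_from(self.shm.buf, 0)
            for i in range(count):
                offset = HEADER.size + i * SLOT.size
                slot_key, value = SLOT.unpack_from(self.shm.buf, offset)
                if slot_key.rstrip(b"\0") == key:
                    SLOT.pack_into(self.shm.buf, offset, slot_key, value + delta)
                    return
            if count >= MAX_SLOTS:
                raise OverflowError("metric table is full")
            SLOT.pack_into(self.shm.buf, HEADER.size + count * SLOT.size, key, delta)
            HEADER.pack_into(self.shm.buf, 0, count + 1)  # publish the new slot
        finally:
            fcntl.flock(self.lock_fd, fcntl.LOCK_UN)

    def snapshot(self):
        """Return {name: value} for every metric, read under the same lock."""
        fcntl.flock(self.lock_fd, fcntl.LOCK_EX)
        try:
            (count,) = HEADER.unpack_from(self.shm.buf, 0)
            result = {}
            for i in range(count):
                slot_key, value = SLOT.unpack_from(self.shm.buf, HEADER.size + i * SLOT.size)
                result[slot_key.rstrip(b"\0").decode("ascii")] = value
            return result
        finally:
            fcntl.flock(self.lock_fd, fcntl.LOCK_UN)

    def close(self, unlink=False):
        """Detach; only the owning process should pass unlink=True."""
        os.close(self.lock_fd)
        self.shm.close()
        if unlink:
            self.shm.unlink()
```

In this sketch the owning process would call `LockedMetricTable("smm_metrics", "/tmp/smm_metrics.lock", create=True)` and every other process `LockedMetricTable("smm_metrics", "/tmp/smm_metrics.lock")` before calling `add("Requests")`; both names are hypothetical. The design point is that the lock is found through a path every process knows, rather than inherited from a common parent. If a writer skipped the lock, it could read the entry count while another process was still appending a slot, which is the kind of interleaving that can corrupt a shared table.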
When SMM is configured to monitor multiple web server instances, or the Policy Server and Web Agent are on the same machine, the same mutex is used concurrently by independent processes. This is not a supported use case: the third-party APR library does not handle it, so access to the shared memory segment is no longer reliably serialized.
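Why a "shared" mutex can stop serializing anything can be illustrated with file locks, under an explicit assumption. The APR documentation describes global mutexes as created by a parent process and re-opened in its children with `apr_global_mutex_child_init`; independent processes have no such common parent. The sketch below assumes, hypothetically (APR's lock mechanisms are not examined here), that each independent process ends up holding its own private lock instead of one common lock. It shows that private locks then provide no mutual exclusion, while a lock reached through one common path does. The helpers `new_lock_file`, `try_lock` and `both_acquire` are hypothetical.

```python
import fcntl
import os
import tempfile


def new_lock_file():
    """Create a fresh, uniquely named lock file and return its path.

    Used both for a common lock (every caller passes the same path) and
    for hypothetical private locks (each caller gets its own file).
    """
    fd, path = tempfile.mkstemp(suffix=".lock")
    os.close(fd)
    return path


def try_lock(path):
    """Open `path` and try to take an exclusive flock() without blocking.

    Returns the open fd if the lock was acquired, or None if another
    open file description already holds it.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def both_acquire(path_a, path_b):
    """Return True if two independent openers both hold a lock at once,
    i.e. the locks give no mutual exclusion between them.

    flock() locks belong to an open file description, so two separate
    os.open() calls behave like two independent processes here.
    """
    fd_a = try_lock(path_a)
    fd_b = try_lock(path_b)
    try:
        return fd_a is not None and fd_b is not None
    finally:
        for fd in (fd_a, fd_b):
            if fd is not None:
                os.close(fd)  # closing the only fd releases its flock
```

With `shared = new_lock_file()`, `both_acquire(shared, shared)` returns `False` because the second opener is refused, while `both_acquire(new_lock_file(), new_lock_file())` returns `True`: each "process" believes it holds the mutex, and nothing is serialized.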
This leads to shared memory corruption, with the following consequences:
1. After the corruption, the monitored Web Agent or Policy Server processes (and any threads they spawn) may hang.
2. With SMM in debug mode, the following can be observed in IntroscopeAPI.log:
a. A warning message:
WARN: Metric table overflow: <some large number> elements
b. Junk characters in the metric name:
Iscope:07/10/14(16:02:50):07865:7440064 DEBUG - GetMetricFromShm:(n,t): __¢ÿÄ, PerInterval Counter
3. Eventually, the Web Agent or Policy Server crashes.
The issue has been observed with all variants of Apache web servers on Unix platforms.
The recommendation is to avoid monitoring both the Web Agent and the Policy Server on the same machine. Also, where possible, avoid monitoring multiple Web Agent instances running on the same machine.