Session manager failed

Document ID : KB000125548
Last Modified Date : 30/01/2019
Issue:
In a 3.2.2 or 3.2.3 multisite cluster, the session manager on the primary site nodes often goes out of sync, as do the session and credential managers on the nodes in the secondary sites.

Sometimes restarting the cluster, or resynchronizing the secondary node within its site, brings the cluster back to a synchronized state. In other cases the credential managers of the nodes in the secondary sites go out of sync and do not recover, whether a site synchronization or a cluster restart is attempted.

The logs of the master node in the primary site show countless errors like the following:

Jan 24, 2019 10:50:07 AM com.cloakware.cspm.server.app.SiteReplicationServlet a 
SEVERE: Unauthorized request from site at host:63.90.3.170 [63.90.3.170


On any secondary site node where the problem is occurring, the logs show the following lines:

Jan 24, 2019 10:50:07 AM com.cloakware.cspm.server.replication.ReplicationPoller poll 
SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated 
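
To check how often these two messages appear, a small script can count them. The sketch below is only an illustration: the two patterns are copied from the excerpts above, while the function name `count_signatures` and the in-memory `sample` list are hypothetical stand-ins for reading the actual log file, whose location is not given in this article.

```python
import re
from collections import Counter

# Patterns copied from the log excerpts in this article.
SIGNATURES = {
    "primary_unauthorized": re.compile(
        r"SEVERE: Unauthorized request from site at host:(\S+)"
    ),
    "secondary_401": re.compile(
        r"SEVERE: ReplicationPoller\.poll got failed commandResult from master: 401"
    ),
}


def count_signatures(lines):
    """Count matching lines per signature and per rejected host."""
    counts = Counter()
    hosts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "primary_unauthorized":
                    hosts[match.group(1)] += 1
    return counts, hosts


# Hypothetical input: in practice, pass the lines of the node's log file.
sample = [
    "Jan 24, 2019 10:50:07 AM com.cloakware.cspm.server.app.SiteReplicationServlet a",
    "SEVERE: Unauthorized request from site at host:63.90.3.170 [63.90.3.170",
    "Jan 24, 2019 10:50:07 AM com.cloakware.cspm.server.replication.ReplicationPoller poll",
    "SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated",
]

counts, hosts = count_signatures(sample)
print(dict(counts))
print(dict(hosts))
```

A high count of the first message on the primary master, together with the second message on a secondary node, matches the symptom described in this article.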
Environment:
CA PAM 3.2.0, 3.2.2 or 3.2.3
Cause:
This issue may be caused by network conditions, or by some other condition, that leads the Primary master to believe that the secondary site is far behind in replication, so it marks the site as inactive.
Resolution:
Version 3.2.4 includes several fixes for this and other cluster synchronization issues, including the one where the session manager database goes out of sync.

As a possible workaround, increase the "Max number of queued replication records before member deactivation" setting in the cluster. To do so:
  1. Log in to the Primary master
  2. Turn off the cluster
  3. Enable cluster tuning under Configuration->Diagnostics->System
  4. Go to the cluster config "Tuning" tab and change "Max number of queued replication records before member deactivation" from the default 10000 to 20000
  5. Save the config locally, then save the config to the cluster
  6. Turn on the cluster
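
The effect of raising the limit can be pictured with a toy model. This is only an illustration, assuming that a member is deactivated once its number of queued replication records exceeds the configured maximum. The function name `would_deactivate`, the strict comparison, and the backlog figure are hypothetical and are not taken from CA PAM; only the values 10000 and 20000 come from this article.

```python
# Values from this article: the default limit and the suggested workaround.
DEFAULT_MAX_QUEUED = 10000
WORKAROUND_MAX_QUEUED = 20000


def would_deactivate(queued_records, max_queued):
    """Assumed rule: deactivate a member when its backlog exceeds the limit."""
    return queued_records > max_queued


# Hypothetical backlog on a secondary site during a network slowdown.
backlog = 15000

print(would_deactivate(backlog, DEFAULT_MAX_QUEUED))     # True
print(would_deactivate(backlog, WORKAROUND_MAX_QUEUED))  # False
```

Under this assumption, a secondary site that temporarily falls 15000 records behind would be marked inactive with the default limit but would stay active with the raised one.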