Policy Server Hangs if the LDAP User Directory Is Unresponsive or Slow

Document ID : KB000005184
Last Modified Date : 14/02/2018
Issue:

The Policy Server hangs if the LDAP User Directory is unresponsive or performing slowly.

Environment:
Version : r12.5+
Policy Server OS : ANY
User Directory : LDAP (ANY)
Cause:

If the Policy Server does not have an existing connection to the LDAP User Directory, it creates three new connections to the LDAP directory:

 

    1. PING Connection: The PING connection is used to check the health of the LDAP server periodically. One PING thread is created for each LDAP failover group.

The PING thread sends the following query every 30 seconds to verify that the LDAP server is up and listening on the LDAP port:

SRC base="<root object>" scope=0 filter="(objectclass=*)"

    2. Search/Directory Connection: The "dir" connection is the LDAP connection used to search the directory instance (it always binds either anonymously or with the credentials given in the User Directory object).

    3. User Connection: The "user" connection is the LDAP connection used to bind to the directory instance (it first binds anonymously or with the credentials given in the User Directory object; the connection is then reused to bind with the credentials of the authenticating user).

 

When a Policy Server thread performs an LDAP bind (LDAP_BIND), it always does so under a lock, because the LDAP handle must be protected. This prevents the Policy Server from crashing when one worker thread is changing the handle during the bind while another thread tries to use it, for example for an LDAP search.

This means that while one worker thread is performing an LDAP bind to, say, LDAP Server A, no other worker thread can concurrently perform an LDAP bind to ANY other LDAP server. If the first LDAP bind is delayed (because the LDAP server is unresponsive or slow), the remaining worker threads that are also waiting to perform an LDAP bind eventually go into a waiting/hung state. Worker threads that already have a valid LDAP connection are not impacted. This behavior can sometimes cause the Policy Server to hang and become unresponsive.
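
The serialization described above can be illustrated with a small simulation. This is a minimal Python sketch and not Policy Server code: the single threading.Lock is a hypothetical stand-in for the lock that guards LDAP binds, time.sleep stands in for a slow directory, and the function name simulate_binds and the server labels are assumptions made for illustration.

```python
import threading
import time


def simulate_binds(slow_delay, other_servers):
    """Return {server: seconds until its simulated bind finished}.

    Every simulated bind takes the same lock, mimicking the behavior
    described in the text. Nothing here talks to a real LDAP server.
    """
    bind_lock = threading.Lock()   # hypothetical stand-in for the bind lock
    finished = {}
    start = time.monotonic()

    def bind(server, delay):
        with bind_lock:            # only one bind at a time, for ANY server
            time.sleep(delay)      # stands in for waiting on the LDAP server
        finished[server] = time.monotonic() - start

    slow = threading.Thread(target=bind, args=("LDAP Server A", slow_delay))
    slow.start()
    # Wait until the slow bind actually holds the lock before starting others.
    while not bind_lock.locked() and slow.is_alive():
        time.sleep(0.001)

    workers = [threading.Thread(target=bind, args=(name, 0.0))
               for name in other_servers]
    for worker in workers:
        worker.start()
    for worker in [slow] + workers:
        worker.join()
    return finished


if __name__ == "__main__":
    for server, seconds in simulate_binds(0.5, ["LDAP Server B"]).items():
        print(f"{server}: bind finished after {seconds:.2f}s")
```

In this simulation LDAP Server B responds instantly, yet its bind completes only after the bind to LDAP Server A releases the lock, which is the same pattern that leaves worker threads waiting in the Policy Server.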

 

To confirm whether the Policy Server is affected by this condition, obtain a process dump of the Policy Server process (smpolicysrv.exe) and review where the NORMAL priority threads are stuck.

For example, a pstack capture would show stacks similar to the following for this condition:

 

Thread 1 (Thread 0xe6e27b90 (LWP 17007)):

#0  0xffffe410 in __kernel_vsyscall ()

#1 0x00c3d6c3 in poll () from /lib/libc.so.6

#2  0xf5f23271 in pt_Continue () from /opt/CA/SiteMinder/PolicyServer/lib/libnspr4.so

#3  0xf5f2423d in pt_Connect () from /opt/CA/SiteMinder/PolicyServer/lib/libnspr4.so

#4  0xf5f0e869 in PR_Connect () from /opt/CA/SiteMinder/PolicyServer/lib/libnspr4.so

#5  0xf6026b58 in prldap_try_one_address () from /opt/CA/SiteMinder/PolicyServer/lib/libprldap60.so

#6  0xf6026e40 in prldap_connect () from /opt/CA/SiteMinder/PolicyServer/lib/libprldap60.so

#7  0xf6070e06 in nsldapi_connect_to_host () from /opt/CA/SiteMinder/PolicyServer/lib/libldap60.so

#8  0xf607462c in nsldapi_new_connection () from /opt/CA/SiteMinder/PolicyServer/lib/libldap60.so

#9  0xf60702e6 in nsldapi_open_ldap_defconn () from /opt/CA/SiteMinder/PolicyServer/lib/libldap60.so

#10 0xf607507c in nsldapi_send_server_request () from /opt/CA/SiteMinder/PolicyServer/lib/libldap60.so

#11 0xf607584f in nsldapi_send_initial_request () from /opt/CA/SiteMinder/PolicyServer/lib/libldap60.so

#12 0xf6079474 in ldap_simple_bind () from /opt/CA/SiteMinder/PolicyServer/lib/libldap60.so

#13 0xc89b82f0 in CSmDsLdapFunctionImpl::LdapBind(ldap*, char const*, char const*, int) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmdsldap.so

#14 0xc89b5467 in CSmDsLdapFunctionImpl::BindServer(int&, CString&, CSmLDAPConn*&, CString const&, CString const&, CString const&, bool, bool, int, int, CSmLdapServers*, int&, bool) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmdsldap.so

#15 0xc89ade8c in CSmDsLdapFunctionImpl::GetConHandle(CSmDsProviderInstance*, char const*, int, bool, bool, bool) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmdsldap.so

#16 0xc89ba376 in CSmDsLdapFunctionImpl::SearchExts(CSmDsProviderInstance*, CSmLDAPConn*&, char const*, int, char const*, char**, int, ldapcontrol**, ldapcontrol**, timeval*, int, CArray<ldapmsg*, ldapmsg*>&, int, bool, bool, int (*)(char const*, CSmDsLdapError const&)) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmdsldap.so

#17 0xc898d952 in CSmDsLdapProvider::SearchExts(CSmDsProviderInstance*, CSmLDAPConn*&, char const*, int, char const*, char**, int, ldapcontrol**, ldapcontrol**, timeval*, int, CArray<ldapmsg*, ldapmsg*>&, int, bool) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmdsldap.so

#18 0xc8970759 in CSmDsLdapProvider::SearchImpl(CSmDsProviderInstance*, CStringArray&, CStringArray*, CArray<int, int>*, CArray<CSmDsAttrs, CSmDsAttrs&>*, CStringArray const*, CString const&, CString const&, CSmDsCursor*, Sm_PolicyResolution_t, int, int, int, bool) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmdsldap.so

#19 0xc896f05b in CSmDsLdapProvider::Search(CSmDsProviderInstance*, CStringArray&, CStringArray*, CArray<int, int>*, CArray<CSmDsAttrs, CSmDsAttrs&>*, CStringArray const*, CString const&, CString const&, CSmDsCursor*, Sm_PolicyResolution_t, int, int, int) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmdsldap.so

#20 0xf63d7fe7 in CSmDsDir::Search(CStringArray&, CStringArray*, CArray<int, int>*, CArray<CSmDsAttrs, CSmDsAttrs&>*, CStringArray const*, CString const&, CString const&, CSmDsCursor*, Sm_PolicyResolution_t, int, int, int) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmds.so

#21 0xf63d721b in CSmDsDir::GetUserDNlist(CString const&, CStringArray&, bool&, CStringArray const&, CArray<CSmDsAttrs, CSmDsAttrs&>&, bool) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmds.so

#22 0xf6510307 in CSmAuthUser::AuthenticateUserDir(CSmObjUserDirectory const&, CSmObjScheme const&, Sm_Api_Reason_t, Sm_AuthApi_UserCredentials_t&, CSmDsDir*&, CSmDsUser*&, Sm_AuthApi_Status_t&, bool&, bool&, CString&, CString&, CString&, CString&) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmauth.so

#23 0x080cf5bb in CSm_Auth_Message::AuthenticateUser() ()

#24 0x080671e2 in CSm_Auth_Message::ProcessAgentMessage() ()

#25 0x080c460e in CSm_Auth_Message::ProcessMessage() ()

#26 0x081470e1 in CSmPolicyServer::vOnRequest(CClientSession const*, CString const&, unsigned int, CSmAgentTliPacket&, CSmAgentTliPacket&, int) ()

#27 0xf7ded01c in CServer::ProcessRequest(CClientSession*, CString const&, unsigned int, CSmAgentTliPacket&, CSmAgentTliPacket&, int) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmutilities.so

#28 0xf7dc739c in CAgentMessageHandler::DoWork(unsigned char*, unsigned char*, int) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmutilities.so

#29 0xf7dbe3ee in ThreadPool::Run(bool) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmutilities.so

#30 0xf7e6f096 in ThreadPoolBase::ThreadProc(void*) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmutilities.so

#31 0xf7cc8f29 in BtThreadBase(ThreadArgs*) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmcommonutil.so

#32 0x00cd5912 in start_thread () from /lib/libpthread.so.0

#33 0x00c474ae in clone () from /lib/libc.so.6

 

Thread 2 (Thread 0xdb014b90 (LWP 17153)):

#0  0xffffe410 in __kernel_vsyscall ()

#1  0x00cdc839 in __lll_lock_wait () from /lib/libpthread.so.0

#2  0x00cd7e9f in _L_lock_885 () from /lib/libpthread.so.0

#3  0x00cd7d66 in pthread_mutex_lock () from /lib/libpthread.so.0

#4  0xf7cc69f3 in EnterCriticalSection(_critsection*) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmcommonutil.so

#5  0xc8969b27 in CSmDsLdapProvider::InitDir(CSmDsProviderInstance*&, Sm_Api_AppSpecificContext_t const*, CString const&, CString const&, CString const&, CString const&, CString const&, CString const&, bool, bool, int, int, int, CString const&) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmdsldap.so

#6  0xf63d0abf in CSmDsDir::CSmDsDir(CSmObjUserDirectory const&, Sm_Api_AppSpecificContext_t const*) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmds.so

#7  0xf485b9ad in CSmAzMapping::GetAzUser(Sm_Api_UserContext_t*, CSmAuthSession const&, CSmObjRealm const&, int&) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmazuser.so

#8  0x0808e2cf in CSm_Az_Message::InitAuthUser(CSmAuthSession const&, CSmObjRealm const&, int&, bool*) ()

#9  0x08088791 in CSm_Az_Message::IsAuthorized() ()

#10 0x080b3508 in CSm_Az_Message::ProcessMessage() ()

#11 0x081470e1 in CSmPolicyServer::vOnRequest(CClientSession const*, CString const&, unsigned int, CSmAgentTliPacket&, CSmAgentTliPacket&, int) ()

#12 0xf7ded01c in CServer::ProcessRequest(CClientSession*, CString const&, unsigned int, CSmAgentTliPacket&, CSmAgentTliPacket&, int) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmutilities.so

#13 0xf7dc739c in CAgentMessageHandler::DoWork(unsigned char*, unsigned char*, int) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmutilities.so

#14 0xf7dbe3ee in ThreadPool::Run(bool) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmutilities.so

#15 0xf7e6f096 in ThreadPoolBase::ThreadProc(void*) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmutilities.so

#16 0xf7cc8f29 in BtThreadBase(ThreadArgs*) () from /opt/CA/SiteMinder/PolicyServer/lib/libsmcommonutil.so

#17 0x00cd5912 in start_thread () from /lib/libpthread.so.0

#18 0x00c474ae in clone () from /lib/libc.so.6

 

As shown above, Thread 1 is performing an LDAP bind (ldap_simple_bind) and is waiting on the LDAP server. As a result, Thread 2, which is trying to initialize the directory (CSmDsLdapProvider::InitDir; directory initialization involves an LDAP bind to that directory), is stuck in pthread_mutex_lock, waiting for Thread 1 to complete.
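
When a capture contains many threads, the review described above can be partly automated. The following Python sketch is a heuristic based only on the symbols visible in the sample trace (ldap_simple_bind, pthread_mutex_lock, CSmDsLdapProvider::InitDir); the function name classify_threads and the state labels are hypothetical, the embedded sample is abbreviated from the trace above, and other builds may show different frames.

```python
import re

# Matches pstack thread headers such as
# "Thread 1 (Thread 0xe6e27b90 (LWP 17007)):"
THREAD_HEADER = re.compile(r"^Thread (\d+) \(", re.MULTILINE)


def classify_threads(pstack_text):
    """Map each thread number to 'ldap_bind', 'lock_wait', or 'other'."""
    states = {}
    headers = list(THREAD_HEADER.finditer(pstack_text))
    for index, header in enumerate(headers):
        # A thread's frames run from its header to the next header (or EOF).
        if index + 1 < len(headers):
            end = headers[index + 1].start()
        else:
            end = len(pstack_text)
        frames = pstack_text[header.end():end]
        if "ldap_simple_bind" in frames:
            state = "ldap_bind"    # thread is inside the LDAP bind itself
        elif ("pthread_mutex_lock" in frames
              and "CSmDsLdapProvider::InitDir" in frames):
            state = "lock_wait"    # thread is queued behind the bind lock
        else:
            state = "other"
        states[int(header.group(1))] = state
    return states


# Abbreviated frames taken from the sample capture in this article.
SAMPLE = """\
Thread 1 (Thread 0xe6e27b90 (LWP 17007)):
#1  0x00c3d6c3 in poll () from /lib/libc.so.6
#12 0xf6079474 in ldap_simple_bind ()
Thread 2 (Thread 0xdb014b90 (LWP 17153)):
#3  0x00cd7d66 in pthread_mutex_lock () from /lib/libpthread.so.0
#5  0xc8969b27 in CSmDsLdapProvider::InitDir() ()
"""

if __name__ == "__main__":
    print(classify_threads(SAMPLE))
```

One thread in the ldap_bind state together with many threads in the lock_wait state would be consistent with the condition described in this article.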

Resolution:

The best solution is to identify the root cause of the slow LDAP performance and fix it.

However, on the Policy Server side, you can configure various LDAP timeout settings so that the Policy Server does not wait indefinitely for an LDAP response.

http://www.ca.com/us/support/ca-support-online/product-content/knowledgebase-articles/tec1466133.aspx
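
The effect of a timeout can be shown in isolation. The following Python sketch does not use or reproduce any Policy Server setting (those are described in the linked article); it only demonstrates the general idea that a timeout turns an indefinite wait on an unresponsive peer into a bounded failure. The function name timed_read and the local "silent" server are assumptions made for illustration.

```python
import socket
import time


def timed_read(timeout_seconds):
    """Read from a local server that never replies.

    Returns (timed_out, seconds_waited). The listening socket accepts the
    TCP connection through its backlog but never sends any data, which
    stands in for an unresponsive LDAP server.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # any free local port
    server.listen(1)
    client = socket.create_connection(server.getsockname(),
                                      timeout=timeout_seconds)
    start = time.monotonic()
    try:
        client.recv(1)                 # would block forever without a timeout
        timed_out = False
    except socket.timeout:
        timed_out = True
    finally:
        waited = time.monotonic() - start
        client.close()
        server.close()
    return timed_out, waited


if __name__ == "__main__":
    timed_out, waited = timed_read(0.5)
    print(f"timed out: {timed_out}, waited {waited:.2f}s")
```

With a bounded wait, a thread that holds the bind lock gives up after the configured interval, so the other worker threads are delayed for at most that long instead of indefinitely.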

Additional Information:

https://www.ca.com/us/services-support/ca-support/ca-support-online/knowledge-base-articles.tec1466133.html