A "Failed to initialize the message bus" error means that the Web Agent is not able to talk to the Policy Server. If the agent were connecting to the Policy Server, the Web Agent log would show "LLAWP: Message bus initialized".
In this case, SmHost.conf lists two Policy Servers, and the first Policy Server on the list is down.
The AgentWaitTime parameter can be set in the WebAgent.conf file of the affected Agents. It was introduced to offset network delays in the environment that delay the Web Agent in receiving its configuration information from the Policy Server. The parameter defines how long, in seconds, the Web Agent keeps trying to attach to the message bus.
Please implement the 'AgentWaitTime' parameter in the WebAgent.conf files for this Web Agent. Default: 5
Example: If you have a primary and a secondary Policy Server, use a value between 60 and 80. The value of AgentWaitTime depends on the number of Policy Servers listed in the Host Configuration Object (HCO).
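As an illustration only, the entry in WebAgent.conf might look like the fragment below. It assumes the usual Name="value" form of WebAgent.conf entries; the value 70 is simply one point inside the 60 to 80 range mentioned above, not a recommendation for every environment.

```
# Hypothetical WebAgent.conf entry for a setup with two Policy Servers.
# The value is in seconds; 70 is an assumed example within the 60-80 range.
AgentWaitTime="70"
```

The Web Server normally has to be restarted for a WebAgent.conf change to take effect, since the file is read when the Web Agent loads.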
As a rule of thumb, set AgentWaitTime to the number of Policy Servers times 30 seconds, plus about 10 extra seconds of latency allowance, just to be sure.
The extra 10 seconds may be necessary when:
- SmHost.conf lists multiple Policy Servers.
- The first is down and the second is up, but the HCO lists the first at the top of its list, so more time is wasted trying to connect to it.
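The rule of thumb can be written out as a small calculation. This is a minimal sketch, not part of the product: the function name and its parameters are made up for illustration, and the 30-second timeout and 10-second margin are the figures quoted in this article.

```python
def recommended_agent_wait_time(policy_server_count,
                                per_server_timeout=30,
                                safety_margin=10):
    """Return a suggested AgentWaitTime in seconds.

    Hypothetical helper: applies the rule of thumb
    (number of Policy Servers x 30 seconds) + 10 seconds of margin.
    """
    if policy_server_count < 1:
        raise ValueError("at least one Policy Server is required")
    return policy_server_count * per_server_timeout + safety_margin


# Two Policy Servers (primary and secondary): 2 * 30 + 10
two_server_value = recommended_agent_wait_time(2)
print(two_server_value)
```

For two Policy Servers this gives 70 seconds, which sits inside the 60 to 80 range suggested above. Raising the margin to 20 gives 80.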
Process in more detail:
When you start the Web Server, it loads the SiteMinder Web Agent filter (WA).
The WA then spawns a new process, the worker process (LLAWP), and keeps trying to attach to a message bus that the LLAWP will create. By default the WA keeps trying for about 5 seconds, but this wait time can be configured through the AgentWaitTime key in WebAgent.conf.
Next, the LLAWP tries to attach to the first Policy Server (PS1) specified in SmHost.conf, using the Agent API.
At this point the message bus the WA is waiting for still does not exist. If PS1 is unreachable, the LLAWP waits 30 seconds before it times out and moves to the next Policy Server in SmHost.conf (PS2). Once it has connected to PS2, it retrieves the HCO information and uses it from then on. If the HCO also lists PS1 and PS2 in that order, the LLAWP now tries to connect to PS1 again and waits another 30 seconds before timing out (60 seconds have passed by now). Only then does it move to PS2, continue its flow, and create the long-awaited message bus.
So in this configuration it is appropriate to define an AgentWaitTime of 60 seconds plus some extra time for safety, say 10 or even 20 seconds.
The WA will then wait 70 to 80 seconds for the message bus and will work properly.
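The startup sequence described above can be modelled as a rough timeline. The sketch below is a simplification under stated assumptions: the function names are hypothetical, connecting to a reachable Policy Server is treated as instantaneous, and each unreachable server costs the 30-second timeout quoted in this article. It does not call any SiteMinder API.

```python
CONNECT_TIMEOUT_SECONDS = 30  # timeout per unreachable Policy Server, as quoted above


def seconds_until_first_reachable(servers, reachable):
    """Walk the list in order and add one timeout per unreachable server."""
    waited = 0
    for server in servers:
        if server in reachable:
            return waited
        waited += CONNECT_TIMEOUT_SECONDS
    raise RuntimeError("no reachable Policy Server in the list")


def estimated_bus_delay(smhost_servers, hco_servers, reachable):
    """Estimate seconds before the LLAWP can create the message bus.

    Phase 1 walks the SmHost.conf list, phase 2 walks the HCO list.
    """
    return (seconds_until_first_reachable(smhost_servers, reachable)
            + seconds_until_first_reachable(hco_servers, reachable))


# PS1 is down, PS2 is up, and both lists name PS1 first.
delay = estimated_bus_delay(["PS1", "PS2"], ["PS1", "PS2"], {"PS2"})
print(delay)
```

With PS1 down in both lists the estimate is 60 seconds, which is why a default wait of about 5 seconds is too short and a value of 70 to 80 seconds leaves some headroom.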