This issue is caused by a difference in how long the OS shutdown and the LLAWP shutdown take to complete.
In a scenario where the agent machine is rebooted unexpectedly and frequently (for instance in a testing or unstable environment), this problem may occur because of the mechanism the IIS agent uses to manage the connections between the IIS process, LLAWP and the Policy Server:
- When the agent machine is restarted, the system stops its services, including the IIS server. This results in a shutdown call to the Web Agent, which closes all the connections.
- As the w3wp processes shut down, they unregister from LLAWP in the Web Agent code. When LLAWP detects that zero 'clients' are connected to it, it waits 20 seconds and then shuts down, closing its connections to the Policy Server.
- The 20-second wait is by design: if IIS shuts down and is then restarted, the Web Agent initializes much faster because it can reuse the LLAWP process that is already initialized instead of creating a new one. This is useful in IIS because the w3wp process itself exits after a short period of inactivity, so this scenario happens routinely.
- However, if the Operating System shuts down before these 20 seconds have elapsed, LLAWP never closes its connections to the Policy Server. This leaves connections open on the Policy Server machine, which does not know that the other side is no longer there. Running a 'netstat' command on the Policy Server machine shows the connections coming from the Web Agent IP and port in the 'ESTABLISHED' state.
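The timing race described in the steps above can be sketched as a small model. This is only an illustration, not Web Agent code: the function names, the reboot timings and the number of connections per LLAWP process are hypothetical, and only the 20-second wait comes from the description above.

```python
# Illustrative model of the shutdown race; not actual Web Agent code.
LLAWP_GRACE_SECONDS = 20  # wait described above, after the last w3wp unregisters


def llawp_closes_connections(seconds_until_os_stops_llawp, grace=LLAWP_GRACE_SECONDS):
    """True if LLAWP gets through its wait and can close its connections."""
    return seconds_until_os_stops_llawp >= grace


def count_stale_connections(reboots, connections_per_llawp):
    """Count connections left open on the Policy Server after a series of reboots.

    reboots: seconds between the last w3wp unregistering and the OS
             terminating LLAWP, one value per reboot (hypothetical data).
    connections_per_llawp: assumed number of Policy Server connections
             held by one LLAWP process (hypothetical value).
    """
    stale = 0
    for seconds in reboots:
        if not llawp_closes_connections(seconds):
            stale += connections_per_llawp
    return stale
```

For example, `count_stale_connections([5, 30, 10], connections_per_llawp=3)` models three reboots, two of which cut the wait short, and each of those two leaves three connections behind.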
Assuming the process of restarting the agent machine is happening very frequently, this mechanism will cause a lot of connections to be left as 'ESTABLISHED' in the Policy Server machine, even if there is no corresponding LLAWP process on the other side.
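To get a rough idea of how many such connections have accumulated, the 'netstat' output from the Policy Server machine can be filtered by the Web Agent IP. The following is a minimal sketch, assuming the output was captured with `netstat -an` in the usual Windows or Linux layout (state in the last column, remote address in the column before it). The function name and the sample addresses are hypothetical.

```python
def established_from(netstat_output, agent_ip):
    """Return (ip, port) pairs for ESTABLISHED connections whose remote end is agent_ip.

    Assumes 'netstat -an' style lines where the last column is the state
    and the second-to-last column is the remote address as ip:port.
    """
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[-1] != "ESTABLISHED":
            continue
        host, _, port = fields[-2].rpartition(":")
        if host == agent_ip and port.isdigit():
            matches.append((host, int(port)))
    return matches


# Hypothetical sample output; real addresses and ports will differ.
sample = """
  TCP    10.0.0.5:44441    10.0.0.9:51234    ESTABLISHED
  TCP    10.0.0.5:44441    10.0.0.9:51240    ESTABLISHED
  TCP    10.0.0.5:44441    10.0.0.7:50000    ESTABLISHED
  TCP    10.0.0.5:3389     10.0.0.9:60000    TIME_WAIT
"""

for ip, port in established_from(sample, "10.0.0.9"):
    print(ip, port)
```

An 'ESTABLISHED' entry is not necessarily abandoned. The list only becomes meaningful when compared against whether a LLAWP process is actually running on the agent machine at that moment.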
During frequent reboots, the Web Agent may at some point try to create a connection and be assigned a local port number that matches one of these abandoned connections. When it tries to connect, the Policy Server machine sees the SYN as irregular, because from its point of view that connection is already established, and it does not send a SYN/ACK in response. The packet it does send back causes a TCP sequence mismatch: the Web Agent machine discards it as out of sequence and does not treat it as a response to its SYN.
The Web Agent then waits for the SYN/ACK for 2 seconds and times out, resulting in failure to initialize.