The application comprises multiple funnels connecting to a daemon (aefad). There are also proxy applications that continually poll the server. When the system is taken down for maintenance, it sometimes does not restart correctly because one of the funnel processes fails. The funnel log file shows the following error:
02:44:46.811756==>aefuf: no available APPL pgm structure
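After a restart, a quick way to spot this failure is to search each funnel log for the message above. The following is a minimal sketch, assuming each funnel writes its own log file and that the paths are passed in as arguments; the log locations are site-specific and not given here, and `failed_funnels` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which funnel logs contain the start-up failure message.
# Assumption: one log file per funnel, paths supplied by the caller
# (the real log locations are site-specific).
failed_funnels() {
    status=0
    for log in "$@"; do
        # -q: only the exit status matters, not the matching line
        if grep -q 'no available APPL pgm structure' "$log"; then
            echo "funnel failed at start up: $log"
            status=1
        fi
    done
    # 0 = no funnel reported the error, 1 = at least one did
    return "$status"
}
```

The non-zero return status lets a start-up script decide whether to stop and restart the funnels rather than carry on with a partially started system.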
The problem is caused by a combination of factors.
- The startup script for the application did not leave any time between starting each funnel process.
- The continual polling by the proxy application resulted in runtime errors during startup.
Correcting either one of the above would address the problem. Tests performed with no polling application running always resulted in a clean startup, even with no delay between the start of each component.
The error condition caused by the polling application was as follows:
03:11:38.142846==>aefad: USER: user05, Sock = 5, sock05, connected
03:11:38.144770==>rcvfhdr: FUNN: Sock = 5, prd03, started
03:11:38.150811==>aefad: USER: user06, Sock = 6, sock06, connected
03:11:38.151603==>rcvfhdr: FUNN: Sock = 6, prd03, started
03:11:38.157897==>aefad: USER: user07, Sock = 7, sock07, connected
03:11:38.159649==>rcvfhdr: FUNN: Sock = 7, prd03, started
03:11:38.176549==>aefad: USER: user08, Sock = 8, sock08, connected
03:11:38.178995==>rcvfhdr: FUNN: Sock = 8, prd03, started
03:11:38.189447==>rcvfhdr: USER: user1280, Sock = 1280, ossprd03, lterm001, started
03:11:38.192043==>GetGUIpgm: USER: comsv001, Sock = 1280, prd03, commsrvr, started
03:11:38.196587==>aefad: APPL: SLASRV, TRAN: TSLAMON, Sock = 9, connected
03:11:38.198191==>rcvpgm: APPL: SLASRV, TRAN: TSLAMON, Sock = 9, FHDR header data
?0010AAB4? 12 *.*
03:11:38.198279==>abortpgm: RCVERR: APPL: SLASRV, TRAN: TSLAMON, Sock = 9
03:11:38.198671==>xerrGUIpgm: ABORTPGM: $#XERR#$
?000FAABC? 00000048 00000000 00000000 00242358 45525223
?000FAAD0? 247c0e00 00000000 00000000 00000000 00000000
?000FAAE4? 00000000 00000000 00000000 00000000 00000000
?000FAAF8? 00373333 00000000 00000000 *.733........*
03:11:38.199511==>delpgm : APPL: SLASRV, TRAN: TSLAMON, Sock = 9 close, pid = 26194
03:11:38.200184==>aefad: USER: user10, Sock = 10, sock10, connected
03:11:38.220964==>rcvpgm: USER: user10, Sock = 10, sock10, remote close
03:11:38.221082==>abortpgm: RCVERR: USER: user10, Sock = 10, sock10
03:11:38.221551==>sigexit: got signal = 18
03:11:38.221979==>abendpgm: APPL: Sock=9, pid=26194, APPL END
03:11:38.222242==>myclose: USER: Sock = 10, localhost, user10, USER END
03:11:38.223492==>aefad: USER: user09, Sock = 9, sock09, connected
03:11:38.223678==>rcvpgm: USER: user09, Sock = 9, sock09, remote close
03:11:38.223737==>abortpgm: RCVERR: USER: user09, Sock = 9, sock09
03:11:38.223944==>myclose: USER: Sock = 9, localhost, user09, USER END
Compare the time of the corresponding funnel error with the daemon log entries above:
03:11:38.193163==>Starting AEFUF, Advantage(tm) Gen release version: 66
03:11:38.223169==>aefuf: no available APPL pgm structure
03:11:38.223247==>Slots used: 0
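The funnel started at 03:11:38.193163 and reported the error at 03:11:38.223169, about 30 ms later, while the daemon was still handling the aborted SLASRV connection and the polling user connections. To line up the two logs like this, a small awk filter can print each entry's offset from the first timestamp. This is a sketch, assuming every relevant line begins with an `HH:MM:SS.ffffff` timestamp followed by `==>`, as in the excerpts above; `relative_times` is a hypothetical helper, not part of the product.

```shell
#!/bin/sh
# Print each log entry prefixed with the milliseconds elapsed since the
# first timestamped entry. Assumption: entries start with HH:MM:SS.ffffff
# followed by "==>"; other lines (e.g. hex dumps) are skipped.
relative_times() {
    awk -F'==>' '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+$/ {
            split($1, t, ":")
            # seconds since midnight, keeping the microsecond fraction
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (!seen) { start = secs; seen = 1 }
            printf "%9.3f ms  %s\n", (secs - start) * 1000, $0
        }
    ' "$@"
}
```

For example, `sort daemon.log funnel.log | relative_times` (file names hypothetical) merges both logs in timestamp order, which works because same-day `HH:MM:SS.ffffff` stamps are fixed-width and sort lexically.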
Introducing a sleep between starting the funnel processes removed the problem. The funnels in question are started with commands such as:
aefuf -i 10032 -c 10030
aefuf -i 10033 -c 10030
aefuf -i 10034 -c 10030
aefuf -i 10042 -c 10031
The script already had a delay between the start of the daemon and the first funnel. The additional delays allow the connection between the daemon and each funnel to be correctly initialised before the next connection is attempted.
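A start-up sequence with a pause between funnels might look like the sketch below. It is illustrative only: `FUNNEL_DELAY` and `AEFUF_CMD` are hypothetical variables, the 2-second default is an assumed value (the delay actually used is not stated above), and it assumes `aefuf` returns once it has started; if it stays in the foreground, it would need to be run with a trailing `&`.

```shell
#!/bin/sh
# Sketch only: start each funnel in turn, pausing between them so the daemon
# can finish initialising one funnel connection before the next is attempted.
# FUNNEL_DELAY and AEFUF_CMD are hypothetical knobs; 2 seconds is an assumed
# default, to be tuned per site.
FUNNEL_DELAY=${FUNNEL_DELAY:-2}
AEFUF_CMD=${AEFUF_CMD:-aefuf}

start_funnel() {
    # $1 = -i value, $2 = -c value, as in the example commands
    "$AEFUF_CMD" -i "$1" -c "$2"
}

start_funnels() {
    first=1
    for pair in "$@"; do
        # No pause before the first funnel: the existing delay after the
        # daemon start is assumed to cover it.
        if [ "$first" -eq 0 ]; then
            sleep "$FUNNEL_DELAY"
        fi
        first=0
        # $pair is deliberately unquoted so "10032 10030" splits into two args
        start_funnel $pair
    done
}
```

It would be called after the daemon start and its existing delay, e.g. `start_funnels "10032 10030" "10033 10030" "10034 10030" "10042 10031"`.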