Damn, bloody Interbase, it died again and they call me at night to restart it. What were they thinking when they decided to use it? Enough grief - we're moving to MySQL, but meanwhile I have to survive somehow... So I decided to write a little script to detect the problem and restart the server automatically. It will run every minute. There is an 'ibguard' process that restarts 'ibserver' automatically if it dies, so I just need to terminate the troubled server when it no longer works.
So the script was:
* * * * * [ echo >/dev/tcp/rrdb01/3050 ] || ( date>>/opt/interbase/_sighup.log && bash -c 'pgrep -n -f "/opt/interbase/bin/ibserver -P gds_db" | xargs kill -s SIGHUP' )
But it didn't work, because Interbase can stop responding while it still accepts new TCP connections. OK, another try:
* * * * * echo exit\; | /opt/interbase/bin/isql -user sysdba -password guessguess /opt/interbase/saint.gdb >/dev/null || ( date>>/opt/interbase/_sighup.log && bash -c 'pgrep -n -f "/opt/interbase/bin/ibserver -P gds_db" | xargs kill -s SIGHUP' )
This didn't work either, because sometimes the ibserver process does not respond to TERM/SIGHUP signals. In that case 'ibguard' tries to start 'ibserver' again and again, eventually spawning dozens of them. Crap, now I need to kill many of them, not just one, and without mercy:
* * * * * echo exit\; | /opt/interbase/bin/isql -user sysdba -password guessguess /opt/interbase/saint.gdb >/dev/null || ( date>>/opt/interbase/_sighup.log; killall -9 ibserver )
And when I wrote this, one thought struck me - what if 'isql' returns no error code when it cannot connect? Have I spent so much time scripting in bash that I expect every single command to behave? Well, 20 years (at least) of development tradition mean nothing to Interbase developers - the bloody command returns no error code! Fortunately, when something is wrong it prints an error to STDERR, so with a little fun with redirections we can catch the problem if one is reported, while ignoring normal output:
* * * * * [ $(echo exit\; | /opt/interbase/bin/isql -user sysdba -password guessguess /opt/interbase/saint.gdb 2>&1 >/dev/null | wc -l) -eq 0 ] || ( date>>/opt/interbase/_sighup.log; killall -9 ibserver )
This has worked perfectly a dozen times already. The only minor problem, which I'm ignoring for the moment, is that every restart logs 4 lines. But I'm not going to spoil your research by telling you why...
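For anyone puzzled by the "2>&1 >/dev/null" part of that last cron line, here is a minimal sketch of the trick on its own. It does not touch Interbase at all: fake_client and stderr_lines are names I made up for the demo, and fake_client only imitates a tool that exits 0 no matter what, the way isql did for me.

```shell
#!/bin/bash
# fake_client is a hypothetical stand-in for a tool that, like isql in this
# post, reports failures only on stderr and still exits with status 0.
fake_client() {
    echo "normal output"                # goes to stdout
    if [ "$1" = "fail" ]; then
        echo "connection error" >&2     # goes to stderr
    fi
    return 0                            # always claims success
}

# Count the lines a command writes to stderr, ignoring its stdout.
# Order matters: "2>&1" first points stderr at the pipe (where stdout
# currently goes), then ">/dev/null" moves stdout away from the pipe.
stderr_lines() {
    "$@" 2>&1 >/dev/null | wc -l
}

ok=$(stderr_lines fake_client ok)
bad=$(stderr_lines fake_client fail)
echo "healthy run: $ok stderr line(s)"
echo "failing run: $bad stderr line(s)"
```

Swap the two redirections (">/dev/null 2>&1") and both streams end up in /dev/null, so the count is always zero and the check never fires.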
(All this "fun" took place on CentOS 5.5 x86_64 / Interbase version LI-V7.5.1.80)
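A postscript on the first attempt: the port probe from that cron line can be pulled out into a small function, which at least makes it obvious what is being tested. This is only a sketch under assumptions: port_open is a name I invented, /dev/tcp is a bash feature (so it relies on cron's shell being bash), and there is no timeout, so a filtered port could make it hang.

```shell
#!/bin/bash
# port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened.
# /dev/tcp/HOST/PORT is not a real file; bash handles the path itself.
# The subshell keeps the opened descriptor from leaking into the caller.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Example use with the host and port from the post:
#   port_open rrdb01 3050 || echo "nothing is listening on 3050"
```

Even written cleanly, this only proves that something accepts connections on the port. It says nothing about whether the server behind it still answers queries, which is exactly why the first attempt was not enough.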