Server Watchdog (11/23/13)

Postby MSuLL » Sat Nov 23, 2013 3:41 pm

Our dear servers have been a little... unstable... over the past year. Most commonly, the process falls over without actually exiting, so the auto-restart mechanism never notices that something has broken. Today I've implemented a robust checking mechanism that will hopefully solve that problem once and for all!

Every 10 seconds, both servers are checked to see whether they are still online and able to host players. If either one fails the check, it is given a one-minute grace period (which accounts for things like the server changing maps). If the server is still having issues once the grace period expires, the watchdog remotely logs in to it and restarts it.
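For the curious, here's a rough Python sketch of how a check-then-grace-then-restart cycle like the one above can be tracked per server. It is not the actual script running on our boxes: the `Watchdog` class, the `is_healthy` and `restart` callables, and the server names are all hypothetical. How a server is judged "online and able to host players", and how the remote login and restart happen, are deliberately left to those two callables.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

CHECK_INTERVAL_S = 10  # from the post: both servers are checked every 10 seconds
GRACE_PERIOD_S = 60    # from the post: one-minute grace period (covers map changes)


@dataclass
class ServerState:
    # Timestamp of the first failed check in the current run of failures, if any.
    first_failure: Optional[float] = None


class Watchdog:
    """Hypothetical watchdog: restart a server only after it has failed for a full grace period."""

    def __init__(self, is_healthy: Callable[[str], bool],
                 restart: Callable[[str], None],
                 grace_period: float = GRACE_PERIOD_S) -> None:
        self.is_healthy = is_healthy    # assumed probe: True if the server can host players
        self.restart = restart          # assumed action: remote login + restart (not shown)
        self.grace_period = grace_period
        self.states: Dict[str, ServerState] = {}

    def tick(self, name: str, now: float) -> str:
        """Run one check for `name` at time `now`; return 'ok', 'grace' or 'restarted'."""
        state = self.states.setdefault(name, ServerState())
        if self.is_healthy(name):
            state.first_failure = None  # any passing check resets the timer
            return "ok"
        if state.first_failure is None:
            state.first_failure = now   # first failure: start the grace period
            return "grace"
        if now - state.first_failure < self.grace_period:
            return "grace"              # still within the grace period, e.g. a map change
        self.restart(name)
        state.first_failure = None      # the restarted server gets a fresh grace period
        return "restarted"


def run_forever(watchdog: Watchdog, servers: Iterable[str]) -> None:
    """Check every server, then sleep; loops until the process is killed."""
    names = list(servers)
    while True:
        now = time.monotonic()
        for name in names:
            watchdog.tick(name, now)
        time.sleep(CHECK_INTERVAL_S)
```

The clock is passed in explicitly so the logic can be tried out without waiting: a server that fails every 10-second check is reported as "grace" for the first minute and restarted on the first check at or after the 60-second mark, while a single passing check (say, once a map change finishes) clears the timer.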

What this means for players:
The servers won't be down anymore... ever! (I hope...)

What this means for admins:
Making changes to the servers will require a bit of extra care: if you take a server down and then work on the config files, etc., the watchdog will try to boot it back up for you automatically. Instead, stage your file and config changes first, then reboot once you've got everything where you want it. That way you won't have to fight with the watchdog.

Please reply here if you see any issues or unexpected behavior. Thanks!
This signature is 100% awesome and 100% low~tide's creation. That's 200%, fool!

[wH]MSuLL - Head Administrator - WhartHog's PigPen
WhartHog's PigPen UT Server - 208.71.112.196:7777 - Click Here To Play NOW!
MSuLL
In-House Nerd
Posts: 1207
Joined: Thu Jun 30, 2005 12:18 pm
Location: Iowa, USA

Re: Server Watchdog (11/23/13)

Postby xero » Sat Nov 23, 2013 7:11 pm

Well, isn't this a pleasant surprise! Great to hear things are getting some attention, Sull. The servers missed you tremendously, and welcome back. In playing today I did notice the server was much more stable, so the watchdog restart wasn't necessary. Actually played a bunch of maps today without a crash, so whatever you did, thank you very much, sir.
xero
Truly Dedicated Member
Posts: 1001
Joined: Thu Nov 02, 2006 10:17 pm

Re: Server Watchdog (11/23/13)

Postby MSuLL » Sat Nov 23, 2013 7:56 pm

xero wrote:
> Well, isn't this a pleasant surprise! Great to hear things are getting some attention, Sull. The servers missed you tremendously, and welcome back. In playing today I did notice the server was much more stable, so the watchdog restart wasn't necessary. Actually played a bunch of maps today without a crash, so whatever you did, thank you very much, sir.

All luck! I didn't change any server settings or config; all I did was add the watchdog to check for crashes. But still... good deal!

Re: Server Watchdog (11/23/13)

Postby Mikolaj » Sun Dec 01, 2013 6:36 pm

I'm afraid this didn't fix the problem; both servers have been down for a few hours :(
Attachment: servers down.JPG
Mikolaj
Elite WhartHog Member
Posts: 396
Joined: Sun Apr 03, 2011 2:31 am
Location: Poland

Re: Server Watchdog (11/23/13)

Postby MSuLL » Sun Dec 01, 2013 6:44 pm

Usually the UT server process is what dies (which my watchdog can fix), but in this case the operating system itself is down, which I do not control. We're waiting on the host to restart the box.
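A rough sketch of why those two cases differ: a watchdog that fixes things by logging in remotely can only help while the machine itself still answers. The Python below is illustrative only. It assumes (hypothetically) that the remote login goes over SSH on TCP port 22 and that a separate check already reports whether the game server is responding; `port_open` and `diagnose` are made-up names, not part of the real setup, and an open login port is only a rough sign that a login would succeed.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False


def diagnose(host: str, game_ok: bool, ssh_port: int = 22) -> str:
    """Decide what a remote-login watchdog can do, given the result of the game check."""
    if game_ok:
        return "healthy"
    if port_open(host, ssh_port):
        return "restart-process"   # box answers: only the UT process needs restarting
    return "escalate-to-host"      # box unreachable: nothing to log in to, host must reboot it
```

In this outage the second case applies: with the operating system down, even the login port would not answer, so there is nothing for the watchdog to log in to.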

Re: Server Watchdog (11/23/13)

Postby Mikolaj » Mon Dec 02, 2013 3:01 am

MSuLL wrote:
> Usually the UT server process is what dies (which my watchdog can fix), but in this case the operating system itself is down, which I do not control. We're waiting on the host to restart the box.

Roger that

Re: Server Watchdog (11/23/13)

Postby xero » Mon Dec 02, 2013 7:14 am

MSuLL wrote:
> Usually the UT server process is what dies (which my watchdog can fix), but in this
> case the operating system itself is down, which I do not control. We're waiting
> on the host to restart the box.

Maybe they got lost on the way to the server room?!

Re: Server Watchdog (11/23/13)

Postby <3MUCH> » Mon Dec 02, 2013 7:51 am

xero wrote:
> MSuLL wrote:
> > Usually the UT server process is what dies (which my watchdog can fix), but in this
> > case the operating system itself is down, which I do not control. We're waiting
> > on the host to restart the box.
>
> Maybe they got lost on the way to the server room?!


Lol... a few Indian IT techs were found wandering in the vicinity...
<3MUCH>
Truly Dedicated Member
Posts: 799
Joined: Thu Apr 22, 2010 6:49 am
Location: Toronto, Canada

Re: Server Watchdog (11/23/13)

Postby Alexei » Sun Dec 15, 2013 8:08 pm

MSuLL wrote:
> Our dear servers have been a little... unstable... over the past year. Most commonly, the process falls over without actually exiting, so the auto-restart mechanism never notices that something has broken. Today I've implemented a robust checking mechanism that will hopefully solve that problem once and for all!
>
> Every 10 seconds, both servers are checked to see whether they are still online and able to host players. If either one fails the check, it is given a one-minute grace period (which accounts for things like the server changing maps). If the server is still having issues once the grace period expires, the watchdog remotely logs in to it and restarts it.
>
> What this means for players:
> The servers won't be down anymore... ever! (I hope...)
>
> What this means for admins:
> Making changes to the servers will require a bit of extra care: if you take a server down and then work on the config files, etc., the watchdog will try to boot it back up for you automatically. Instead, stage your file and config changes first, then reboot once you've got everything where you want it. That way you won't have to fight with the watchdog.
>
> Please reply here if you see any issues or unexpected behavior. Thanks!


Can we make a private testing server that just mirrors everything from the other two (except, of course, the port) and could be brought up so admins can test config changes? That way there's a testing environment for changes in case things go poorly...
If a man is not a communist at 20, he has no heart, but if he is still a communist at 40, he has no head.
Alexei
Truly Dedicated Member
Posts: 2412
Joined: Fri Jul 15, 2005 6:51 am

Re: Server Watchdog (11/23/13)

Postby MSuLL » Mon Dec 16, 2013 10:09 am

Alexei wrote:
> Can we make a private testing server that just mirrors everything from the other two (except, of course, the port) and could be brought up so admins can test config changes? That way there's a testing environment for changes in case things go poorly...

I don't have the resources to do that right now, but if nothing else, we do at least have two servers, so having one bork for a day isn't the end of the world.