On servers, hosting and other things
Last weekend I was quite busy. Busy waiting. The main Zealus.com server went down sometime around 10 PM on Thursday. At least that's when we noticed. I went to our web hoster's help desk to file a ticket, but the help desk was down too. So, it would seem, was the web hoster's e-mail. Then I went to WebHostingTalk (WHT), the popular forum where most hosters and their direct clients hang out while the datacenter reboots their servers.
The thread about the provider being down had already reached four pages by the time I joined in. It looked like all the servers managed by the same company had gone down. People were upset, angry and aggravated.
The guy who was answering my e-mails sounded very apologetic, which was little consolation knowing that our main site was down. Thankfully, most of our clients were on another server, so they were affected only in that their communication with us was temporarily impeded.
Around 10:45 AM on Friday I found out that the issue seemed to be at the data center, not with the servers. Our web hoster informed me that they were changing providers. The reason, however, was not clearly communicated, although several people on the forums had asked. I was assured that “we expect everything to be working today anytime soon”.
Knowing how annoying e-mails can be when there is really nothing new to report, I didn't contact the hoster again until around 3:45 PM. Again, I was assured that “Some of our IPs are already up, your server should be up soon”. The future just looked a little brighter.
At 5:35 PM our hoster announced on WHT that most of the servers were up. However, none of the WHT members, save one, confirmed that. Their servers, as well as ours, remained down. At 11:00 PM I asked (on WHT) which servers had been brought up, so that other WHT members and I could see that progress had been made. There was no response to that, though. Just the usual “go back to your tickets, we'll update you there”.
Saturday, 1:01 PM: another inquiry. The hoster replied that “we only have few machines down left, yours is one of them”. Great news! Not only did I get into the game last, I'm about to leave the game last. However, at 3:45 PM the hoster posted on WHT that there was another problem: some of the servers refused to boot. What happened to “you will be updated in your tickets”?
At 4:00 PM I was promised that “By midnight CST we should have your HDD in new hardware and sites online”. Great, that's only 1:00 AM EST, so I can cut back on sleep again; tomorrow is Sunday anyway. However, the server only came up at around 4:00 AM, with cPanel licenses and other minor things yet to be resolved. The whole thing seemed operational as of Sunday, the e-mails started to come in, and the rest of the dust eventually settled.
Now, a bit of reflection. The team seems nice, and tickets are responded to properly. However, the constant lack of communication suggests that in critical cases like this one we will not be able to rely on the hoster's team for answers. A lot of questions went unanswered. For example, we still have no idea why the hard drive from our server failed to boot in other machines, why the system failed to boot at all in the first place, or why the servers were down for more than 48 hours. And, ultimately, why did all these questions go unanswered when they were asked? I explicitly asked three times why the system didn't boot, and all three times my question was ignored.
By now, the quality of service doesn't matter much, and neither do price and server features. You can get roughly the same deal from various hosters, give or take. What matters is the quality of communication, the quality of customer service, and the responsiveness of the whole team: server admins, customer service reps, technicians and even billing. So I'm not even counting past events. I'm not saying I am upset because I had to ask for the root password four times, since they kept changing it right after I asked. I'm not saying I am upset because I found several high-traffic websites left on my server, although it was supposed to be a clean install. What I am upset about is that the people I trusted with my clients' data and services failed to live up to their promises, several times. This, not the downtime, is the reason for leaving.