We have finally come to understand the issues behind Thursday's and Friday's downtime. Firstly, I would like to assure you that we take these issues very seriously. Not only do we have contingencies in place, but we also run frequent drills and simulations to test our systems. However, on this occasion a series of events that we had not previously anticipated led to our systems being unavailable for 13 hours 49 minutes for over 40% of our SolarHost clients.

So what happened? A series of events led to our HTTP services being unavailable:

At 2334 hrs BST a lightning strike near our data center caused a power surge in our systems. The UPS kicked in and within 15 minutes power was restored. However, the power surge had fried the circuits on our live routers.

By 2345 hrs our technician was on site evaluating and rebooting the hardware. Diagnostic results showed issues with the routers and load balancers.

By 0200 hrs new routers had been installed. The hardware was already on site and installation was completed swiftly - this should have solved our issues.

We expected some latency while the routers were being restored, but by 0230 hrs it was clear that further problems were afoot. Further diagnostics were conducted, but the issue was hard to pin down: the server was live, the routers were responding and the rest of the DC had recovered fully, yet our server was not responding to HTTP requests. To rule out the new routers, we switched to a router elsewhere in the DC that was known to be working, but the problems persisted.
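For readers who want to see what this kind of layered diagnosis looks like, here is a minimal sketch. It is an illustration only, not our production tooling: the function names and the category labels are hypothetical. The idea is to test each layer separately (network reachability, an open TCP port, a valid HTTP response), so that a healthy network with a silent web service points at the application rather than at the hardware.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(network_reachable, port_open, http_ok):
    """Map three layer checks onto the most likely fault area.

    The labels are hypothetical and exist only for this example.
    """
    if not network_reachable:
        return "network"        # routers, cabling, upstream connectivity
    if not port_open:
        return "service"        # nothing is listening on the port
    if not http_ok:
        return "application"    # port accepts connections, requests fail
    return "healthy"
```

In the situation described above, the network checks passed while HTTP did not, so `classify(True, False, False)` or `classify(True, True, False)` would have pointed away from the routers and towards the server software.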

By the early morning the issue had become clear. In the confusion of the speedy recovery, one thing had been overlooked: our IP configuration had changed to a backup IP and DNS system, which is typical in such a restoration. However, our cPanel license is IP specific, and the new IP was being rejected by the cPanel license server. cPanel itself serves all HTTP data, and because cPanel treated our new IP as unlicensed, it was disabled.
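The check that was missed can be sketched in a few lines. This is a simplified illustration under stated assumptions, not cPanel's own licensing mechanism and not a cPanel API: the function name is hypothetical, and the addresses are reserved documentation examples rather than our real IPs. It simply asks whether the IP a server is currently using falls inside the set of IPs a license covers.

```python
import ipaddress


def is_ip_licensed(active_ip, licensed_ranges):
    """Return True if active_ip falls inside any of the licensed ranges."""
    addr = ipaddress.ip_address(active_ip)
    return any(addr in ipaddress.ip_network(r) for r in licensed_ranges)


# Example values only (reserved documentation addresses).
LICENSED_RANGES = ["192.0.2.10/32"]   # license tied to the primary IP
PRIMARY_IP = "192.0.2.10"
BACKUP_IP = "198.51.100.20"           # address in use after failover

print(is_ip_licensed(PRIMARY_IP, LICENSED_RANGES))  # True
print(is_ip_licensed(BACKUP_IP, LICENSED_RANGES))   # False
```

Running a check like this against every backup IP before a failover is needed would flag the gap in advance, and adding the backup range to the licensed list makes the second check pass.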

A stream of phone conversations and emails later, the license was updated, only for us to find that the backlog of data, emails and pings had caused our firewalls to lock down the entire server once again. We restored a backup of the server configuration remotely, and by 1300 hrs the server was responding to HTTP requests, with IMAP and POP following 15 minutes later.

Overall, we are very disappointed with our performance on this occasion. Not only did our third-party suppliers fail to identify the issues at fault here, but our internal measures also failed to diagnose the issue. So much time was spent on the hardware originally thought to be at fault that the wider issue was not picked up until 9 hours had passed.

We have now put measures in place to prevent this happening again, including additional server licenses for our backup IP ranges and an enhanced SLA with our providers, who were not able to respond as fast as we would have hoped. We are also investigating the potential implementation of a backup server which we could fall back to in such instances. However, such setups involve considerable investment on our part and we must be cautious not to increase our prices further. Had the license issue not been present, our downtime would have been just 2 hours - the time taken for new hardware to be sourced, configured, installed, tested and made live.

I would like to reassure you that we are very passionate about our services. This record will remain public for all current and future customers to see, both in our customer portal news and in our public up-time report at http://status.solarhost.co.uk - we believe in honesty and transparency. We are not perfect, but we hope that by learning from our experiences and working together with our customers we can build a future-proof, sustainable and eco-friendly system. It is worth adding that at no point during this downtime was any data vulnerable or at risk. The issues were entirely restricted to connectivity and licensing.

I truly hope you can accept my sincerest apologies for letting you down on this occasion. It simply wasn't good enough. I hope my explanation is comprehensive and transparent, and I would welcome any comments you may have. I hope that you will allow us the opportunity to restore your faith in us over the coming months, and ultimately that we can become the premier eco-friendly hosting platform in the UK.

Best Wishes

Tom
Simple Hosting Solutions



Tuesday, July 12, 2011






