On April 24th we had a brief interruption on one of our backbone connections that made it appear as if Winhost had dropped off the map.
That interruption, outage, glitch or whatever you want to call it raised a lot of questions, and I thought I could use this opportunity to answer them.
Every data center is connected to the Internet through high-capacity connections called backbone connections. The “backbone” of the Internet is a group of high-capacity providers called tier 1 providers.
Tier 1 providers are pretty reliable. They have to be, or the Internet wouldn’t work. But they still have problems from time to time. A cut fiber on a construction site, a natural disaster or power outage, someone flipping the wrong switch – all of these things can cause an outage on a backbone connection.
We do. We have two backbone connections to our servers, provided by different companies. Normally the traffic in and out of the servers is balanced between those two connections using a number of network analyzing tools and a lot of routers and switches.
So if one connection is dropped, everyone whose traffic has been routed through that connection is cut off. The other half of the traffic, coming in on the other backbone connection, doesn’t experience a problem. That’s what happened on the 24th.
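To make the "half the traffic" effect concrete, here is a minimal sketch. It is an illustration only: the link names and the hash-based split are assumptions made for the example, not a description of how our routers actually balance traffic.

```python
import zlib

# Hypothetical names for the two backbone connections.
LINKS = ("backbone_a", "backbone_b")


def assign_link(client_ip: str) -> str:
    """Pick a link for a client with a stable hash (an assumed, simplified policy)."""
    return LINKS[zlib.crc32(client_ip.encode("ascii")) % len(LINKS)]


def is_reachable(client_ip: str, down_links: set) -> bool:
    """A client is cut off only if the link carrying its traffic is down."""
    return assign_link(client_ip) not in down_links


if __name__ == "__main__":
    clients = [f"203.0.113.{i}" for i in range(1, 201)]  # documentation-range IPs
    down = {"backbone_a"}
    cut_off = [ip for ip in clients if not is_reachable(ip, down)]
    print(f"{len(cut_off)} of {len(clients)} clients cut off while {down} is down")
```

Clients whose traffic happens to ride the healthy link never notice a thing, which is why some people saw an outage and others didn't.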
If there were an extended outage on one of the connections, we could switch all traffic to the working connection. Making that switch (and then switching back when the problem is solved) is not a trivial matter though, so we wouldn’t do it unless we anticipated a long outage on the connection that was down.
A long outage on a backbone connection is rare though, so rerouting all the traffic is usually unnecessary.
Anyone affected by the outage wouldn’t be able to see our site or the forum, since they can’t access anything on our network.
We reacted and responded on Google Plus, Twitter and Facebook, which is probably more effective than an outage post on our site or on a separate status site (that no one knows how to get to).
Things like this are part and parcel of life on the Internet. Any provider who tells you they can host your site and there will never be an outage of any kind isn’t telling you the truth. All of these things (even the mighty, mystical cloud) run on hardware. And hardware is just machines and machines don’t run perpetually without problems.
When they invent machines that do run forever without problems, we’ll be first in line to buy them. I can guarantee that. 😉
Until then, we’ll continue to provide the best service your money can buy, and be open and honest about actual and potential problems.
Couldn’t multiple datacenters hosting my site have prevented the outage? You could then use hardware load balancers, which understand how to detect datacenter outages, to redirect traffic to the online datacenter. I realize that probably doubles the cost of running my site and adds the complexity of syncing changes between multiple servers, and for me that wouldn’t be worth it.
It’s possible that hosting in multiple datacenters might have prevented the outage, but you are right: the costs for that type of implementation would be a LOT LOT more than $4.95/mo.
And you can see that there are a bunch of these Cloud Hosting services out there that say they have multiple datacenters around the world, but they still have major outages taking down many huge sites.
We are trying to provide the best service we can at the most affordable prices. Are there going to be issues? Sure, we are not perfect and we don’t pretend to be – but issues like this one are not an everyday occurrence.
You might imagine that the Pentagon has pretty reliable servers. After all, they have their own power plant next door. I once had a conversation with the contractor lead on the Pentagon’s servers. He quoted a reliability of about 99.9999%. In other words, even the Pentagon doesn’t expect 100% uptime.
Well, there you go. Perfect example of (slightly) less than 100% being adequate.
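To put that figure in perspective, here is a quick back-of-the-envelope calculation of how much downtime a given uptime percentage allows. It assumes a 365-day year, and the function name is just made up for this example. At 99.9999%, the allowance works out to roughly 31.5 seconds per year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def downtime_seconds_per_year(uptime_percent: float) -> float:
    """Return the downtime per year that an uptime percentage permits."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return SECONDS_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999, 99.9999):
        seconds = downtime_seconds_per_year(pct)
        print(f"{pct}% uptime allows about {seconds:,.1f} seconds of downtime per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, and in practice each one costs considerably more to achieve than the one before it.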
100% uptime is attainable, but at a cost and complexity that makes it unrealistic for just about every site or web-based app on earth.
I’m not sure how the idea came about in hosting that 100% uptime was feasible. It’s more than a little unscrupulous to advertise it. But I suppose some places still do.