I was checking the WinHost mailbag this morning when I came across this letter:
“Your new website is so pretty. Why on earth does your blog continue to be such a monstrous eyesore?
Patricia Cardingiff Baxbauer”
Actually, we couldn’t agree more. If you would be so kind, please allow us a moment to adjust…
Just a little bit more…and…almost…
How’s this? Better?
Your pals at WinHost
Between your desktop and our servers, there are many technical issues that can affect your website. In this article we want to talk about interruptions or outages related to our servers or network: the things that happen on our end of the wire.
Most web hosts avoid discussing these kinds of things, which is understandable. No one wants to draw attention to an unpleasant aspect of something they are trying to sell to you. But whenever an outage occurs, many of you ask us, “What are you going to do to make sure this never happens again?” We think that’s a reasonable question, and one that deserves an honest and detailed answer.
There are essentially three forms these outages take: maintenance and upgrades, server-specific problems, and provider problems or malicious attacks. What we can do to alleviate or prevent them depends on which type of outage we’re talking about.
Since the WinHost platform is primarily made up of servers running the Windows operating system, a certain amount of downtime for maintenance and upgrades is unavoidable in order to maintain security and provide you with current technology.
We do planned maintenance (when necessary) on Wednesdays, and a general Windows update every month. There is also occasional unplanned maintenance, which is usually an update or fix for a security issue or a problem that is having an immediate negative effect on a group of servers, so the fix is made outside the normal maintenance window.
Our servers are consistent across the entire network, so for example, all mail servers have the same configuration, all SQL 2008 servers are the same, all SQL 2012 servers are the same, etc.
However, all of the users on the servers are different, and the number of users per server varies, so even though all Windows 2012 web servers have the same configuration, they can experience different problems.
Additionally, all servers run on hardware, and all hardware is susceptible to component failure. We use only top-of-the-line Dell servers, but no matter how much you pay for them, electronic and mechanical parts still fail. Using virtual servers (which we do in some cases) reduces the likelihood of mechanical failure to a certain extent, but virtual servers still run on physical machines.
So while server-specific problems are bound to happen occasionally, we do a few things to prevent unnecessary issues. We monitor the live servers extensively, both on giant monitors in the support and system administration offices and through immediate text messages to all of the system administrators’ phones. We also control and balance density, so that every server carries roughly similar numbers of users and loads.
We also retire all hardware past a certain age, so your site, database or email will never be running on a 10-year-old server that everyone expects to fail at any minute.
When a specific server fails or is showing signs of stress, our administrators take action immediately, and make things right as quickly as they can.
I should take this opportunity to let you know that every member of the WinHost staff is local, and we all work out of offices in the same building here in Los Angeles. We do not employ remote staff or offshore third-party staff. Communication is easy and efficient in the event of a problem.
Provider problems and malicious attacks are usually the most severe of the issues we’re discussing, affecting the largest number of users. Unfortunately, they are also typically the most difficult to deal with.
A provider problem usually means an issue with one of our Internet backbone connection providers. These are the companies (Internap and Savvis) that provide the connection from the routers in our data center to the Internet backbone.
It is not technically necessary to have multiple connections, but because these giant providers are not perfect (and they need to do maintenance and repairs sometimes too), we have multiple connections to prevent a complete network blackout due to provider issues. We split and balance traffic between the two, and most of the time, all is well.
Then there are the other times.
When one of our two connections goes down unexpectedly, we recalibrate, direct traffic to the connection that is working, and do everything we can to ensure that all incoming traffic can be accommodated.
The problem is, when a backbone connection goes down, some traffic is not going to route around it properly. It’s certainly supposed to. The Internet was built on the theory that traffic would easily and automatically route itself around unresponsive nodes. But in actual practice, things don’t always work the way they do in theory.
Long story short, if your connection doesn’t route around the dead backbone connection, there is nothing we can do on our end to remedy the situation.
That doesn’t happen often and it doesn’t affect everyone. But it will inevitably affect some of you, and if it does happen, from your perspective everything will be down. Everything may in fact be up, but anyone whose request is stopping at the dead backbone connection won’t be able to access our network.
I should also mention that it is theoretically possible for both providers to go down at the same time. The odds of that happening are very slim, but it is possible.
You might wonder, “Well, if that’s a possibility, why not have a dozen backbone connections?” That’s a reasonable thing to wonder. But the cost of even one additional backbone connection would far outweigh its benefit in a “freak occurrence” like that. Not to mention that it wouldn’t improve your service on a day-to-day basis. We would just be sitting on top of (and paying for) a lot of idle, unused bandwidth.
Speaking of bandwidth, I saved DDoS for last.
DDoS stands for distributed denial of service. DDoS attacks are brute-force attacks that send so much data to a site or server that they effectively “knock it offline.”
The truth of the matter is that we cannot prevent a large DDoS attack, because we can never know what might trigger one.
In the old days, a very large DDoS would throw hundreds of megabits of data at a site every second (or, at most, a gigabit, 1 Gbit/s), and that was usually enough to take it down. But providers got wise and started using DDoS mitigation services (as we do) that temporarily provide huge amounts of bandwidth, making a one or two Gbit/s DDoS ineffective.
But now with exponentially larger and higher bandwidth botnets, the attacks can be so large that we can’t even measure how much traffic the DDoS is sending. 10 Gbit/s hardware switches are saturated and paralyzed, and even larger switches that are used by the backbone providers are swamped and slow to a crawl (which is how a recent DDoS on a WinHost customer slowed down traffic for part of Yahoo!).
Since the methods we used to deal with DDoS in the past are no longer effective, what we do now is try to identify the target site or sites and remove them from our network by null-routing the IP address, then wait for the DDoS to taper off once the site disappears.
In order to locate the target site, our system administrators coordinate with our upstream providers to get the necessary target IP information. Once they have that, they start going through that IP manually, site by site, looking for something that might attract a DDoS.
That is as unscientific as it sounds, and as a result, locating the target can take anywhere from 30 minutes to several hours. Unfortunately, even after we have identified the target, the other sites that share its IP address will likely remain unreachable for a while after the DDoS has ended, because that IP address has temporarily been removed from our network.
I’ll wrap this up by saying that while it isn’t possible for us to prevent outages completely, we are always working on improving our communication when they do occur. Every outage is different (if they were all the same, this would be easy) and they each teach us something new.
But if you need quick information, and the outage is server-specific (and not generally crippling our network), your best bet is to check the forum for updates. The forum is the first line of public communication for the tech support staff, so it is typically the first place you’ll see information.
For outages that do affect our network, we post information on Twitter, Google+ and Facebook. We may not be able to respond to your questions or comments on social media while the outage is in progress, but we do our best to keep those sources updated.
We do not have a “social media team,” as some larger companies do, so despite our best efforts, you may see spotty updates on any given service during any given outage. That doesn’t mean we don’t care. It usually means the outage is happening at a time of day when we are not all in the office, so the people who are here are dealing with the outage and answering helpdesk tickets (and haven’t called to wake me up yet).
We’ll always do our best to keep you in the loop so you know what’s happening. If you have any suggestions you’d like us to consider, we encourage you to comment on this post and let us know.
First, if you have a large number of sites in an account, we have significantly sped up the Order New Site and Order New Domain Name functions. Now there is no need to go make a sandwich while those pages load. Though, if you think about it, it’s almost always a good time for a sandwich.
Next, if you’ve ever entered a DNS TXT record in Control Panel, you may have run into a 128-character limit. We have increased the TXT record limit to 512 characters (the maximum the DNS system will accept).
Finally, if you like to mess around with DNS records in general (and really, who doesn’t?), there may have been a time when you thought, “Well, that was fun, but I wish I could just dump all these cool experiments that have made my site redirect to altavista.com and somehow caused my email to forward to the White House, and just start over with a clean slate…” Well, now there is a Reset DNS button that does just what it claims to do: it resets the DNS records for the site to our default settings. It’s cool, it’s powerful, and it will completely remove any customizations you’ve ever made, so use it carefully.
That’s it for now, but we’re always hard at work over here making the world a better place, so let us know if there is something we can do just for you.
On November 11th, 2013, we are introducing some plan changes. Check your email for details. To summarize what’s happening:
These increases will take place automatically on November 11th.
In order to make the plan changes, we are updating the pricing and payment options. You can find details in the plan change email.
While the quarterly and yearly payments will increase slightly, we have introduced a new two-year payment option that allows you to pay as little as $3.95 per month for the Basic plan. Even less than the old price.
The two-year payment option applies to every WinHost plan, so no matter which plan you use, you can decrease your monthly price by up to 20% by choosing the new two-year payment option.
And you can make the switch right now in Control Panel, before the new quotas and pricing go into effect on November 11th.
As we mentioned in the email, the price increase was a difficult decision for us. We never want to increase prices, but we want to continue to make improvements to the hosting services, and this will allow us to do that.
We hope the two-year payment option – which actually lowers current prices – and the quota increase will help ease the transition for some of you.
On April 24th we had a brief interruption on one of our backbone connections that made it appear as if WinHost had dropped off the map.
That interruption, outage, glitch or whatever you want to call it, raised a lot of questions that I thought I could use this opportunity to answer.
Every data center is connected to the Internet through high capacity connections called backbone connections. The “backbone” of the Internet is a group of high capacity providers called tier 1 providers.
Tier 1 providers are pretty reliable, they have to be or the Internet wouldn’t work. But they still have problems from time to time. A cut fiber on a construction site, a natural disaster or power outage, someone flipping the wrong switch – all of these things can cause an outage on a backbone connection.
We do have redundancy, in the form of two backbone connections to our servers, provided by different companies. Normally the traffic in and out of the servers is balanced between those two connections using a number of network analysis tools and a lot of routers and switches.
So if one connection is dropped, everyone whose traffic has been routed through that connection is cut off. The other half of the traffic, coming in on the other backbone connection, doesn’t experience a problem. That’s what happened on the 24th.
If there were an extended outage on one of the connections, we could switch all traffic to the working connection. Making that switch (and then switching back when the problem is solved) is not a trivial matter, though, so we wouldn’t do it unless we anticipated a long outage on the connection that was down.
Long outages on a backbone connection are rare, though, so rerouting all the traffic is usually unnecessary.
Anyone affected by the outage couldn’t see our site or the forum either, since they couldn’t access anything on our network.
We reacted and responded on Google Plus, Twitter and Facebook, which is probably more effective than an outage post somewhere on our site or on a status site somewhere (that no one knows how to get to).
Things like this are part and parcel of life on the Internet. Any provider who tells you they can host your site and there will never be an outage of any kind isn’t telling you the truth. All of these things (even the mighty, mystical cloud) run on hardware. Hardware is just machines, and machines don’t run perpetually without problems.
When they invent machines that do run forever without problems, we’ll be first in line to buy them. I can guarantee that.
Until then, we’ll continue to provide the best service your money can buy, and be open and honest about actual and potential problems.
2012 has been a great year for us, and we’re glad that you have been along for the ride.
As I mentioned a couple of weeks ago, we have some exciting things on the horizon for 2013, and we look forward to bringing you cool new products and services, and continuing to be your hosting provider of choice!
Because we know that you do have a choice – lots of them, in fact – when it comes to hosting, and we work hard to be the best choice. The only choice!
Thanks again for a great year, and here’s to 2013.
You may remember that last year around this time we reported that we placed fourth in the DevProConnections Community Choice Awards.
Well I’m proud to announce that this year we placed – fourth!
And just like last year, we’ll take it, since the top three are the same behemoths that eclipsed us last year (Amazon, GoDaddy and DiscountASP.NET).
We feel pretty good about placing above Rackspace, another monster host that could lose more customers than we have and not even notice it.
But next year – you’ll see – we’re going to crack that top three!
We’re working hard on some cool new stuff (if I tell you about it now, they’ll fire me), but suffice it to say that we’re always working to make WinHost the world’s best hosting platform, and the kind of place where your sites can be proud to live.
Here’s to fourth place, and nipping at the heels of the giants!
I don’t know who put them there, but there are several desks and office chairs taking up parking spaces in the underground garage here at the office.
I assume they are left over from a bank that recently vacated, but I have to say that I’ve worked for hosts in the past that would see this as a perfectly viable alternative “work space.”
In fact, it would have been a step up in some of the places I’ve worked.
I’m not naming names, but if you worked in the hosting industry in Los Angeles in the decade from 1996 to 2006, you probably know the two hosts I’m talking about.
(Their initials were Affinity and PowWeb.)
Last year you turned out in force and voted for us in the DevProConnections Community Choice Awards, and we placed fourth behind some industry heavyweights!
Well guess what? We are nominated again! And it would be great for this still scrappy upstart to show the big boys what we’re made of. If you have a couple of minutes to spare, we would appreciate your vote!
We’re listed under category 10, Hosting Service. Voting closes on Friday the 28th, so please pop on over there as soon as you have a chance.
It would be quite a feat to move up in those rankings considering the competition, but anything is possible, right? Vote!
And thank you for your time, your support, and most of all, thanks for using WinHost!