Posts Tagged ‘up time’



Patches

Friday, September 25th, 2009

Meet Patches. Keep him healthy and he’ll be with you a long time. Look at that face! Knowing the anxiety Patches suffers when going to the vet, do you religiously take him every time there’s a new medicine on the market, just in case he might catch some exotic bug he has a 0.001% chance of contracting? No, most likely not. But you do take him to the vet for regular shots against the illnesses dogs of his kind are likely to have problems with: a regular maintenance visit, you might say.

Why is it worth the cost and commotion of going for the maintenance visit, but not every time a new vaccine or pill is announced? Because the cost/benefit equation is right for one and not the other.

A lot can be learned from Patches about the discipline of patching servers. We are occasionally asked, “How often should I patch my servers?” and we get into discussions with a wide variety of customers holding widely differing views on the subject. Often, though, we find that it largely boils down to one’s view of the world: is your glass half empty or half full? Certainly, we need to keep systems patched to at least the minimum level supported by our software vendors, but given the cost and commotion (dare I say trauma) of the patching process, how far beyond that is necessary or prudent? If you have Internet-facing assets, then clearly you want to keep those up to date with the latest security patches as soon as they’re available. But if you have private, stable, non-web assets well behind well-managed firewalls, a less rigorous approach is reasonable. There is no need or rational justification to apply a patch willy-nilly simply because it’s available. Who has not been the victim of downtime because an ill-behaved patch did something it was not supposed to do? And, lest we forget, rebooting a Windows server after patching is not always a trivial event; just ask the sysadmin of a BlackBerry Enterprise Server.

Remember that the purpose of infrastructure is to keep running, the very namesake of this blog. Our infrastructure does us no good when it’s down. Every patch brings with it some level of risk to uptime. So the obvious thing to do would be to test every patch before we apply it to a production system. Do you? Really? Every time? Or is it easier to just apply the latest raft of fixes from, say, Microsoft, and hope for the best? For those of us who have to endure the regular water-boarding process of a SAS 70 Type II audit, hope is not a strategy. Not only do we have to test every patch before applying it to a live system, but we also have to prove that we did so, and that we have a defined process that passes muster with the auditors.

This process of patching is costly in terms of time, money, and risk. So how often should we patch? Somewhere between hope and SAS 70 lies the right answer for most of us. As with a car, regular maintenance of a server is necessary to keep it “on the road.” This may mean regularly spending time (like an oil change) researching patches to see which of them you really need, as opposed to those that merely make you feel warm and fuzzy, and then testing them appropriately first. On the other hand, regular maintenance does not necessarily imply regular patching. If a system needs to be running the latest Windows Server OS, or the application vendor forces your hand, then you will certainly be patching more often. If, on the other hand, you have a functionally stable system that doesn’t change much, has been running well, and isn’t the flagship of your e-commerce empire, then you will probably patch extremely infrequently, if ever, and that’s OK. We’ve got a Red Hat 7.2 system here that sees heavy daily usage, has not been patched in years, and has not been hacked or had any problems over that same span of time. Sacrilege? Perhaps, but we believe it’s prudence. It could also be Pennsylvania Dutch stubbornness.

You do need a patching process, but it should reflect your particular situation and account for the nature of each of your servers. Like the Pirate Code, best practices in this arena are more like guidelines. You can spend a lot of money and create a lot of headaches with a one-size-fits-all approach. A socialist patching approach sounds good on paper, but as you would expect with anything socialistic, it tends not to work out well in reality.

Weigh the risks and cost of downtime vs. the potential benefit of a patch.  Part of your process should include a justification phase where IT and business stakeholders have an opportunity to understand what is being patched, why it’s been deemed necessary, and what the possible ramifications are if things go awry. And, most importantly, the stakeholders should have both veto power and the power to determine the scheduling of patch activity.
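
The cost/benefit weighing described above can be made explicit as a rough expected-value comparison. What follows is a minimal Python sketch of one way to do that. The class name `PatchAssessment`, its fields, and the example figures are all hypothetical, invented for illustration rather than taken from any process described in this post. Every probability and dollar amount would have to come from your own IT and business stakeholders.

```python
from dataclasses import dataclass


@dataclass
class PatchAssessment:
    """Hypothetical inputs for one patch on one server; every figure is an estimate."""
    name: str
    p_exploit_if_unpatched: float  # chance the flaw is exploited before the next maintenance window
    cost_of_exploit: float         # business cost if it is exploited
    p_patch_breaks_system: float   # chance the patch itself causes an outage
    cost_of_outage: float          # cost of the resulting downtime
    cost_to_test_and_apply: float  # labor for research, testing, and rollout

    def expected_benefit(self) -> float:
        # Expected loss avoided by patching now.
        return self.p_exploit_if_unpatched * self.cost_of_exploit

    def expected_cost(self) -> float:
        # Expected loss from a misbehaving patch, plus the fixed effort of patching.
        return self.p_patch_breaks_system * self.cost_of_outage + self.cost_to_test_and_apply

    def recommend(self) -> str:
        # Recommend patching only when the expected benefit outweighs the expected cost.
        return "apply" if self.expected_benefit() > self.expected_cost() else "defer"


if __name__ == "__main__":
    # Illustrative figures only: the same patch on an exposed web server vs. a firewalled internal box.
    web = PatchAssessment("web-facing server", 0.20, 50_000, 0.02, 20_000, 1_000)
    internal = PatchAssessment("internal server", 0.001, 50_000, 0.02, 20_000, 1_000)
    for a in (web, internal):
        print(f"{a.name}: benefit {a.expected_benefit():,.0f} vs cost {a.expected_cost():,.0f} -> {a.recommend()}")
```

A number like this is only an input to the justification phase. The stakeholders still hold the veto and the scheduling decision regardless of what the arithmetic suggests.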

Patching is a necessary evil, but it is manageable if you take the time to think through the process and come up with a practical plan that fits your business. Or, you could simply delegate the process to folks who know how to both open and close Pandora’s box.

//spk

The Truth About Up Time

Friday, July 3rd, 2009

On June 29th a cloud burst occurred at Rackspace, proving that even the mighty eventually fall. Rackspace’s blow-by-blow Twitter account of its power outage provides interesting insight into what happens during a crisis at a hosting provider.

In every industry there are dirty little secrets that customers either don’t know about or don’t want to know about. The meat counter at the grocery store is a prime example. Those steaks and chops look really good, but did you ever watch the entire process from hoof to hamburger? It’s not pretty, and for most folks it’s Too Much Information.

So here’s Dirty Little Secret #1 of the hosting industry: while almost every hosting company has to make the claim in order to be credible, no one can deliver 100% data center up time forever. No one. Not even the market leader. So why make the claim at all? Because that’s what customers demand to hear. In talking with customers we find a widespread, cross-industry sentiment, usually without any logical rationale, that says, “My business is so important that my infrastructure has to be running 24/7 without any interruptions at all.” Unless your business is keeping patients alive with sophisticated medical equipment, this seems like a rather difficult position to defend. But no one wants to be the bad guy who points that out. We know there is life beyond brief outages because they happen every day and yet nobody goes broke, but it is typically unwise to say so.

Since downtime will occur even in the elite shops of the world, like Rackspace with its fleet of nine data centers, you do need to make realistic decisions about what level of up time you really need in light of the type of business you’re in. And while it may sound like heresy, you also want to make decisions about things that are much more important than up time levels. It seems to me that if downtime is inevitable, and we know that it is, then I want my equipment in the hands of people who know how to recover quickly from an outage, who will communicate with me regularly and truthfully throughout the crisis, and who will do their level best to get me back online as quickly as possible. I want my equipment in the hands of highly competent people that I can trust. You can’t make that determination when you sign up for service via a web browser or do the whole transaction over the phone. The only way to make it is to actually meet the people who are going to become the custodians of your infrastructure.
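
Choosing a realistic up time level is easier when each percentage is translated into the downtime it still permits. The following is a minimal Python sketch, assuming a 365-day year and a single continuous measurement period; actual hosting agreements may define the measurement window and any exclusions differently. The function name `allowed_downtime_minutes` is hypothetical.

```python
# Translate an availability target into the downtime it still permits.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year; ignores leap years


def allowed_downtime_minutes(availability_pct: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted over the period at the given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999, 100.0):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:>7}% -> {minutes:8.1f} minutes/year ({minutes / 60:5.2f} hours)")
```

Under these assumptions, 99.9% still allows roughly 8.8 hours of downtime a year, while 100% allows none at all. That is why a flat 100% promise deserves skepticism.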

Before you put your equipment in the hands of someone else, make the effort to visit them. If they don’t allow visits, that should be Red Flag #1. Talk to their operations and support people, particularly the folks who will be touching your equipment. If you’re not allowed to talk to them, that should be Red Flag #2. Ask them about their up time guarantee. If they look you square in the eye and say 100%, that should be Red Flag #3. Shake the dust off your shoes and move on.

Let me cordially invite you to visit our data center hosting facility this summer.  No red flags – just trustworthy, highly competent, and dependable people.

//spk

p.s. Happy 4th of July!
