Archive for July, 2009



Sneaky Savings

Friday, July 24th, 2009

If you read the trade rags and blogs for too long, you can easily lose your grip on reality and feel like your IT shop has fallen into medieval times. Take server virtualization, for example. VMware is now pitching the fourth generation of its product line, and The Prophets proclaimed long ago that we’d be in a Golden Age of Virtualization by next year.  Have you virtualized your shop to any great degree or are you still just dabbling? If you’re in the latter category, it seems you are still very much in the mainstream.  Metrics Based Assessments LLC (MBA) publishes quite a number of useful statistics gathered from real IT shops, as opposed to the utopian shops where The Prophets dwell.  Have a look at this graph:

[Graph: images per server]

Quite surprising, isn’t it? Before you cry blasphemy, read the disclaimer that comes with the graph:

We realize that many readers of this e-mail are going to say our level of virtualization (images per server) is significantly higher than MBA’s.  That may be true for the servers that you have virtualized.  We obtain our average by dividing the number of images for all servers in a platform by the number of servers supporting the platform. Our best participant for each server platform averages approximately 2.5 images per server as shown below:

Windows – 2.57
UNIX – 2.46
Linux – 2.52.
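
To see why the fleet-wide average lands so far below what you observe on your virtualized hosts, here is a minimal sketch of the calculation MBA describes (total images divided by total servers). The fleet below is entirely hypothetical, and the function name is ours, not MBA’s.

```python
def images_per_server(images_by_server):
    """Average images per server across ALL servers in a platform,
    virtualized or not (total images / total servers)."""
    return sum(images_by_server) / len(images_by_server)

# Hypothetical fleet: 3 virtualized hosts plus 17 standalone servers
# that each run a single image.
fleet = [10, 8, 7] + [1] * 17

# The number you tend to quote: virtualized hosts only.
virtualized_only = [n for n in fleet if n > 1]

fleet_average = images_per_server(fleet)
virtualized_average = images_per_server(virtualized_only)

print(f"All servers:       {fleet_average:.2f} images per server")
print(f"Virtualized hosts: {virtualized_average:.2f} images per server")
```

The standalone boxes drag the average down hard, which is exactly the effect the disclaimer is pointing at.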

Indeed, our own shop is much higher than this, 6.63 to be exact, but it’s important to note that they’re throwing in the non-virtualized servers still on the floor, which one could easily be tempted to overlook. Clearly, there are a lot of shops out there that haven’t yet ventured into the deep end of the virtualization pool. It doesn’t look like the world is going to reach Gartnerian Nirvana on schedule, but there does seem to be compelling evidence to continue moving in that direction. Consider these numbers, also from MBA:

[Graph: server image cost]

From 2007 to 2008, the percent decrease in the average cost per server image was:

Windows – 8.8%
UNIX – 8.1%
Linux – 10.2%

Our current best participant for each server platform is:

Windows – $9,031
UNIX – $13,965
Linux – $10,827.

This graph makes it quite clear that costs per image are dropping. How so? Unless you just moved your shop to India, your labor costs are the same as last year or more. Facilities costs (power, cooling, space) are going up, not down. Taking a look at the Windows category above, hardware price decreases alone could not have accounted for an 8.8% average decrease per image – it seems like way too much. Chairman Steve hasn’t offered any fire sales on Windows in recent memory, so it can’t be that either.  What gives?  The decreasing cost of hardware is certainly a factor, and we would expect to see the biggest savings in the pay-for Unix world because it’s not running on commodity hardware. Yet regardless of platform, the savings are apparently quite substantial. There has to be something more going on here. Could it be correlated to the rise in virtualization?

The folks at MBA tend to factor in everything when they calculate costs, including the kitchen sink. Software and hardware acquisition costs, facilities costs, the cost to virtualize – it’s all baked into the numbers. Could it be that as the level of virtualization increases, even just fractionally, the amount of annual savings across the enterprise increases significantly?  Looking at our own shop, we have roughly 48 Windows images trundling along. Let’s suppose we had increased our number of images per server by just 0.08, from 1.27 to 1.35, as shown above.  Our annual savings would seem to be $81,600.
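
The derivation of that figure isn’t spelled out here, so the following is only a back-of-the-envelope check, not the original calculation. It simply divides the stated savings by the stated image count to see what per-image saving the claim implies; the variable and function names are ours.

```python
# Figures taken from the paragraph above.
windows_images = 48
claimed_annual_savings = 81_600  # dollars

# Implied saving per image if the savings are spread evenly over all images.
implied_savings_per_image = claimed_annual_savings / windows_images

def annual_savings(images, savings_per_image):
    """Assumed model: total savings scale linearly with image count."""
    return images * savings_per_image

print(f"Implied saving per image: ${implied_savings_per_image:,.0f}")
print(f"Check: ${annual_savings(windows_images, implied_savings_per_image):,.0f}")
```

In other words, the claim amounts to about $1,700 saved per image per year, which you can sanity-check against your own cost per image.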

Hmm, that seems like financial voodoo, you say. Perhaps so, but how do we explain the huge savings?  New hardware avoidance is certainly one way to generate numbers this big, and reduced software costs might also be in play depending on how your software is licensed, both of which are natural outcomes of virtualization.

Without seeing a more detailed breakdown of the numbers it’s hard to be absolutely dogmatic about these savings, but it’s an interesting thought to mull over.  Could there be an ongoing opportunity for significant cost savings by continuing to crank up the virtualization factor a few clicks every year, or is there a practical upper limit?

//spk

p.s. See you in a few weeks.  It’s vacation time!


A Pragmatic View of Downtime Cost

Friday, July 17th, 2009

On occasion, our prospective customers will mention they’ve done an extensive study to determine how much a minute of downtime costs their company. Ergo they are visiting with us to establish either a primary or secondary location as part of a strategy to lose exactly $0. Sometimes I wonder what Gary Coleman would say if he heard their explanations.

Well said, Gary. Terse, but adequate.

There is apparently some very deep magic involved in figuring out the cost of downtime, and no one seems to agree exactly on what the proper incantation should be. A little over a year ago on The Numbers Guy, there was a humorous post on just how ambiguous, and ultimately irrelevant, calculating this number can become.  To wit:

One blog headlined a post, “Amazon’s $3.6 Million Outage?,” noting that if projected second-quarter revenue was spread evenly over time, then the site normally would be making $1.8 million per hour. TechSpot.com and the Seattle Post-Intelligencer performed similar calculations with last year’s revenue to estimate that Amazon lost $29,000 per minute; CNET used last quarter’s results to calculate $31,000 per minute. Then the New York Times, last week, reported that “Amazon, by some estimates, lost more than a million dollars an hour in sales.”

Does it really matter who was right?  It’s A Lot Of Money by anyone’s reckoning, that is, if you can believe the numbers in the first place.

What exactly is the value of doing an extensive study, or even a moderately detailed investigation, given the cost of the meeting hours one would burn doing the analysis vs. the quality of data one could actually expect to produce? Often the variables can become so complex that gut feel and opinion invariably creep into the equation just to get the math done. This in turn results in a baked-in degree of subjectivity that ends up being the source of debate when the numbers are used to justify a business case later on.

Maybe I’m missing something, but there has to be a better way. Usually the only reason we want to know the cost of downtime is to justify the costs necessary to keep the key parts of the infrastructure highly available. It then logically follows that we need to know which parts of our infrastructure really contribute to the top line such that they are truly worthy of being made highly available. With that in mind, let’s ask the questions:

Let’s say gross revenue was $20M last year, and we do business 5 days a week, or 260 days/year. On a simple even spread across all 24 hours of each business day, that’s $53/minute, or if you like $3,205/hour. Can you make a business decision based on that? No matter how you rig the math (e.g. weighting end of month more heavily, etc.), it boils down to crazy numbers that look like this. How can they possibly help you justify a monthly spend on an availability solution? Does it therefore matter precisely how accurate they are? The above calculation is admittedly simple and perhaps even lame, but I would argue that any other more exotic formula does no better.
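
For the curious, here is a minimal sketch of that even-spread arithmetic. The 24-hour day is an assumption on our part (it is what reproduces the $3,205/hour figure), and the function and parameter names are purely illustrative.

```python
def even_spread_cost(annual_revenue, business_days=260, hours_per_day=24):
    """Spread annual revenue evenly over business time.

    Returns (cost per hour, cost per minute) of downtime.
    hours_per_day=24 is an assumption; change it to see how
    much the "answer" moves.
    """
    per_day = annual_revenue / business_days
    per_hour = per_day / hours_per_day
    per_minute = per_hour / 60
    return per_hour, per_minute

per_hour, per_minute = even_spread_cost(20_000_000)
print(f"${per_hour:,.0f}/hour, ${per_minute:,.0f}/minute")

# Same revenue, but assuming an 8-hour business day instead.
per_hour_8, per_minute_8 = even_spread_cost(20_000_000, hours_per_day=8)
print(f"${per_hour_8:,.0f}/hour, ${per_minute_8:,.0f}/minute")
```

Changing one assumption triples the result, which rather makes the point: the number says more about the assumptions than about the business.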

I think the more useful exercise is to take the time to really understand what your key business systems are (not the discrete elements like servers and routers), and then determine what underlying systems are required to make them go, along with all of the various inter-system dependencies. As obvious as that sounds, our experience has been that folks often do not have this kind of a handle on their infrastructure. Instead of saying to senior management that “our SQL server absolutely has to be up all the time because the business depends on it,” you should be able to say “our SQL server needs to be up because we can’t take orders if it’s down,” or “our SQL server needs to be highly available because we can’t load our trucks when it’s down.” You are much more likely to hear “make it so” on your DR proposal with this approach than if you go in with a story about how much downtime costs the company. Senior management is well aware of the revenue numbers – they don’t need to be reminded, and trying to foist a murky cost of downtime justification on them is an iffy, if not perilous, strategy. What they want and need is plain talk on what happens if things break.

Speaking in pragmatic terms the business understands will result in funding for a DR plan that makes sense for the business, though it may not be everything you’d personally like to have. So if you feel you really need synchronous replication rather than asynchronous, you’re going to have to explain in business terms why the extreme extra cost is necessary. Pitch the solution in plain terms. Lay out information meaningful to your leaders and trust them to make a good business decision.

But do influence them to go with a data center that has your operating budget in mind. We can often bring some of the seemingly out-of-reach high availability solutions within your reach.

//spk


Personnel DR

Friday, July 10th, 2009

Are you prepared for the departure of a key technical resource in your operation?  Someone who holds the infamous “keys to the kingdom”?  Typically there is at least one person on a company’s IT staff who achieves deity status with regard to physical and logical access. Sometimes key skills also reside in just that one person. If such a person leaves, either voluntarily or involuntarily, how would your critical operations fare?


Now would be a good time to take a fresh look at both your internal documentation and your skills matrix. Things to consider:

  1. Are all sysadmin userids and passwords documented somewhere, somehow?
  2. Are all critical architectures documented in excruciating detail?  (SAN, virtualization, LAN/WAN, disk replication, backup/restore systems). You want to see how things are connected and how they are intended to interact. You want to see things like IP addresses, subnet addressing schemes, WWNs, hard and soft zoning information and the like. You’ll know you have all of the information you need when you can hand it to a new engineer and he doesn’t have any questions. Seem impossible? Strive for it, and the result will be good enough.
  3. Where does the above documentation live if you do already have it?  Hopefully it’s not on your staff’s laptops.  If you think you already have it in a shared on-line space, are you sure you have all of it?  And is it being backed up?
  4. Do you have runbooks for all of your servers?  Are they current?  Where are they? Are they backed up?
  5. How many people have practical working knowledge in each area of your critical infrastructure?  Do you have more than one VMware tech?  More than one SAN person?  How about Active Directory or Exchange? Ideally you’ll want three in each area. Contract for it if you need to.
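
If your skills matrix lives in a spreadsheet, the last question is easy to automate. Here is a minimal sketch, assuming the matrix is a simple mapping of area to named people; the areas come from the list above, while the names, the data layout and the function are hypothetical.

```python
# Hypothetical skills matrix: area -> people with practical working knowledge.
skills_matrix = {
    "VMware": ["alice", "bob"],
    "SAN": ["alice"],
    "Active Directory": ["alice", "bob", "carol"],
    "Exchange": [],
}

def coverage_gaps(matrix, target=3):
    """Return {area: number of additional people needed} for every
    area with fewer than `target` distinct people."""
    gaps = {}
    for area, people in matrix.items():
        have = len(set(people))  # ignore accidental duplicates
        if have < target:
            gaps[area] = target - have
    return gaps

gaps = coverage_gaps(skills_matrix)
for area, missing in gaps.items():
    print(f"{area}: need {missing} more (train or contract)")
```

Any area that shows up here is a candidate for cross-training or a contract arrangement, and any person who appears in nearly every row is your “keys to the kingdom” risk.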

 
I could go on, but I think you’re getting my point. This process is somewhat like writing a will.  It’s a real drag to write up, and everybody knows that they need to take care of it, yet it often gets ignored until it’s too late. And just like a will, all of this documentation needs to be updated on a regular basis or it may end up being worthless at crisis time.

Alternatively, you could move the responsibility for a large portion of this to a professional hosting facility.  Why not limit your exposure to just your applications and let us worry about how all the plumbing is hooked up?

//spk


