Archive for the ‘Data Center’ Category



Got Lightning?

Friday, August 21st, 2009

Here in the east, summer didn’t decide to show up until August, and now the heat is here with a vengeance, which of course means severe weather of the electrical variety. Outside my office window a bolt of lightning slammed into the ground just a few moments ago.

storm2

(To protect the super secret location of our low profile facility, the arrow above is not actually pointed at our exact location ;) ). A mere 35 feet to my left is 15,000 sq ft of raised floor with customer production infrastructure humming along quietly. Am I worried about an outage? Not in the least. Why? Is it because we have a lightning suppression system deployed or because we have 3 MW of generator backup? Is it because we have paralleled UPS units and expensive battery strings? Is it because we have at least one spare of every key component waiting in standby mode? None of the above. I’m not concerned because we TEST all this stuff, and we do so religiously.


Our generators are typically run weekly, and tested under full load regularly. Our switchgear is regularly maintained and tested by professionals. Do you trust your switchgear enough to pull the plug on the whole building? We do, and we test that too.

If you’re running infrastructure that you consider critical to your business, how comfortable are you that you’ll stay online if lightning takes out the utility pole across the street? Is your equipment protected from the electrical surges that summer storms bring? Really protected? If you’re using server-room class UPS units, such as the APC Smart-UPS rack or freestanding products, have you checked the health of those batteries lately? How about the load on those units? Has server sprawl quietly overloaded them? Will you stay up long enough to shut down gracefully or transfer to generator? If you’re not fortunate enough to have a generator, how long can you afford to wait for power to come back on?

Lots of obvious questions to be sure. But we find that in small to medium-sized businesses, they are often tragically ignored or neglected. We know because these businesses in our area order their UPS units and parts from us, often after the worst has happened. When you consider that 92% of the businesses in the US in fact have 1-99 employees, it becomes clear that neglected power infrastructure is a more widespread problem than you might imagine.

If you have a really big shop, it’s likely you have all of your power gear under maintenance and test it regularly just like we do, but if you’re a smaller shop, it would be good practice to begin regularly checking the health of your backup power gear and testing it. Today would not be too soon to start.

We care about all this because it’s par for the course if you’re running an enterprise-class data center as we do, but quite frankly it’s a pain in the butt if that’s not your business. If you’d rather focus on your core business and not worry about lightning storms, you could instead put your gear here. Outsource your power worries to folks who take them seriously and are well equipped to handle them.

//spk


A Pragmatic View of Downtime Cost

Friday, July 17th, 2009

On occasion, our prospective customers will mention they’ve done an extensive study to determine how much a minute of downtime costs their company. Ergo they are visiting with us to establish either a primary or secondary location as part of a strategy to lose exactly $0. Sometimes I wonder what Gary Coleman would say if he heard their explanations.

Well said Gary. Terse, but adequate.

There is apparently some very deep magic involved in figuring out the cost of downtime, and no one seems to agree exactly on what the proper incantation should be. A little over a year ago on The Numbers Guy, there was a humorous post on just how ambiguous, and ultimately irrelevant, calculating this number can become. To wit:

One blog headlined a post, “Amazon’s $3.6 Million Outage?,” noting that if projected second-quarter revenue was spread evenly over time, then the site normally would be making $1.8 million per hour. TechSpot.com and the Seattle Post-Intelligencer performed similar calculations with last year’s revenue to estimate that Amazon lost $29,000 per minute; CNET used last quarter’s results to calculate $31,000 per minute. Then the New York Times, last week, reported that “Amazon, by some estimates, lost more than a million dollars an hour in sales.”

Does it really matter who was right?  It’s A Lot Of Money by anyone’s reckoning, that is, if you can believe the numbers in the first place.

What exactly is the value of doing an extensive study, or even a moderately detailed investigation, given the cost of the meeting hours one would burn doing the analysis vs. the quality of data one could actually expect to produce? Often the variables can become so complex that gut feel and opinion invariably creep into the equation just to get the math done. This in turn results in a baked-in degree of subjectivity that ends up being the source of debate when the numbers are used to justify a business case later on.

Maybe I’m missing something, but there has to be a better way. Usually the only reason we want to know the cost of downtime is to justify the costs necessary to keep the key parts of the infrastructure highly available. It then logically follows that we need to know which parts of our infrastructure really contribute to the top line such that they are truly worthy of being made highly available. With that in mind, let’s ask the questions:

Let’s say gross revenue was $20M last year, and we do business 5 days a week, or 260 days/year. On a simple even spread across 24-hour days, that’s $53/minute, or if you like $3,205/hour. Can you make a business decision based on that? No matter how you rig the math (e.g. weighting end of month more heavily, etc.), it boils down to crazy numbers that look like this. How can they possibly help you justify a monthly spend on an availability solution? Does it therefore matter precisely how accurate they are? The above calculation is admittedly simple and perhaps even lame, but I would argue that any other more exotic formula does no better.
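For anyone who wants to check the even-spread arithmetic above, here is a minimal sketch in Python. The function name and parameters are made up for illustration, and the 24-hour business day is an assumption inferred from the $3,205/hour figure rather than something stated outright.

```python
# Even-spread downtime cost, using the figures from the paragraph above.
# Assumption: revenue accrues around the clock on each of the 260 business
# days, which is what the quoted $3,205/hour and $53/minute imply.

def even_spread_cost(annual_revenue, business_days, hours_per_day=24):
    """Return (cost_per_hour, cost_per_minute) for an even revenue spread."""
    per_day = annual_revenue / business_days
    per_hour = per_day / hours_per_day
    per_minute = per_hour / 60
    return per_hour, per_minute

per_hour, per_minute = even_spread_cost(20_000_000, 260)
print(f"${per_hour:,.0f}/hour, ${per_minute:,.0f}/minute")
# prints: $3,205/hour, $53/minute
```

Change hours_per_day to 8 and the same revenue works out to roughly $9,615/hour, which is rather the point: the answer swings with whatever assumptions you feed it.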

I think the more useful exercise is to take the time to really understand what your key business systems are (not the discrete elements like servers and routers), and then determine what underlying systems are required to make them go, along with all of the various inter-system dependencies. As obvious as that sounds, our experience has been that folks often do not have this kind of a handle on their infrastructure. Instead of saying to senior management that “our SQL server absolutely has to be up all the time because the business depends on it,” you should be able to say “our SQL server needs to be up because we can’t take orders if it’s down,” or “our SQL server needs to be highly available because we can’t load our trucks when it’s down.” You are much more likely to hear “make it so” on your DR proposal with this approach than if you go in with a story about how much downtime costs the company. Senior management is well aware of the revenue numbers; they don’t need to be reminded, and trying to foist a murky cost of downtime justification on them is an iffy, if not perilous, strategy. What they want and need is plain talk on what happens if things break.

Speaking in pragmatic terms the business understands will result in funding for a DR plan that makes sense for the business, though it may not be everything you’d personally like to have. So if you feel you really need synchronous replication rather than asynchronous, you’re going to have to explain in business terms why the extreme extra cost is necessary. Pitch the solution in plain terms. Lay out information meaningful to your leaders and trust them to make a good business decision.

But do influence them to go with a data center that has your operating budget in mind. We can often bring some of the seemingly out-of-reach high availability solutions within your reach.

//spk


The Truth About Up Time

Friday, July 3rd, 2009

On June 29th a cloud burst occurred at Rackspace, proving that even the mighty eventually do fall. The blow-by-blow Rackspace Twitter account of their power outage provides interesting insight into what happens during a crisis at a hosting provider.


In every industry there are dirty little secrets that customers either don’t know about, or don’t want to know about. The meat counter at the grocery store is a prime example. Those steaks and chops look really good, but did you ever watch the entire process from hoof to hamburger? It’s not pretty, and for most folks it’s Too Much Information.

So here’s Dirty Little Secret #1 of the hosting industry: While most every hosting company has to make the claim in order to be credible, no one can deliver 100% data center up time forever. No one. Not even the market leader. So why then make the claim at all? Because that’s what customers demand to hear. In talking with customers we find a widespread cross-industry sentiment, usually absent any logical rationale, that says “my business is so important that my infrastructure has to be running 24/7 without any interruptions at all.” Unless your business is keeping patients alive with sophisticated medical equipment, this seems like a rather difficult position to defend. But no one wants to be the bad guy to point that out. We know there is life beyond brief outages because they happen every day and yet nobody goes broke, but it is typically unwise to say so.

Realizing that downtime will occur, even in the elite shops of the world like Rackspace with their fleet of nine data centers, you do need to make realistic decisions about what level of up time you really need in light of the type of business you’re in. And while it may sound like heresy, you also want to make decisions about things that are much more important than up time levels. It seems to me that if downtime is inevitable, and we know that it is, then I want my equipment in the hands of people who know how to recover quickly from an outage, who will communicate with me regularly and truthfully throughout the crisis, and who will do their level best to get me back online as quickly as possible. I want my equipment in the hands of highly competent people that I can trust. You can’t make that determination when you sign up for service via a web browser or when you do the whole transaction over the phone. The only way to make the determination is to actually meet the people who are going to become the custodians of your infrastructure.

Before you put your equipment in the hands of someone else, make the effort to visit them. If they don’t allow visits, that should be a big Red Flag #1. Talk to their operations and support people, particularly the folks who will be touching your equipment. If you’re not allowed to talk to them, that should be Red Flag #2. Ask them about their up time guarantee. If they look you square in the eye and say 100%, that should be Red Flag #3. Kick the dust off your shoes and move on.

Let me cordially invite you to visit our data center hosting facility this summer.  No red flags – just trustworthy, highly competent, and dependable people.

//spk

p.s. Happy 4th of July!


