Posts Tagged ‘disaster planning’



Personnel DR

Friday, July 10th, 2009

Are you prepared for the departure of a key technical resource in your operation? Someone who holds the infamous “keys to the kingdom”? Typically there is at least one person on a company’s IT staff who achieves deity status with regard to physical and logical access. Sometimes key skills also reside in just that one person. If such a person leaves, either voluntarily or involuntarily, how would your critical operations fare?


Now would be a good time to take a fresh look at both your internal documentation and your skills matrix. Things to consider:

  1. Are all sysadmin userids and passwords documented somewhere, somehow?
  2. Are all critical architectures documented in excruciating detail?  (SAN, virtualization, LAN/WAN, disk replication, backup/restore systems). You want to see how things are connected and how they are intended to interact. You want to see things like IP addresses, subnet addressing schemes, WWNs, hard and soft zoning information and the like. You’ll know you have all of the information you need when you can hand it to a new engineer and they don’t have any questions. Seem impossible? Strive for it, and the result will be good enough.
  3. Where does the above documentation live if you do already have it?  Hopefully it’s not on your staff’s laptops.  If you think you already have it in a shared on-line space, are you sure you have all of it?  And is it being backed up?
  4. Do you have runbooks for all of your servers?  Are they current?  Where are they? Are they backed up?
  5. How many people have practical working knowledge in each area of your critical infrastructure?  Do you have more than one VMware tech?  More than one SAN person?  How about Active Directory or Exchange? Ideally you’ll want three in each area. Contract for it if you need to.
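
If your skills matrix lives in a spreadsheet, the "three in each area" check is easy to automate. Here is a minimal Python sketch of the idea. The names, the areas and the `coverage_gaps` helper are all made up for illustration; substitute your own staff list and your own definition of "critical."

```python
from collections import defaultdict

# Hypothetical skills matrix: person -> areas where they have
# practical working knowledge. Replace with your own data.
SKILLS = {
    "alice": {"VMware", "SAN", "Active Directory"},
    "bob": {"VMware", "Exchange"},
    "carol": {"VMware", "Active Directory", "backup/restore"},
}

# Areas treated as critical in this example (an assumption).
CRITICAL_AREAS = ["VMware", "SAN", "Active Directory", "Exchange", "backup/restore"]

# The post suggests three people per area as the ideal.
TARGET_COVERAGE = 3


def coverage_gaps(skills, areas, target=TARGET_COVERAGE):
    """Return {area: sorted list of people} for areas with fewer than `target` people."""
    by_area = defaultdict(list)
    for person, known in skills.items():
        for area in known:
            by_area[area].append(person)
    return {a: sorted(by_area[a]) for a in areas if len(by_area[a]) < target}


if __name__ == "__main__":
    for area, people in coverage_gaps(SKILLS, CRITICAL_AREAS).items():
        print(f"{area}: {len(people)} of {TARGET_COVERAGE} ({', '.join(people) or 'nobody'})")
```

Any area that shows up in the output is a candidate for cross-training or a contract resource.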

 
I could go on, but I think you’re getting my point. This process is somewhat like writing a will.  It’s a real drag to write up, and everybody knows that they need to take care of it, yet it often gets ignored until it’s too late. And just like a will, all of this documentation needs to be updated on a regular basis or it may end up being worthless at crisis time.
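
One cheap way to keep yourself honest about "updated on a regular basis" is to flag documents nobody has touched in a while. The sketch below is only an illustration: the 180-day threshold and the `stale_documents` helper are my assumptions, and a file's modification time is a rough proxy at best, since a recently saved document can still be wrong.

```python
import time
from pathlib import Path

# Assumed review interval; pick whatever fits your change rate.
MAX_AGE_DAYS = 180


def stale_documents(doc_root, max_age_days=MAX_AGE_DAYS, now=None):
    """Return relative paths of files under doc_root not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    root = Path(doc_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Point it at wherever your runbooks and architecture documents live, and review whatever it lists.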

Alternatively, you could move the responsibility for a large portion of this to a professional hosting facility. Why not limit your exposure to just your applications and let us worry about how all the plumbing is hooked up?

//spk


Worse Than Failure

Friday, June 12th, 2009

Whether you have your own shop or host your gear somewhere else, this week’s horror story at VAserv should serve as a wake-up call if you’re responsible for safeguarding vital company data, especially your customers’ data.

To briefly sum up the story, hackers took out 100,000 (yes, 100,000) web sites, many of them permanently, in an evening’s worth of work. Just restore the backup, you say? Not so fast.

VAserv basically offers low-cost Web hosting services using virtualized private servers based on HyperVM. As of Wednesday morning, it was not clear how many of its customers — many of them based in the U.S — had irretrievably lost data in the attack. That number could be high, though, because half of those affected had apparently signed up for an unmanaged service that doesn’t include backups, according to the Register. [emphasis added]

 
And for those customers that did sign up for backup?

A note on VAserv’s Web site, which is now just a text document with details on the company’s restoration efforts, claimed its staff had been working “tirelessly” over the last 48 hours. “However, we have now reached the end of all of our servers, and as such, if your server is not currently up, or not partly up, then it is unfortunate that you will have lost your data due to this third-party attack,” the note said.

Oh, the humanity, indeed. Computerworld’s Jaikumar Vijayan receives this week’s Master of the Understatement award:

The continuing fallout from a hacking incident at U.K.-based Web hosting company VAserv should serve as a powerful reminder that companies need proper data backup and disaster recovery procedures. The incident, which could result in a fire sale of VAserv to another hosting provider, is also an especially stark example of the kind of havoc that a malicious attacker can wreak on businesses.  [more emphasis added]

 
Can you say ‘class action lawsuit?’

Attempts to reach Rus Foster, VAserv’s director, via e-mail and phone were unsuccessful. But the terse updates on the company’s Web site and the thousands of customer posts on a discussion forum painted a picture of total chaos.

“I’ve personally reached the end of my physical and emotional tether,” Foster wrote in one post on the discussion forum late Tuesday evening. “We have worked pretty much continuously for the last few days firefighting.”

Foster wrote in a post that suggested he was putting the company up for sale. In his note, Foster said he had two options: Do what’s best for the customer base by getting “some big boys in behind” to help get things back up and running. The other, he said, was to simply “Run away and hide and just say to everyone ‘good bye.’”

 
Run away and hide?  When did that become a viable option for gross negligence?  No one can outrun the long arm of the Bar Association.


I’m reminded of a line from The Architect’s classic speech: “There are levels of survival we are prepared to accept.” There are clearly plenty of folks who seem comfortable managing their IT shops that way. We see it all the time when we look at their backup strategies and disaster plans, if they have any. It seems to me that being totally wiped out, or having to sell our companies because of something as easily preventable as failed backups, is not one of those acceptable levels. But wait, it was those scoundrels the hackers, wasn’t it? They caused the problem, and they killed the company. No, they didn’t. To be sure, the hackers wreaked havoc, but what they really did was expose the ultimate game-ending event: no backups. Had proper backup procedures been in place and restores regularly tested, the incident would have been merely one of downtime and possibly SLA penalties. (Yes, I know credit card data was also stolen, but that’s not necessarily a game-ender.)

Being an infrastructure company, we routinely preach about the need for proper backup and restore procedures, and the need to test them. Sadly, it often falls on deaf ears, and while we do occasionally read an obituary like VAserv’s, death-by-no-backups is happening all the time in companies you’ll never hear about.

There’s another quote I like from The Matrix: “You hear that, Mr. Anderson? That is the sound of inevitability… it is the sound of your death…” If you aren’t testing your ability to restore your backups (you do have them, don’t you?), the sound of inevitability may be tolling for you.
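
What does "testing your ability to restore" look like in practice? One simple approach, sketched below in Python, is to record checksums of the source data at backup time, restore the backup to a scratch location, and compare. The `checksum_manifest` and `restore_problems` helpers are hypothetical names of mine, and the sketch assumes your backup product has already performed the restore; it only checks the result. It also reads whole files into memory, so treat it as an illustration rather than a tool for multi-gigabyte data sets.

```python
import hashlib
from pathlib import Path


def checksum_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def restore_problems(expected, restored_root):
    """Compare a manifest recorded at backup time against a test restore.

    Returns a sorted list of (relative_path, problem) tuples, where problem
    is "missing" or "changed". An empty list means every expected file came
    back intact.
    """
    actual = checksum_manifest(restored_root)
    problems = []
    for rel, digest in expected.items():
        if rel not in actual:
            problems.append((rel, "missing"))
        elif actual[rel] != digest:
            problems.append((rel, "changed"))
    return sorted(problems)
```

Run something like this on a schedule, and treat a non-empty result the same way you would treat a failed backup job.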

Hope is not a strategy.  If backups and restores stress you out, or you’re just hoping they’ll be there when you need them, consider handing it all over to a group of people who live and breathe it. They actually enjoy backups, and they’ll take good care of your gear too.

//spk
