Archive for the ‘Human Factors’ Category



Great Expectations

Friday, July 9th, 2010

What could possibly be more fun than standing outside in the 100+ degree heat here in southeastern PA?  Standing outside in 90+ degree heat while waiting in line at Disney World.  At least there’s the promise of something fun, and possibly something cool and wet at the end of the wait.

While recently wading through the sea of humanity and waiting in some of the infernal lines that define Disney at this time of year, I was struck by an interesting IT analogy early in the week (yes, I really did need a vacation, and by the end of the week I wasn’t thinking about IT at all).

In late June and early July, the number of baby strollers per square foot in the Magic Kingdom increases to approximately 10x the normal rate.  This gives one many opportunities to observe other people’s children under extreme conditions. It’s amazing to watch parents expect their 5-year-olds to behave like perfect angels in subtropical queue lines for upwards of 45 minutes, or sprint from one end of a park to another on tiny, tired little legs to score a Toy Story Fast-Pass before they’re all gone. It’s amazing because you can tell these kids are perfect hellions under ideal conditions as well. Putting them under stress only intensifies the problems that already exist.

Likewise, if you’ve got poorly designed or neglected infrastructure, simply moving it to a colo facility isn’t going to improve up-time or performance significantly, if at all. Certainly you can improve environmentals, save capex, and get lower network latency with a colo move, but if application response time and reliability are sucking wind before the move because of bad design or sysadmin neglect, not much is going to change.

My point isn’t that you should avoid putting your infrastructure in a better home if you need to, but that you shouldn’t expect it to behave any differently just because you moved it. Moreover, move time is not the time to make drastic changes to your production systems. It’s not a “free” outage window.  The more changes you make during a move, the higher the risk of a failed, or at minimum a very stressful, move.

On the other hand, a move can be an ideal time to upgrade to better hardware and legitimately raise your expectations. For example, you can set up new hardware next to your old, cluster it, and then move the new half of the cluster to a better home while the old half continues to run the business. After you complete the move and let the two halves resynchronize, you can turn down the old half and all activity will automatically switch over to the new hardware. Your users will never feel a thing. Very little pain, but very much gain.

Of course that all sounds good, and there are a lot of details involved in making it happen, but that’s what we do best. If you’re interested in smoothly moving your critical IT gear to a new home and need some experienced help to get it done, give us a call. Hardware prone to temper tantrums is one of our specialties.

//spk


Keep The Change

Wednesday, June 23rd, 2010

Does this sound like your IT shop?  Reports from the Uptime Institute consistently show that the majority of reliability and uptime woes aren’t caused by hardware,  facilities, or utility failure – they’re caused by humans, and what pray tell are those humans doing?  They’re changing things, and often too much of the change isn’t planned, approved, or documented.  Or, there is simply too much change going on at one time.

Much like a bomb is meant to explode, technicians are meant to be technical, so it’s a bit unrealistic to assume they’re giving a lot of thought to managing change, much less that they’re fond of doing so. They just want to git ‘er done, and in large part, we pay them well to not only do that, but to do it right the first time.  Hard core techies, the ones that really know how to make things work, typically aren’t also wired for sitting in management meetings. The problem with managing change is that it’s boring. It’s not technical. And explaining highly technical things to non-technical folks in a change management meeting is not always the average techie’s strong suit, nor perhaps the best use of their time. To the contrary, it can be a very frustrating experience for them, which can lead them down the Dark Side of making changes beneath the radar. Effective change management therefore becomes a bit of a balancing act. We need to know what’s going on, but we don’t want to bog everyone down in the process.

In our data center, controlling change is not optional. Reliability demands it, as do the SAS 70 auditors. But we’ve found a way to manage it without terribly burdening our technical staff. Change requests may be formally entered in the system by any authorized individual whether or not they are technical;  they are simply the person requesting the change. The request is then routed to a technician who assesses what needs to be done, adds those details to the request, and suggests when it might be done. It’s then passed on to someone in management who can assess the risk and approve or disapprove it. If a change is of major significance, the request comes before a Change Advisory Board (CAB) for final approval. Technicians, while welcome, are not required to attend CAB meetings.  When requests are properly documented, the CAB is almost always able to make a good decision without further involving the technical staff.  When the CAB does need more information or defers a request for some reason (e.g. too many changes on one night), the technician in question is notified and it’s handled outside of a meeting.  This saves time, money, and mental fatigue. Since the pain threshold is relatively low, this method also encourages all change activity to actually be run through the proper channels.
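For the programmers in the audience, the routing described above can be sketched as a small state machine. This is only an illustration under assumptions: the class, status, and method names are hypothetical and are not taken from any real change management system.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    REQUESTED = auto()    # entered by any authorized individual
    ASSESSED = auto()     # a technician has added details and a proposed window
    PENDING_CAB = auto()  # major change waiting for the Change Advisory Board
    APPROVED = auto()
    REJECTED = auto()
    DEFERRED = auto()     # e.g. too many changes on one night


@dataclass
class ChangeRequest:
    requester: str
    summary: str
    status: Status = Status.REQUESTED
    technical_details: str = ""
    proposed_window: str = ""
    history: list = field(default_factory=list)

    def _move(self, new_status, actor):
        # Keep an audit trail of every transition and who made it.
        self.history.append((self.status, new_status, actor))
        self.status = new_status

    def assess(self, technician, details, proposed_window):
        if self.status is not Status.REQUESTED:
            raise ValueError("only a new request can be assessed")
        self.technical_details = details
        self.proposed_window = proposed_window
        self._move(Status.ASSESSED, technician)

    def management_review(self, manager, approve, major=False):
        if self.status is not Status.ASSESSED:
            raise ValueError("request must be assessed before review")
        if not approve:
            self._move(Status.REJECTED, manager)
        elif major:
            # Major changes still need final approval from the CAB.
            self._move(Status.PENDING_CAB, manager)
        else:
            self._move(Status.APPROVED, manager)

    def cab_decision(self, decision):
        if self.status is not Status.PENDING_CAB:
            raise ValueError("request is not waiting on the CAB")
        if decision not in (Status.APPROVED, Status.REJECTED, Status.DEFERRED):
            raise ValueError("CAB can only approve, reject, or defer")
        self._move(decision, "CAB")
```

Note that the technician appears only in `assess`. Nothing in the review or CAB steps requires them to be in the room, which is the whole point.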

Our process is capable of handling very high rates of change, but that doesn’t mean that we do so.  On the contrary, we try to minimize the rate of change, batching things together when it makes sense to  minimize outages, and spreading them out when the risk is high to maximize uptime.
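As a rough sketch of that batching idea, assume each change is tagged with the system it touches and a simple "low" or "high" risk label. The labels, the function name, and the per-window limit are all hypothetical. Low-risk changes to the same system share a maintenance window, while every high-risk change gets a window to itself.

```python
from collections import defaultdict


def plan_windows(changes, max_low_risk_per_window=3):
    """Group changes into maintenance windows.

    changes: list of (name, system, risk) tuples, risk is "low" or "high".
    Returns a list of windows, each a list of change names.
    """
    windows = []
    low_by_system = defaultdict(list)

    for name, system, risk in changes:
        if risk == "high":
            # Spread out risky work: one high-risk change per window.
            windows.append([name])
        else:
            low_by_system[system].append(name)

    # Batch low-risk work per system so each system takes fewer outages.
    for system in sorted(low_by_system):
        names = low_by_system[system]
        for i in range(0, len(names), max_low_risk_per_window):
            windows.append(names[i:i + max_low_risk_per_window])

    return windows
```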

Managing change is not fun, and you may be justifiably weary of it.  Let us take that burden off of your shoulders.

//spk


Twenty Infrastructure Days Til Christmas

Friday, December 4th, 2009


At this time of year my focus changes from IT to LIONEL infrastructure.  It’s time to begin building this year’s Christmas layout!   Ah, the joy of no project plan or business case!   And looking at the calendar, Christmas day is quite visible on the horizon, so I need to get moving.

With Christmas being on a Friday this year, the early days of that week will likely not be the most productive in the IT shop as the staff winds down the year with inter-office daytime soirées, extended “lunch hours” for last-minute shopping, vacation time, and so on.  From an online availability standpoint, this is clearly not the time to be making any major changes to your critical systems.

Here are a few ideas for redeeming the time during the slow days before Christmas Eve – those days when few creatures are stirring in your shop:

  1. Check your software patch levels.  Are you up to date on critical systems?  If not, make plans to get current after the holidays.
  2. Check your hardware maintenance agreements.  Is all your gear covered? How about the hardware that’s going out of warranty Real Soon Now? Are all your agreements current?
  3. Check your software licenses.  Are you using more than you own?  Funny how that creeps up without notice.  You may want to square up with your vendors at year-end fire sale prices rather than wait until January.
  4. Have your sysadmins check the free disk space across your server farm.  Is it time to order more storage, or simply clean out the dead wood?  If a file hasn’t been referenced in the last 12 months, archive it or ask the file owner if you may simply delete it.
  5. Check for unnecessary VM sprawl.   Do you have virtual servers that you can decommission?
  6. Review your backup strategy.   Are all of your critical systems included properly?
  7. Test your recovery capability.  Try to recover a file, a database, and perhaps even an entire server from backup.
  8. Declare email bankruptcy and ask your users to do the same.  Don’t start 2010 with 2+ GB of personal email.  Refuse to be part of the highest form of pack rattery and digital waste known to man.
  9. Review your Internet bandwidth usage.  Do you need more or can you do with less?  Do you need to have a chat with any abusive power surfers?
  10. Review your private bandwidth usage and contracts.  Are you nearing the end of any contracts?  Is it time to start shopping for better rates?

 

These are not really exciting tasks, to be sure, but they are all important, low-risk things you can do in the inevitable pre-Christmas lull to get your shop off to a good start in 2010.
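If your sysadmins want a head start on item 4, here is a minimal Python sketch of the disk space and dead wood check. The function names and the 365-day default are assumptions for illustration. One caveat: many filesystems are mounted with `noatime` or `relatime`, so access times may not be trustworthy. For that reason the sketch treats a file as stale only if both its access time and its modification time are older than the cutoff.

```python
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def free_space_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem at path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def stale_files(root, days=365, now=None):
    """Yield (path, size_in_bytes) for files under root untouched for `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                st = full.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Stale only if neither read nor written since the cutoff.
            if max(st.st_atime, st.st_mtime) < cutoff:
                yield full, st.st_size
```

The sketch only reports candidates. Archiving, or asking the file owner before deleting, stays a human decision.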

Christmas Movie Review Department

There are many renditions of Dickens’s “A Christmas Carol” to choose from.   Arguably two of the best star Alastair Sim and George C. Scott, respectively. I personally prefer the George C. Scott version, but you can’t go wrong with either.  You need at least one of these in your collection.


Christmas Train Department

Also, if you don’t have a train under your Christmas tree and you have kids, grandkids, or you just know kids in your neighborhood, you really ought to head to a hobby shop and check out the Lionel starter sets.   This one will look especially fine under your tree:

[Image: Lionel starter set]

//spk


