Archive for the ‘Hosting’ Category



Jilted Again

Friday, May 22nd, 2009

On my way out of the office earlier this week, I met our master Jedi of monitoring standing in my office door. “You might want to sit down,” he said. In over 10 years of working together in the hellfire and brimstone of systems management, he’d never said that before. “What could possibly be that bad?” I wondered. “I just went to the Cittio support site,” he said calmly as he handed me his Blackberry. “Here’s what I got:”

[Image: cittio-done]

For those of you unfamiliar with the world of network management systems, the name Cittio means nothing. For those of you unfamiliar with the history of systems management tools at DSS, you’re also likely thinking “Dude, get over it. It’s just another company folding.” Or, as a former MVS systems programmer colleague used to say to me, “Get over it…and like it.”

Four Times Bitten, Forever Shy?

IBM Netview. We’ve been managing customer systems with NMS tools since 1995. Being an IBM business partner, we decided to start with IBM Netview, a close but homely cousin of HP Openview. While Netview was not without its charm, it was a cruel taskmaster. We spent more time offering animal sacrifices to the tool to keep it running than we spent actually using it. Besides taking 45 minutes to begin polling after a restart, the monitoring daemon would just go off into the weeds and stop polling. We never could really trust it, and reporting left much to be desired. As we continued to struggle with Netview, IBM bought Tivoli and the product was moved over to the Tivoli side of the house for assimilation into the Tivoli Enterprise Framework. Since IBM surely wouldn’t have bought a company with bad products, and since business partners now had easy access to the Tivoli products, we naively decided to take a look at Tivoli Enterprise.

Tivoli Enterprise Distributed Monitoring (DM). After spending considerable time and money getting indoctrinated in the Tivoli Enterprise Framework and DM, we quickly realized the product was even more of a monster than Netview.  More animal sacrifices and offerings of time and energy were required for less functionality and horrible reliability.  We did one customer implementation and stopped.   We had seen and suffered enough.  While contemplating whether to shave our heads and put on sackcloth and ashes, we heard of a new NMS savior coming for the small-medium business space.

Tivoli IT Director. Enter codename “Bossman.” By divine intervention, our company was selected by Tivoli to become part of a small circle of customers and partners involved in a skunk works project to develop an NMS targeted at small shops: an all-in-one tool that could poll for availability, collect performance data, monitor thresholds, collect HW/SW inventory, and even do software distribution. A veritable Ginsu knife set for systems management (without the 50-year guarantee). But wait, there’s more… Tivoli released the product on time, and as insiders we were way ahead of the game. We began implementing it at customer sites with good results, and the sun was finally beginning to shine again. No more animal sacrifices. We had finally begun to rebuild our remote monitoring business out of the ashes of the Netview days.

IT Director did have one flaw in its armor – it couldn’t support more than a couple of hundred nodes. But the boys in Texas were on top of that, and project “California” was underway to take the number of nodes up to 5,000. Just days before we were to receive the beta code, Tivoli pulled the plug on the product. Our sources behind the curtain told us why: it was felt that California, at its dramatically lower price point, would compete against Tivoli Enterprise Distributed Monitoring, and the Mercedes Benz crowd at Tivoli was having none of that. The product was pulled from the portfolio and given to the IBM PC division in Boca Raton, where it was thoroughly lobotomized and re-released as IBM Netfinity Director. So began the Dark Times.

Time out. I realize this is a blog post, not the Chronicles of Narnia, so I’ll hasten to the point. Director was completely unusable after IBM Boca got done with it, and we had to move on. At this point, having been left at the altar by Tivoli, we decided to develop our own system, DSS Systems Manager (DSM), and over the next two years we did exactly that and had very satisfying results. Customers loved DSM, and so did we, but we had one problem – DSS was not, and still is not, a software development shop. At the time we felt we couldn’t continue to develop the product and properly focus on our core business. As we moved into the data center hosting business, we realized we needed additional functionality that we could no longer afford to develop ourselves. So we sought yet another commercial answer. Back to the story….


Cittio Watchtower. Watchtower essentially represented where we would have taken DSM had we decided to continue development. We negotiated a deal, installed it in under 30 days, and were up and running. Like good old Tang, we just added water and the rest was history. We cultivated a close relationship with the CEO of Cittio and had regular contact with the VP of Development and other high-level folks who controlled the product’s destiny. We did joint marketing events with them, including speaking on their behalf on webinars, and served as a reference account when they had large deals on the table.

The Betrayal

Only a week before the company dissolved (like Tang perhaps), the CEO personally asked me to serve as a reference to a couple of companies that Cittio was considering for OEM relationships. Context is everything, and little did I know that OEM had been secretly redefined to mean “Our Exit Money.” A little over a week after I had happily given a glowing Watchtower review to a company named Nimsoft, my chief monitoring engineer was handing me his Blackberry with news of Cittio’s demise. We contacted Nimsoft on The Day After, and the basic message we got was “good luck fellas, you’re pretty much on your own. The product will be no more. We can’t promise support of any kind.” Simply fabulous. To be fair, the whole situation is still in flux, and my sense during the phone call was that they hadn’t fully considered the fallout from their actions. They may very well come back with a migration plan or limited temporary support, etc., but for now we Watchtower users are out in the cold. Our new bride has packed her bags and left us with the credit card bills.

Getting Over It But Not Liking It

Thankfully, faith in God allows me to maintain my composure in situations like this, but a wise friend once taught me that buried feelings are buried alive, and when they come back, they come back as either anger or depression. So in the interests of good mental health, I’m compelled  to express my feelings about this debacle and get back to business.  Play this back to back 4-5 times for proper effect:

Nobody understands being jilted quite like Sam Kinison.  I feel much better now.

What Does This Mean to You?

So what’s the take-away from this situation that we can apply in our shops?   DSS has been on both sides of the build vs. buy decision, and there are clear advantages and risks to both positions.   My opinion, while still standing here in the smoking crater, is pretty much what it’s always been:  if you have the talent and can afford the time, building your own critical monitoring systems  is still your best destiny.  You have control of all of the variables and are forever immune to vendor adultery.  There is plenty of good open source material out there to take care of  the heavy lifting and serve as a good starting point.

If you don’t have the time or talent, then buying is obviously the only option. Cittio was a VC-funded company and therefore subject to the whims and wiles of the angels and VCs. If I were to buy again, Rule #1 would be to limit the vendor short list to firms beyond at least the magical fourth round of funding. Translation: No fresh start-ups. Rule #2 would be to pick a product that is already firmly entrenched in a lot of Really Big Companies with big legal departments. There is safety in numbers and large legal teams. This may yet turn out to be the case with the Cittio breakup – they had some Really Big Customers, so we’ll wait and see if any major players file for damages in divorce court.

Unless IT is your core business, your best strategy is simple avoidance. Running your own infrastructure is full of headaches and horror stories that do nothing but hurt your bottom line. Let someone else highly skilled in being jilted deal with all the risks, headaches, and heartaches.

//spk

Postscript: Just as I was getting ready to publish this entry, I received a call from a former senior exec at Cittio. Though no longer on the payroll, he apologized at length for the situation, described what went down, and was genuinely troubled at the way in which former customers are now being treated. In the final analysis, the VC guys pulled the plug on a healthy company. While my contact really didn’t know why it happened, perhaps they were selling healthy assets to compensate for unhealthy ones. Who knows. In any event, it’s time to move on.


Hog Wild

Thursday, April 30th, 2009

Try as you might, your IT department won’t be allowed to ignore the current drama surrounding the swine flu outbreak south of the border. While the number of confirmed swine flu deaths is one (yes, one) as of this writing, the 24/7 news cycle is in full Doomsday mode. Your customers may soon be asking what your plans are because they are just now making their own. Unlike “normal” data center disasters like fire or flood, a pandemic scenario is just not on most people’s planning radar.

So what are we in IT to do? Chances are you’ve already taken care of it. If you have remote access technology in place for your employees, and you’ve already planned for a building disaster, you’ve probably done as much as you can do unless you can find staff who are impervious to the flu.

[Image: Commander Data]

The rest is really a matter of business continuity, not disaster recovery.

A relevant article appeared on processor.com a few years ago that stated as much:

A major part of an IT admin’s job during a pandemic will involve remote IT administration. Unlike disaster planning for acts of God, such as floods, fire, or earthquakes, staffers during a pandemic will not immediately seek to relocate.

“One interesting difference between [a pandemic] and another disaster is how everybody cannot just go and work at a different data center. You don’t want to take everybody and put them all in one place,” notes James Governor, an analyst for Redmonk, an analyst firm built on open source. “You do need a distributed and potentially home-working strategy because this is not the same as your [average disaster].”

Enabling staffers to access and perform networking tasks remotely is crucial in the event of a pandemic. “Any establishment worth its salt has good access tools to use the network from wherever they are on the planet. That is just good practice in any case,” Governor says. “And certainly, it is good practice if one is concerned about any potential issues where you might not be able to access the network in a way that you normally would.”

And as Bob DeCoufle pointed out on Tuesday, there is only a remote possibility of needing to invoke your disaster plan, assuming you had a recovery facility “outside of the epidemic region.” How one would anticipate where that would be is another matter, but in any case, few of us have the resources to relocate around a pandemic.

Unless we’re hosting hospital applications or other life support systems, asking our employees to do more than work remotely is probably unrealistic. In a genuine crisis, they will likely be home with their families, and Uncle Sam will probably be calling the shots regardless of our plans.

If by chance you are also required to cover the continuity aspect of your company, Forrester Research offers the following planning tips for a pandemic:

Preparing for a pandemic involves collaboration between all the departments in an enterprise, Forrester Research says. If an outbreak of a contagious virus or disease keeps more than half of all employees from showing up for work, some of the things an organization must do include:

Maintaining inventory and supplier relationships

Providing systematic communications about the outbreak for employees

Making vaccines and medical support for employees available (if possible)

Offering means of transportation to and from work in case public transit systems fail

Providing tools and resources to enable employees to work from home

The phrase “this too shall pass” brings me peace of mind. The swine flu will pass. In the meantime here at DSS, we’ll be making sure our remote access systems are up to snuff and reviewing our staffing plans for the data center. An emergency IT staffing plan should reflect the kind of business you’re in. If your IT systems support the lives of others, you obviously have a greater ethical responsibility than those who are running online shopping sites. For the crisis du jour, you will want to have an appropriate plan for on-site data center support.

And if you put your gear in a facility like this, you’ll have even less to worry about the next time the flu bug oinks in our direction.

That's all Folks!


Who’s Afraid Of The Big Bad Wolf?

Monday, April 20th, 2009


News of Cisco’s intent to enter the server market with its Unified Computing System offering has set the industry pundits’ hair ablaze. “How will IBM & HP respond?”, “How much market share will be lost to Cisco?”, “Do you want a plumber building your servers?” and on it goes. The FUD truly has been flying. You would think the Big Bad Wolf had just come back to Grandma’s house.

So, what does the announcement of UCS mean to us here in the non-rarefied air of business computing? Will it help us run our shops better?

Listen to Cisco CEO Chambers closely…

We look at this as bringing virtualization to life…unleashing the power of virtualization.   We go about it catching market transitions and trying to set timing, first in the data center, but make no mistake about it [UCS will make it] all the way in the home… [emphasis added]

 

What market transitions, pray tell, is he referring to? Could it be anything other than the transition to utility-based computing? It’s fairly clear he’s not talking about our server rooms and data centers. No, it would seem Cisco has its sights on something much larger. Chambers’ message is unmistakable. If the coming world of utility-based computing were to be compared to The Matrix, Cisco would not be content with simply supplying the network plumbing – they want to be the Matrix itself. Having already tucked away the network, we now see a move into processors. Can storage be far behind? Perhaps the Big Bad Wolf already has that in the oven.

It doesn’t seem on the surface that UCS is intended for the typical IT shop, but let’s assume otherwise for a moment.  Is there a compelling reason for us to consider (or fear) UCS?    What would make us willing to try a  brand new brand?

In many ways, owning server hardware is a lot like owning a vehicle. First, you make your purchase based on size, looks, performance, the features you need, reliability, serviceability, and of course the price. Sometimes you’re looking to save gas (power), but not always. Maybe you decide to lease it. If you end up with a lemon, you know that very early in the game, and you get the vehicle fixed or replaced under warranty. From that point on, if you put in decent gasoline (clean UPS power), do regular maintenance (clean the fan grids, do disk defrags), and operate it within its design limits (proper cooling), it will run well for a long time.   When it wears out, or after you simply get tired of it and want something new and sexy, you buy a new one, sell or trade the old one, or possibly keep it and run it until the wheels fall off.

In the final analysis, whether you buy Chevy, Ford, Chrysler, or a brand you’ve never tried before really doesn’t matter. You go through the same decision process and ultimately you buy what you like or what you feel comfortable with.  The care, maintenance, and disposal process is the same no matter what you buy. And statistically, the reliability is pretty much the same across the board, despite the religious fervor that surrounds each brand. They all run well on balance, and they all have an occasional breakdown. For every hardware horror story out there, there are scores of identical hardware instances that run their entire lifetimes without a glitch.

Of course, if you absolutely must be the first kid on the block with a new hardware vendor, your mileage may vary.

Early UCS adopters on the phone with Cisco Tech Support


For most of us, UCS is not going to help with the primary purpose of our infrastructure.  So what does make a difference in how well our business systems stay up and running?

If you put a good driver (software) behind the wheel of your vehicle, you can be confident it will stay on the road doing what you intend it to do.  If you put an unskilled, abusive or reckless driver behind the wheel, you can expect more mechanical breakdowns (minor outages), accidents (major outages), or worse (disaster declaration).

I resisted naming operating system names above, but ask yourself, when was the last time you had down time because an operating system or application went off into the weeds?   Do you schedule weekly or nightly reboots “just for good measure” because you can’t trust things to stay healthy?    It is an alarmingly common practice in our client base.

There’s a Red Hat 7.2 system that’s been hosting workload here for years that only comes down when we take it down to replace or upgrade the hardware. We have a farm of VMware ESX servers that behave just as well. Yet we also have a number of Win32 servers running on the same hardware for which I can’t say the same.

It’s not the hardware.

Lemons notwithstanding, the brand of hardware, be it IBM, HP, Dell, or now ostensibly Cisco, really is not the key factor in maintaining uptime. In this day of clusters-everywhere and RAID-everything, it’s typically not the hardware that takes you down – it’s unreliable software, change, or human error.

As for UCS, it doesn’t look like the Big Bad Wolf is coming to our house anytime soon, but it is a good idea to keep a watchful eye on where he is going.  Cisco has cold hard cash and a big vision, but that vision seems cast for The Matrix, not our server rooms.


Buy what you’re comfortable with and put the right driver behind the wheel, or better yet, let us worry about that for you.

 

 


