A New Fiancé

January 20th, 2010, by Scott Kantner

They say love is blind, and in the case of systems management tools it seems especially true.  Despite multiple bad previous experiences, we’re going back for more.  In recent days we’ve been implementing Zyrion Traverse for our hosting operations.

If you Google “systems management” or “systems monitoring” you’ll quickly learn that these tools are now essentially commodity software.  There are dozens, if not several hundred, reinvented wheels out there to choose from, whether they be free, ad-supported, or pay-for products.  In fact, the space is so full of purple cows at this point that one has trouble judging between the shades.  If you’re thinking of doing a start-up to produce Yet Another Systems Management Tool, differentiating yourself will be the primary challenge.  Writing the software will be only 10% of your effort.

So, with all the eligible prospects out there, how does one select a systems management tool?  All of the worthwhile products have the same set of basic features: discovery and polling, thresholding and alerting, and data collection and reporting.  Many of them also sport attractive eye candy in the form of topographical status displays (the practical usefulness of which is often questionable and a subject for another day) and three-dimensional displays of node relationships and performance data.  Some can do performance base-lining, from which they are able to tell when a system begins to deviate from its normal performance profile.  More ambitious products attempt to correlate the blizzard of event traffic blowing through the network and do root-cause analysis.  The best ones are highly configurable, extensible, and offer great APIs, because no matter what product you select, you will invariably be faced with some amount of customization to make it really fit your environment.

We narrowed our search to products that reflected the ideas guiding the design of our own in-house tool until we ceased its development several years ago.  We were headed to the next level of systems management: providing business views of our operations.  Most tools focus strictly on the status of discrete elements and their resources, e.g. a router is down, a server is out of disk space, etc.  While that’s certainly useful information to a degree, the real question is, what does it mean to my business?  If disk space is low on a test server, how much do I care?  Should a red light on that condition cause a severity 1 ticket to be opened?  Is the downed router affecting my ability to ship product or simply the archiving of my 2009 backups?

We looked for a tool that could take the various elements that make up our major business systems and allow them to be defined under a small set of dashboard lights, one for each business system, that would help us understand the impact of the various events occurring across the enterprise.  So, for example, if the “Order Entry” light goes red, we know it’s an all-hands-on-deck situation because of the well-known equation: no orders = no revenue.  If, on the other hand, the “R&D” light goes red, where all of our test systems are located, there is no need to call in the cavalry.

Because human resources are not easily cloned in a crisis, the relative business value of a red light is extremely important to know when multiple red lights are on at the same time. Which ones should get the immediate attention of our limited human resources, and in what order?   A business view answers those questions, especially if you happen to know the dollar cost of downtime for each of your business systems.
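To make the idea concrete, here is a minimal sketch of a business view in Python. It is purely illustrative and is not how Traverse or any other product works internally: the class name, the "worst element wins" roll-up rule, the host names, and the dollar figures are all assumptions invented for the example.

```python
from dataclasses import dataclass

# Severity order for status lights; a higher number is worse.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class BusinessSystem:
    """One dashboard light (hypothetical model, not a vendor API)."""
    name: str
    downtime_cost_per_hour: float  # assumed dollar figure
    element_status: dict           # element name -> "green" / "yellow" / "red"

    def light(self) -> str:
        # Assumed roll-up rule: the light shows the worst status of any element.
        return max(self.element_status.values(),
                   key=SEVERITY.__getitem__,
                   default="green")


def triage(systems):
    """Return the systems whose light is red, costliest downtime first."""
    red = [s for s in systems if s.light() == "red"]
    return sorted(red, key=lambda s: s.downtime_cost_per_hour, reverse=True)


# Made-up example data.
systems = [
    BusinessSystem("R&D", 500.0, {"test-01": "red"}),
    BusinessSystem("Order Entry", 50000.0, {"web-01": "green", "db-01": "red"}),
    BusinessSystem("Archiving", 2000.0, {"backup-01": "yellow"}),
]

for system in triage(systems):
    print(f"{system.name}: ${system.downtime_cost_per_hour:,.0f}/hour")
```

With these made-up numbers, both “Order Entry” and “R&D” are red, but “Order Entry” sorts to the top of the list because its assumed downtime cost is far higher, which is exactly the ordering question a business view is supposed to answer.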

Being in the hosting business, we also needed a tool that could segregate our customers into their own groups and provide a similar business view capability. This feature, usually referred to as “multi-tenancy,” typically comes with the ability to give each customer a personalized view into the business systems hosted in our facility.

After several months of shaking the trees and examining their fruits, we settled on Traverse. It does the basics well, has multi-tenancy, and supports business views. The company appears to be stable. The team that came on-site to help with implementation was capable, and the price didn’t break the bank. While I’m not endorsing Traverse as the be-all-end-all nirvana gotta-have-it systems management tool, it does meet our requirements, and it’s definitely an option worthy of your evaluation if you happen to be in the market for a new system.

With all this goodness, it would seem like our new fiancé is a keeper, and yet the ghost of Systems Management Tools Past still lingers.  Hopefully this relationship will last. Hmm…maybe we should sign a prenup…

//spk
