Wednesday, April 22, 2015

Waypoints on the road to recovery


Mean-time-to-recovery is a key metric for many IT service groups. Lowering that number means frequent monitoring and a fast response when problems appear. It means having trained staff on hand at all hours, ready to jump into action. This is expensive, and it's really just throwing good money after bad, because it doesn't attack the root problem. The answer isn't to throw more people at it; it's to figure out how to apply automation to the problem.

But wait! The reason more automation isn't applied to network management is that it isn't trusted. The moment you count on software to take care of an issue, the network goes down and the software and scripts are useless.

What you need is a game-changing way of doing things. Like Uplogix. We take the same network management approach that people bring, their runbook, and put it in the rack with the gear they want to manage. Connect over the console port, and you've eliminated the network dependence from network management.

Let's see what this can do for the old mean-time-to-recovery metric. Start with a standard recovery timeline: it begins with an event that sits undiscovered until someone identifies it. Only then can recovery steps begin. The problem must first be isolated, or understood, and then the proper steps can be taken to resolve the issue.
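To make the timeline concrete, here is a minimal Python sketch of how those phases add up to a recovery time, and how several incidents average out to an MTTR. The `Incident` class, its field names, and the timestamps are all made up for illustration; they are not taken from Uplogix or from any real outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One outage, marked by the four waypoints on the recovery timeline."""
    occurred: datetime    # the event happens
    identified: datetime  # someone (or something) notices it
    isolated: datetime    # the cause is understood
    resolved: datetime    # service is restored

    def phases(self) -> dict:
        """Split the outage into the three phases between waypoints."""
        return {
            "undiscovered": self.identified - self.occurred,
            "isolation": self.isolated - self.identified,
            "resolution": self.resolved - self.isolated,
        }

    def time_to_recovery(self) -> timedelta:
        """Total time from the event to restored service."""
        return self.resolved - self.occurred


def mean_time_to_recovery(incidents) -> timedelta:
    """Average the time to recovery over a list of incidents."""
    total = sum((i.time_to_recovery() for i in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incidents with invented timestamps.
incidents = [
    Incident(
        occurred=datetime(2015, 4, 20, 2, 0),
        identified=datetime(2015, 4, 20, 2, 45),
        isolated=datetime(2015, 4, 20, 3, 15),
        resolved=datetime(2015, 4, 20, 3, 30),
    ),
    Incident(
        occurred=datetime(2015, 4, 21, 14, 0),
        identified=datetime(2015, 4, 21, 14, 10),
        isolated=datetime(2015, 4, 21, 14, 30),
        resolved=datetime(2015, 4, 21, 14, 50),
    ),
]

mttr = mean_time_to_recovery(incidents)
```

Breaking each incident into phases shows where the time actually goes. In the first invented incident, half of the outage passes before anyone knows there is a problem, which is exactly the portion that monitoring and automation are meant to shrink.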

This video compares a standard timeline for recovery by IT staff with the timeline of an automated recovery by Uplogix. 


So the next time you are on call, or budgeting staffing levels to hit your MTTR SLAs, think about Uplogix.