Tuesday, October 18, 2011

Culture as important as technology for avoiding network downtime?

In a recent Network World article, Cisco's Denise Fishburne has a caution for Cisco Live attendees returning to their workplaces eager to apply what they learned about high availability and fast convergence: start off with a little introspection.

She asserts that preparing your network to recover from a failure is about more than the network -- you have to take into account the engineers who are involved and the management policies they operate under. She says to ask yourself, "What can be done so that when failure occurs the transition from failure to recovery happens as quickly as possible?"

It's a basic question, and of course at Uplogix, our answer is to plug automation into the scenario to limit the need for human involvement to the atypical downtime scenarios that require advanced troubleshooting. Use Local Management for better monitoring and to take care of the level 1-type causes of downtime, and you reduce the number of -- and opportunities for -- additional problems during recovery situations.

To quote a bit more from the article:
"Now, how many times have you had a network operation failure that was caused by something that a human did? Or by something you didn't expect to have happen and didn't account for before? See that 'action' box below? That gets bigger and bigger (delaying getting to recovery) whenever a human (or multiple humans) has to get involved and troubleshoot what is going on.
The trick is to figure out ways to get this box smaller and smaller and smaller."
We couldn't agree more, and we have a solution. It's no trick: apply Local Management for more in-depth monitoring and automated device recovery while increasing security and reducing opportunities for human error.