Friday, November 16, 2012

Getting to the truth on Service Level Agreements

Which side of SLAs do you sit on? Are you gunning to meet required service levels for your customers, or are you watching your provider like a hawk to make sure you get what you pay for?

Either way, service level agreements are only as good as the answers to three questions:
  1. What do they measure - are these really the key network deliverables that matter to your business objectives?
  2. Can they be enforced - are they accurately and readily measurable?
  3. Are they worth it - is the penalty balanced against the ongoing risk?
First, it's important that both sides agree on the metrics that matter. The customer needs to know their business requirements, be able to specify realistically what they need, and be willing to pay for it. Five nines of reliability? It looks good on paper, but an accurate assessment of business requirements is the key to choosing the right providers and SLAs.
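
To put a number like five nines in perspective, it helps to turn the percentage into a downtime budget. The sketch below is only an illustration: the function name `downtime_budget_minutes` is made up for this post, and it assumes a 365-day year with availability measured over the whole year, which your contract may define differently.

```python
# Minutes in a 365-day year (assumption: leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime allowed in a period at a given availability target.

    Hypothetical helper for illustration; an actual SLA may measure
    availability per month or exclude maintenance windows.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare common availability targets side by side.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% allows about {minutes:.1f} minutes of downtime per year")
```

Five nines works out to a little over five minutes of downtime a year, while three nines allows almost nine hours. That gap is worth weighing against what the business actually loses during an outage before paying for the higher tier.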

Next, assuming you know what service levels you need, are they measurable? What are the pitfalls that aren't obvious during contract negotiations, but will appear when trying to enforce an SLA penalty? When service goes down, the fingers start pointing.

This is one area where Uplogix can help out. A Local Manager (LM) sits in the network stack but operates independently of the network, which makes it possible to measure service levels in multiple ways. LMs can conduct point-to-point testing where one Local Manager accesses another Local Manager over the same WAN connections as the rest of the network infrastructure. The LM can make and grade synthetic voice calls, as well as simulate user interactions by grading HTTP and IP performance metrics.

And, just as importantly, these tests can be automated and scheduled so the monitoring is always happening -- not just in response to a customer who already has a problem. This proactive approach means that some problems can be spotted early and avoided, or at least triaged more quickly and accurately when they occur.

Service Level Verification with Uplogix falls into two types:
  • Passive Monitoring of Traffic | With its console connections to network gear like switches and routers, the LM can use this Layer 2 visibility to collect, report and act on 36 different values such as Rx, Tx, CRC, Load, Line Protocol Status and others.
  • Active Monitoring of Traffic | This is Layer 4 testing where one LM connects directly to another over the same network path that your users depend on. More than 45 values related to VoIP and IP performance can be collected, such as Jitter, Latency, MOS, R value and more.
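
For readers wondering how numbers like jitter, MOS and R value relate to each other, here is a minimal sketch using two widely published formulas: the running interarrival jitter estimate from RFC 3550 and the R-to-MOS conversion from the ITU-T G.107 E-model. This is not how the Local Manager computes its values; the function names and sample timestamps are made up for illustration, and clamping the low end of the MOS scale to 1.0 is a simplification chosen here.

```python
def r_to_mos(r_value):
    """Convert an E-model R value to an estimated MOS (ITU-T G.107 mapping).

    Simplification assumed here: the result is clamped to the 1.0 - 4.5 range,
    because the polynomial dips slightly below 1.0 for very small R values.
    """
    if r_value <= 0:
        return 1.0
    if r_value >= 100:
        return 4.5
    mos = 1 + 0.035 * r_value + r_value * (r_value - 60) * (100 - r_value) * 7e-6
    return max(1.0, mos)


def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550.

    Both lists hold per-packet timestamps in the same unit (for example
    milliseconds). Each step moves the estimate 1/16 of the way toward the
    latest change in transit time.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference in spacing between consecutive packets at receiver vs sender.
        delta = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(delta) - jitter) / 16
    return jitter


if __name__ == "__main__":
    # Hypothetical sample: packets sent every 20 ms, the last one arrives late.
    sent = [0.0, 20.0, 40.0, 60.0]
    received = [50.0, 70.0, 90.0, 126.0]
    print(f"jitter estimate: {interarrival_jitter(sent, received):.2f} ms")
    print(f"MOS for R = 93.2: {r_to_mos(93.2):.2f}")
```

Whatever tool collects them, the useful property of these metrics is that both sides of an SLA can compute them the same way from the same test traffic, which leaves less room for finger pointing.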
Finally, are your SLAs worth it? Does the penalty the provider faces justify the added cost it will charge to cover its risk? This was the topic of an article on the Webtorials site. Author Beatrice Piquer Durand summed it up:
"...poorly used SLAs prove to be wooden swords. In many cases, the financial incentive is quite low or subject to clauses that prevent them being really dissuasive. In another hand, too big penalties will have a large cost as suppliers must take the risk into account in their pricing. Even worst, badly defined SLAs might even distort service operations and results by following fake goals and hitting useless targets.

More importantly, in complex environment when things are going in the wrong direction and the supplier really cannot deliver what is expected for whatever reason, the situation is generally such that no one really care about penalties anymore - Try to explain to the on-line Sales Director that her web site poor performance is not an issue thanks to the penalties that you'll try to get from your hosting provider in the next six months... good luck! The only thing that really matters for her it that the service performs and that she can get back to the business as fast as possible."
So whether you are responsible for hitting the SLAs or collecting on them, it's like the Dire Straits song: "Sometimes you're the Louisville slugger, sometimes you're the ball." But with some careful evaluation and a handy tool like Uplogix for generating automated, actionable metrics, it's easier to be the windshield instead of the bug.