Thursday, May 23, 2013

Ethernet hits 40

Yesterday was the 40th anniversary of the memo in which Bob Metcalfe, solidifying months of theorizing, outlined the invention of Ethernet. Working at the Xerox Palo Alto Research Center (PARC), Metcalfe was inspired by a network that connected a central time-sharing computer on Oahu to the other Hawaiian islands using low-cost commercial radio equipment.

Metcalfe's goal was to adapt the Hawaiian radio network for "in-building minicomputer communication." But why the name Ethernet? Wanting to keep the transmission mechanism generic, the memo stated:
"While we may end up using coaxial cable trees to carry our broadcast transmissions, it seems wise to talk in terms of an ether, rather than 'the cable', for as long as possible. This will keep things general and who knows what other media will prove better than cable for a broadcast network; maybe radio or telephone circuits, or power wiring or frequency-multiplexed CATV, or microwave environments, or even combinations thereof."
His use of the term ether came from luminiferous ether, a 19th-century theory of an invisible substance that acted as the transmission medium for electromagnetic waves. The theory was eventually disproven, in part by Einstein's theory of relativity.

The PARC experimental network expanded into the DIX standard with the partnership of Digital, Intel and Xerox. Early challenges to Ethernet included difficulties reaching standards and competition from Token Ring and Token Bus also vying to become the IEEE standard. By the late 1980s, Ethernet had become the dominant LAN technology.

Over the last 40 years, Ethernet speeds have increased by an order of magnitude roughly every 10 years: 10Mbps in 1973-83, 100Mbps in 1993, 1G in 1998, 10G in 2002 and 100G in 2013. The next step is 400G: with video making up an ever-larger share of the content load, and more and more people posting and viewing it, higher speeds will be required. Current projections have 400G being ratified by the IEEE in 2017.

Beyond that? By 2023-24 Terabit Ethernet could be ratified after a study group begins in 2019-20. Then 10T Ethernet 10 years after that, followed by 100T, and then Petabit Ethernet some 80 years after Ethernet’s invention in a Xerox memo.

And beyond even that? Ethernet connections directly to the brain? Speaking at the Ethernet Technology Summit, Jane Li, COO of Huawei Enterprises, envisioned a Brain-to-Computer Interface: a BCI implanted in a human brain, perhaps to control prosthetic limbs, with access to the cloud for storage and upgrades. She went on to envision Brain-to-Brain interfaces in 2053 taking advantage of Petabit Ethernet. Prepare to be assimilated...

Managing one of these B2B interfaces would bring a whole new level to the idea of local management.

Friday, May 17, 2013

Does your data center have "green fatigue?"

A recent survey by the Uptime Institute indicates that fatigue is beginning to hit some data centers when it comes to the quest for ever-greener operations. In the most recent survey, only half of North American respondents considered energy efficiency very important to their companies, continuing a two-year slide from 58% in 2011 to 52% in 2012.

The decline in interest was most evident at smaller data centers. The likely explanation, said Matt Stansberry of the Uptime Institute, is that these smaller operations tend to have fewer engineers and less money to spend on energy efficiency projects. Projects like raising server inlet temperatures and installing variable-speed fans are viewed as potentially risky; they're just not something you want to take on unless you have a big enough staff to devote to work outside of regular operations.

The report goes on to say that it's probably not that interest in energy efficiency is down, it's that respondents might just be sick of hearing about it. The news on innovations in the field is generally about advancements at big companies with lots of people and money to throw at these problems, which might frustrate the people at smaller shops that are stuck with older hardware and limited options for innovation.

A new approach?

What about applying a little Local Management to this situation? Our story for managing data center infrastructure is pretty much the same as what we do in branch office situations for routers, switches and other networking devices. Even though physical distances in a data center might not be as great as a network of remote offices, sometimes security controls can make devices in a locked-up cage just as difficult to access directly.

One of the challenges of saving power by simply turning off machines not currently needed is that when you take a server down, it doesn't always come back up. Often it's just easier to leave things running than check on devices all the time. Uplogix can help here, and it won't take a squad of Google or Amazon data center engineers to implement.

Some of the key local management functionality for servers includes:

  • Secure Access | Provides secure access to remote devices that have web-only management interfaces or no console port, without adding management overhead: there is no need to consume an additional switch port, maintain VLANs or manage user access for more devices. Uplogix also integrates with TACACS+ and RADIUS for remote authentication.
  • Service Processor Automation Using IPMI | Provides users with the ability to manage, diagnose and recover servers, even if the OS has hung or the server is powered down.
  • KVM over Service Processor | Allows IT administrators to gain local access to and control of a remote server (e.g. provisioning, monitoring, troubleshooting, restricting access) from a local desktop without deploying external KVM appliances. It functions independently of the server's operating system and primary network connection, using an automated out-of-band connection.
Rules can be created to issue shutdown and startup commands based on a variety of factors. The key is that with its persistent monitoring and access capabilities, a Local Manager is not only watching what's going on, but can follow your run book procedures to recover from issues. Automatically. 
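Uplogix rules are configured on the Local Manager itself, but the decision logic behind a power-management run book can be sketched in plain Python. Everything here (the status fields, the load threshold, the action names) is hypothetical and for illustration only, not Uplogix syntax:

```python
from dataclasses import dataclass

@dataclass
class ServerStatus:
    """Snapshot a local manager might gather for one server (hypothetical fields)."""
    name: str
    cpu_load: float            # 1-minute load average
    powered_on: bool
    in_maintenance_window: bool

def plan_action(status: ServerStatus, low_load: float = 0.1) -> str:
    """Return a run-book action based on simple, illustrative rules."""
    if status.powered_on and status.in_maintenance_window and status.cpu_load < low_load:
        return "shutdown"      # idle capacity during an approved window: power it down
    if not status.powered_on and not status.in_maintenance_window:
        return "startup"       # window is over: bring the server back up
    return "none"              # otherwise, leave it alone

print(plan_action(ServerStatus("web-01", 0.02, True, True)))   # shutdown
```

The point of encoding the run book this way is that the decision is made locally, on every polling cycle, whether or not the network path back to the NOC is up.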

For cases when Uplogix can't recover a device, administrators can get the same access from the NOC as they would have out on the data center floor. 

Save your energy, and save energy at the same time

So, it might not be as flashy as converting your data center cooling to tide-powered sea water, or running the entire operation on geothermal energy sources, but this is a solution that won't take millions of dollars and hundreds of people to implement. 

Think about Uplogix for your data center today, and read Uplogix for Servers for more information.

Tuesday, May 14, 2013

MTBF and MTTR are important, but so is MTTI

MTBF (mean time between failures) and MTTR (mean time to recovery) are important measurements that usually factor into the creation of SLAs (service level agreements). Another measurement that should matter to IT groups when evaluating their management tools is MTTI, or mean time to innocence: when there is a problem, it's important to know quickly whether the problem is your fault or lies elsewhere. It's easier to enforce SLAs when you can cut through the finger pointing early in an event.

Wikipedia tells us that mean time between failures (MTBF) is the predicted (average) time between failures of a system during operation, while mean time to repair (MTTR) represents the average time required to repair a failed component. A tool like Local Management from Uplogix can increase MTBF by automating routine network management tasks, which removes opportunities for human error. Detailed monitoring combined with rules and alerts can notify administrators of a potential problem, letting them intervene and potentially avoid a failure.
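The relationship between these two metrics and uptime can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). A quick sketch (the example numbers are illustrative, not taken from the article):

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Example: a device that fails on average every 2,000 hours
# and takes 4 hours to diagnose and repair.
print(f"{availability(2000, 4):.4%}")   # 99.8004%

# Halving MTTR improves availability without touching the hardware at all.
print(f"{availability(2000, 2):.4%}")   # 99.9001%
```

This is why the article emphasizes both sides: raising MTBF (fewer failures) and cutting MTTR (faster recovery) each push availability toward the SLA target.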

MTTR is reduced with Uplogix in multiple ways. First, the direct connection to managed devices over the console port means more frequent and more detailed monitoring: our default polling interval is 30 seconds, so Uplogix will know there is a problem, and specifically what the problem is, very quickly. And since we are not monitoring devices over the network, we'll still be able to talk to all the devices in the rack and report back over an out-of-band connection on exactly what the situation is. If Uplogix can't fix an issue automatically, it will already have tried your initial run book steps, so you won't have to start at step one.

This factors into mean time to innocence. In the traditional SNMP model when there are network issues, polling goes down. Is it a carrier problem? A last-mile issue? Something in your branch office infrastructure? The downtime clock is running and each stakeholder starts troubleshooting from page one of the run book. Or worse, finger pointing begins. Tick tick tick.

MTTI doesn't necessarily protect you from downtime, but it can focus the recovery efforts on where the problem lies, directly reducing MTTR. This is also helpful for enforcing SLAs. Of course, the goal is not to have to collect on missed SLAs. As Andy Gotlieb said in a recent article:
If the carrier violates the terms of the SLA, its biggest penalty is that it will owe you a portion of your monthly bill back. The more "generous" SLAs will say that if the outage lasts for too long a period of time, they'll refund your entire month's bill. The problem, of course, is that you don't want a free month's service – you want to avoid the very high cost of downtime to your enterprise. But no carrier will give you an SLA where they commit to compensate you for what that lost connectivity time is worth to you and your firm.
So as you worry about MTBF and MTTR, consider the impact that Uplogix Local Management can have on these metrics as well as giving you a way to obtain mean time to innocence, or MTTI. It's always nice to be able to show when it's not your fault, the ever-popular CYA metric. 

Friday, May 3, 2013

Should you worry more about compliance or risk?

When is the right time to think about compliance versus risk?
A recent article in Network World interviewed the CIOs of Underwriters Laboratories and the Minnesota Department of Veteran Affairs on this topic and generated some interesting comments. Both make the expected argument that you can't pursue one over the other, but in the end they say risk is the key consideration.

Christian Anschuetz of UL uses the story of the Titanic to illustrate his point. When it sank in the North Atlantic 101 years ago, taking more than 1,500 lives, the captain, crew and the White Star Line had complied with the regulations of the time by providing the required number of lifeboats. The regulations were clearly not up to the risk faced by the vessel and its passengers.

Non-compliance is another form of risk. Barely a day passes without a story of a hefty fine levied against a firm that violated a HIPAA privacy rule or failed to comply with the PCI standard for data security. In these cases, compliance is its own risk category.

CIO Dan Abdul offered nine tasks for determining your organization's risks, helping you avoid both unnecessary exposure and overcompensating with too many controls:
  • Risk of failing to fully comply with regulations
  • Loss of intellectual property and any sensitive information
  • Impact of disasters and unplanned events
  • Impact of an event which adversely affects the brand image of the organization
  • Gaining stakeholder feedback on impact and likelihood of these risks
  • Benchmarking existing process for managing the risks identified as concerns by stakeholders
  • Identifying the costs required to address the risks
  • Performing a cost/risk analysis
  • Prioritizing control efforts accordingly
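One common way to carry out the cost/risk analysis and prioritization in the final steps is to score each risk by likelihood times impact, using stakeholder ratings. A minimal sketch, with entirely made-up scores for the risk categories above:

```python
# Hypothetical risk register: each risk gets stakeholder-rated
# (likelihood, impact) scores on a 1-5 scale. The numbers are invented.
risks = {
    "Regulatory non-compliance": (3, 5),
    "IP / sensitive data loss":  (2, 5),
    "Unplanned outage":          (4, 3),
    "Brand damage":              (1, 4),
}

# Rank by severity (likelihood x impact) to prioritize control efforts.
ranked = sorted(risks.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True)
for name, (likelihood, impact) in ranked:
    print(f"{name}: score {likelihood * impact}")
```

A simple score like this is not a substitute for the benchmarking and cost analysis Abdul describes, but it gives stakeholders a shared, defensible starting point for deciding which controls to fund first.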
The challenge with compliance is that regulations are generally written in response to previous incidents. They try to point out the risks, but don't really provide a set of controls to determine whether you are absolutely compliant; that comes down to the auditor's interpretation. Another risk.

Abdul adds, "More importantly, if you implement every control recommended for any regulation and still have a breach, you are not protected from lawsuits and fines from the regulating entity."

Improving compliance and reducing risk with Local Management

There is no silver bullet for IT compliance, but Uplogix addresses some fairly unique areas. Uplogix extends role-based administrative access policies to network devices and provides detailed auditing and reporting in support of attaining and demonstrating regulatory compliance. All of these capabilities are maintained even in the event of a network outage.

For more on Uplogix and IT policy enforcement capabilities as well as audit and compliance reporting, see the Security and Compliance Management section of