Wednesday, January 28, 2015

Config errors better than hackers?

The Facebook downtime event on January 27th, in which an hour-long outage cut off access to critical status updates worldwide, along with Instagram posts and Tinder hook-ups, shows how sensitive large organizations are to public perception of hacking threats.

The Lizard Squad hacking group, which apparently took control of the Malaysia Airlines website over the weekend and has taken credit for previous hacks at Sony and others, immediately claimed responsibility for the Facebook event.

Facebook was quick to lay the blame squarely on its own IT staff in a statement issued shortly after the outage:
"Earlier this evening many people had trouble accessing Facebook and Instagram. This was not the result of a third party attack but instead occurred after we introduced a change that affected our configuration systems. We moved quickly to fix the problem, and both services are back to 100 percent for everyone."
The event shows the current state of concern over data breaches and denial-of-service attacks. Companies would much rather admit to dropping from 99.999% availability (about 5 minutes of downtime a year) to 99.99% (just under an hour a year) than admit to being open to outside attack and the potential for data theft. That is especially true for a target as rich as Facebook, which counts more than a billion web users and just over a billion mobile phone users.
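
For anyone who wants to check the "nines" arithmetic, here is a minimal Python sketch. The function name and the 365.25-day year are our own assumptions for illustration, not anything from Facebook's statement.

```python
# Assumption: an average year of 365.25 days, i.e. 525,960 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for nines in (99.999, 99.99):
    print(f"{nines}% -> {yearly_downtime_minutes(nines):.1f} minutes/year")
```

Under that assumption, five nines works out to roughly 5.3 minutes a year and four nines to roughly 52.6 minutes.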

But don't assume that the absence of a hacking event means everyone is happy at Facebook. Here is some insightful follow-up commentary from Facebook VP of Engineering, Jay Parikh:


One take-away from this is that whether your network supports tens of users or billions, you'd better be able to identify who is to blame when there are issues. And fast. At Uplogix we like to refer to this as the Mean Time to Innocence: the period between knowing there is a problem and identifying who owns that problem.
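
As a rough illustration of how MTTI could be tracked, the Python sketch below averages the gap between detecting a problem and identifying its owner. The function name and the two sample incidents are hypothetical and made up for the example. This is not an Uplogix tool or real incident data.

```python
from datetime import datetime, timedelta


def mean_time_to_innocence(incidents):
    """Average the gap between detection and owner identification.

    `incidents` is a list of (detected_at, owner_identified_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    gaps = (owner_identified - detected for detected, owner_identified in incidents)
    return sum(gaps, timedelta()) / len(incidents)


# Hypothetical incidents, invented purely for illustration.
incidents = [
    (datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 5, 9, 20)),     # 20 minutes
    (datetime(2015, 1, 12, 14, 0), datetime(2015, 1, 12, 14, 40)),  # 40 minutes
]

print(mean_time_to_innocence(incidents))  # 0:30:00
```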

Read more about Mean Time to Innocence in one of the most-read entries of all time on the Uplogix Blog: MTBF and MTTR are important, but so is MTTI.