Note to Facebook: Don’t sit on the branch you are sawing off!
As you may have noticed, Facebook went down this week. Not only was Facebook itself unavailable for six hours, but Facebook-owned WhatsApp and Instagram went down with it. Now, it’s easy to snigger at the problems of a multi-billion-dollar company, and many parents will have felt a degree of relief as kids and teenagers were unable to interact with social media for a few hours. But businesses use these applications too. Many smaller businesses market themselves via Facebook, and at Crisis Solutions we know that a number of clients use WhatsApp to communicate during incidents if their internal communications tools fail. So it’s rather more serious than billions of people around the world being unable to waste time staring at their phones.
So, what can we learn as business continuity and crisis professionals? Well, I think we can learn five things.
1. Even the biggest platforms can have disruptions.
We can’t just assume that these huge, well-funded organisations won’t have problems. EVERY critical business process needs a contingency. Our biggest dependency at Crisis Solutions is Dropbox. It is a brilliant, convenient, secure solution to our needs, but if we couldn’t access our files, for whatever reason, that would be unacceptable. So we keep a few spare computers that don’t sync every day, so that if our Dropbox were ever hit by ransomware we would still have access to (almost) all of our data.
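By way of illustration, here is a minimal sketch of that kind of ‘cold copy’ arrangement in Python. The folder paths, schedule and retention count are assumptions for the example, not a description of our actual setup.

```python
# Minimal sketch of a "cold copy" backup: copy the synced folder to an
# offline destination that is NOT kept continuously in sync, so that a
# ransomware-encrypted sync cannot immediately overwrite the spare copy.
# Paths and retention count are illustrative assumptions.
import shutil
from datetime import datetime
from pathlib import Path

SOURCE = Path.home() / "Dropbox"            # the live, continuously-synced folder
DESTINATION = Path("/mnt/offline-backup")   # a drive attached only for backups
KEEP = 4                                    # keep the last four snapshots

def take_snapshot() -> Path:
    """Copy the source folder into a dated snapshot directory."""
    snapshot = DESTINATION / datetime.now().strftime("snapshot-%Y-%m-%d")
    shutil.copytree(SOURCE, snapshot, dirs_exist_ok=True)
    return snapshot

def prune_old_snapshots() -> None:
    """Delete all but the most recent KEEP snapshots."""
    snapshots = sorted(DESTINATION.glob("snapshot-*"))
    for old in snapshots[:-KEEP]:
        shutil.rmtree(old)

if __name__ == "__main__":
    print(f"Created {take_snapshot()}")
    prune_old_snapshots()
```

Run from a weekly scheduled task, with the destination drive only attached for the duration of the backup, it gives you a copy that no sync client can silently encrypt.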
2. Don’t underestimate the power of humans to mess things up.
Apparently Facebook’s problem was triggered by a faulty update to its Border Gateway Protocol (BGP) configuration. As I understand it, BGP is the signposting system that tells the rest of the internet how to reach a network’s IP addresses (which are strings of numbers); once Facebook’s routes were withdrawn, even its own DNS servers - the ‘address book’ that translates domain names such as facebook.com into those IP addresses - became unreachable. Yes, I know these updates are largely automated, but I bet that someone, somewhere issued a command or skipped a check that let the bad update through.
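To make the ‘address book’ half of that concrete, here is a tiny Python sketch of a name lookup and what it looks like when it fails. It only shows DNS resolution, not BGP itself; during the outage the routes to Facebook’s own DNS servers were withdrawn, so lookups like this simply stopped working.

```python
# Minimal illustration of the "address book" part of the outage: when the
# authoritative DNS servers for a domain become unreachable, the name simply
# stops resolving to an IP address. Try it against any hostname you like.
import socket

def resolve(hostname: str) -> None:
    try:
        addresses = {info[4][0] for info in socket.getaddrinfo(hostname, 443)}
        print(f"{hostname} resolves to: {', '.join(sorted(addresses))}")
    except socket.gaierror as error:
        # This is roughly what the whole internet saw for facebook.com
        # during the outage: the name could not be resolved at all.
        print(f"{hostname} did not resolve: {error}")

if __name__ == "__main__":
    for name in ("facebook.com", "whatsapp.com", "example.invalid"):
        resolve(name)
```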
3. Don’t sit on the branch you are sawing off.
So the fix for the bad routing update sounded simple: access the servers and reset them. Guess which network Facebook’s engineers needed in order to reach the servers that held the routing configuration? Yup, the Facebook network, which was, err, down.
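One practical habit that follows from this is to check, routinely and from outside, that your recovery path does not depend on the thing it is meant to recover. A rough sketch of such a check is below; the hostnames and ports are hypothetical placeholders, not real endpoints.

```python
# Sketch of a routine check that the recovery path does not depend on the
# primary network: probe the out-of-band management endpoint and the primary
# endpoint independently. Hostnames and ports are hypothetical placeholders.
import socket

ENDPOINTS = {
    "primary network (corporate VPN)": ("vpn.example.internal", 443),
    "out-of-band console (separate carrier)": ("oob-console.example.net", 22),
}

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for label, (host, port) in ENDPOINTS.items():
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{label}: {status}")
    # If the out-of-band path only works when the primary is up, it is not
    # really a contingency - it is the same branch you are sitting on.
```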
4. Make sure your contingency solution can stand on its own two feet.
Let’s get some Facebook engineers into a Tesla (no doubt) and have them drive over to the data centre to fix the problem in person. Good idea. But according to a (hastily deleted) Reddit post, the engineers couldn’t get into the building because their swipe cards relied on (you guessed it) authorisation from the Facebook servers.
5. Understanding your critical business processes is, well, critical.
Many organisations around the world spend a lot of time and effort mapping and testing their critical business processes. In the UK, financial regulators are insisting that firms embark on an ‘Operational Resilience’ programme: understand their risk appetite, set tolerance levels for disruption to the products and services they deliver to stakeholders, and test that their resilience solutions can keep them within those tolerances.
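For what it’s worth, here is a toy sketch of what such a mapping might look like once it is written down in a structured form. The services, tolerances and contingencies are invented for illustration, not taken from any real programme.

```python
# Toy sketch of an operational-resilience mapping: each important business
# service gets an explicit impact tolerance and a named contingency, so the
# question "what do we do if this is down for longer than we can tolerate?"
# has to be answered in advance. All entries are invented for illustration.
from dataclasses import dataclass

@dataclass
class BusinessService:
    name: str
    impact_tolerance_hours: float   # maximum tolerable disruption
    dependencies: list[str]         # what the service relies on
    contingency: str                # what we switch to if it fails

SERVICES = [
    BusinessService("Client file access", 4, ["Dropbox"], "Offline weekly snapshot"),
    BusinessService("Incident comms", 1, ["WhatsApp", "Email"], "SMS cascade / phone tree"),
    BusinessService("Customer marketing", 24, ["Facebook"], "Email newsletter"),
]

if __name__ == "__main__":
    for service in SERVICES:
        print(f"{service.name}: tolerate up to {service.impact_tolerance_hours}h, "
              f"depends on {', '.join(service.dependencies)}, "
              f"fallback: {service.contingency}")
```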
Surely organisations like Facebook should be doing the same?