Why Facebook had its 'worst outage' in 4 yrs
Engineer explains why people could not complain about their bosses on Facebook for 2.5 hours
If you’re one of the 500 million Facebook users, you probably noticed that the site was down for over two hours on Thursday. If you’re one of the many Facebook “addicts,” it probably ruined your day. (How do you just go on acting like nothing happened??)
For Facebook, it was the “worst outage we’ve had in four years,” said Facebook engineer Robert Johnson in a blog post Thursday evening. For a site that sees 50% of its active users log in every day and reaches over 150 million users on their mobile devices, that’s no trifling matter. The site, which experienced some 260 billion page views per month as of January 2010, went dead around 11:30 AM PST and wasn’t up and running again until 3 PM PST, giving only a mocking “DNS Failure” message to users who attempted to access the site.
Facebook actually had 38.46% availability during its first hour of downtime, and it was down at all 12 of its monitoring locations throughout the United States, according to AlertSite, a Web performance management company.
Johnson explained that yesterday’s failure occurred because an automated system for verifying configuration values malfunctioned.
“We made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second,” Johnson explained. “To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued.”
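The feedback loop Johnson describes can be sketched as a toy simulation. This is only an illustration of the dynamic, not Facebook's actual code: the function name `simulate`, its parameters, and the simplifying assumption that an overloaded cluster fails every query in a round are all hypothetical.

```python
def simulate(clients, db_capacity, fix_at_tick, ticks):
    """Return, per tick, how many clients still lack a valid cached value.

    Hypothetical model: each waiting client sends one query per tick, and
    a cluster that receives more queries than db_capacity fails all of them.
    """
    waiting = clients  # every client has seen the invalid value
    history = []
    for t in range(ticks):
        value_is_valid = t >= fix_at_tick    # persistent copy repaired from this tick on
        overloaded = waiting > db_capacity   # too many queries for the cluster
        if value_is_valid and not overloaded:
            # Queries succeed and return a valid value, so clients cache it.
            waiting = 0
        # Otherwise the value is still invalid or the query errored. Both are
        # treated as "invalid": the cache key is deleted and the client retries.
        history.append(waiting)
    return history


if __name__ == "__main__":
    # Far more clients than capacity: the storm outlives the fix at tick 2.
    print(simulate(clients=1000, db_capacity=100, fix_at_tick=2, ticks=5))
    # Load within capacity: clients recover as soon as the value is fixed.
    print(simulate(clients=50, db_capacity=100, fix_at_tick=2, ticks=5))
```

In the first run the number of waiting clients never drops, even though the bad value is repaired at tick 2, because every retry fails against the overwhelmed cluster and is itself read as another invalid value. In the second run, where the load fits within capacity, the same fix clears the problem immediately.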
So in essence, a minor glitch became an insurmountable glitch as millions of Facebook users attempted to access the site like a swarm of Lilliputians.
In addition to the downtime, the site was also experiencing a host of API problems and - most frustrating of all - the “like” button, which is embedded on over 350,000 pages throughout the Web, was malfunctioning.
All of which highlights just how pervasive Facebook has become as a cultural presence - one that spans localities worldwide and bridges myriad other social and cultural entities.
This morning, while reading the New York Times Online, I noticed a list of my Facebook friends in the top right-hand corner. I wondered: How did my friends get here? They were recommending recent Times articles to me. Actually, they had recently linked Times articles to their Facebook pages, and those stories were recommended to me when I browsed the Times, which means that Facebook has now stretched its roots into my daily news experience, making information gathering a social event.
So what does it mean when possibly the single largest entity on the Web goes down for 2.5 hours? It means that a lot of people are going to be unhappy.
But for my fellow Facebook addicts, I have to say, I think this was a step forward for us in conquering our addiction. We made a breakthrough! You’re probably thinking, “I’m not an addict. I can quit whenever I want.” But can you really? Here are some symptoms to look for (if not in yourself, then in your friends and family):
- Going more than an hour or two without refreshing your Facebook page gives you the shakes and makes you break out in a cold sweat.
- You feel distant and disconnected at parties and social gatherings because you worry about that cow that escaped your FarmVille farm the other day…and how it’s doing…wherever it is…
- Your friends and family have expressed concern about how much time you spend on Facebook…which they’ve expressed by telling you to stop sending them so many $*@#ing game requests.
- Yesterday’s downtime made you go back to work.
Image source: cdn.com