Several Facebook services suffered an outage last Monday (October 4) and were unavailable for at least six hours. In this article, we explain what happened.
- WhatsApp, Facebook and Instagram went through instability last Monday.
- Facebook confirmed that the problems were related to failures involving its DNS servers and the BGP protocol.
- The company apologized for the inconvenience caused by the failure.
Facebook services go down from 11:40AM ET
According to DownDetector, at least WhatsApp, Facebook, Instagram and Messenger became inaccessible yesterday, starting around 11:40AM ET.
Initially, the mobile applications and desktop versions showed the classic loading icon but never refreshed the page, or simply stayed down. Everything indicated that the problem was related to a failure in Facebook's servers, as has happened in the past.
On Twitter, about 30 minutes after the first reports of the outage, the company said that it was aware of the failures and working on a solution:
We’re aware that some people are having trouble accessing our apps and products. We’re working to get things back to normal as quickly as possible, and we apologize for any inconvenience.
WhatsApp, following Facebook, also published a quick note on Twitter.
Facebook confirms DNS and BGP server problems
In an official note, Facebook reported that the unavailability was the result of DNS and BGP routing problems. As a consequence, the company's services were down for at least six hours.
DNS, the Domain Name System, translates an easy-to-remember domain name into the numeric IP address of the server behind it. In practice, instead of typing the IP "220.127.116.11" to find NextPit, you just type "www.nextpit.com". DNS is the system that connects the friendly domain name to the IP.
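To make the name-to-IP translation concrete, here is a minimal Python sketch of a DNS lookup. The helper name `resolve` and its `lookup` parameter are our own illustration, not part of any Facebook or NextPit tooling; the only real API used is `socket.gethostbyname` from the Python standard library.

```python
import socket


def resolve(hostname, lookup=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the name cannot be resolved.

    `lookup` defaults to the operating system's resolver; it is a parameter
    (our assumption, for illustration) so the function can be tried offline.
    """
    try:
        return lookup(hostname)
    except socket.gaierror:
        # Raised when the resolver cannot find an answer for the name,
        # which is roughly what users ran into during the outage.
        return None


# Example use (needs a network connection):
#   resolve("www.nextpit.com")
```

When the lookup fails, the browser or app never learns which IP address to contact, so the service looks "down" even if its servers are still running.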
DNS failures are actually fairly common. Last week, Slack experienced instability for much of the day due to problems with some DNS servers.
However, what aggravated the problem in Facebook's case was a failed routine BGP update, which ended up erasing the routing information that other networks need to find Facebook's DNS servers and services.
BGP, or Border Gateway Protocol, is the Internet standard that networks use to tell each other which routes lead to which destinations. It acts as a bridge between your mobile phone and Facebook's services, for example. With this BGP bridge erased, Facebook's DNS servers and services became unreachable.
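The effect of a withdrawn route can be illustrated with a toy routing table in Python. This is a simplified sketch under our own assumptions, not how real BGP routers are implemented: the class `RoutingTable`, the prefix `192.0.2.0/24` and the network name `AS64500` are placeholders taken from ranges reserved for documentation, not Facebook's actual values.

```python
import ipaddress


class RoutingTable:
    """Toy model of the routes one network has learned from its neighbors."""

    def __init__(self):
        self._routes = {}  # announced prefix -> network that announced it

    def announce(self, prefix, origin):
        """Record that `origin` says it can deliver traffic for `prefix`."""
        self._routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        """Forget a previously announced prefix."""
        self._routes.pop(ipaddress.ip_network(prefix), None)

    def next_hop(self, address):
        """Return who to send traffic to, or None if no route covers the address."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None
        # Prefer the most specific (longest) matching prefix.
        best = max(matches, key=lambda net: net.prefixlen)
        return self._routes[best]


table = RoutingTable()
table.announce("192.0.2.0/24", "AS64500")  # hypothetical prefix holding a DNS server
before = table.next_hop("192.0.2.53")       # a route exists
table.withdraw("192.0.2.0/24")              # the faulty update removes it
after = table.next_hop("192.0.2.53")        # no route left: unreachable
```

Once the route is gone, other networks have no path to the address, even though the server behind it may be perfectly healthy.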
In a statement, the company apologized for the inconvenience caused yesterday:
To all the people and businesses around the world who depend on us, we are sorry for the inconvenience caused by today’s outage across our platforms. We’ve been working as hard as we can to restore access, and our systems are now back up and running. The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem.
Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.
Our services are now back online and we’re actively working to fully return them to regular operations. We want to make clear at this time we believe the root cause of this outage was a faulty configuration change. We also have no evidence that user data was compromised as a result of this downtime.
People and businesses around the world rely on us everyday to stay connected. We understand the impact outages like these have on people’s lives, and our responsibility to keep people informed about disruptions to our services. We apologize to all those affected, and we’re working to understand more about what happened today so we can continue to make our infrastructure more resilient.
If you are still experiencing failures in Facebook networks, I share here some alternatives to WhatsApp and Messenger messaging services, in case you need to contact someone:
So, what did you do during the massive Facebook outage? Did you lose or save time or money in any way?
This article was updated on October 5, at 6AM EST, with information about the reasons for the outage in Facebook services the day before.