Credited from: NPR
On October 20, Amazon Web Services (AWS) experienced a massive outage that rendered popular apps and websites, including Snapchat, Fortnite, and banking apps, unusable for millions of users worldwide. The disruption began around 3 a.m. ET and caused failures across multiple sectors, with outage tracker Downdetector logging more than 6.5 million user reports, according to Channel NewsAsia, Indiatimes, and BBC.
The root cause of the outage was identified as a Domain Name System (DNS) resolution issue stemming from a technical update to DynamoDB, AWS's key database service, in its US-EAST-1 region in Virginia, according to reports from Indiatimes and NPR. The failure prevented applications from looking up the correct server addresses, which triggered cascading failures across services that depend on DynamoDB.
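For readers who want to see the failure mode concretely, the following is a minimal Python sketch of what a failed DNS lookup looks like to an application, and how a client might fall back to a second endpoint. It is an illustration only, not how AWS or any affected company handles resolution. The function name `resolve_first_available` and the `.example` hostnames are hypothetical; the only real API used is the standard library's `socket.getaddrinfo`.

```python
import socket


def resolve_first_available(hostnames, resolver=socket.getaddrinfo, port=443):
    """Return (hostname, ip) for the first hostname that DNS can resolve.

    `hostnames` is an ordered list of candidate endpoints (hypothetical names
    here). `resolver` defaults to the real socket.getaddrinfo and can be
    swapped for a stub when testing.
    """
    errors = {}
    for name in hostnames:
        try:
            results = resolver(name, port)
        except socket.gaierror as exc:
            # This is the error an application sees when DNS cannot map the
            # name to an address, even if the server itself is healthy.
            errors[name] = exc
            continue
        if results:
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            return name, results[0][4][0]
    raise LookupError(f"no endpoint resolved: {sorted(errors)}")
```

A caller would pass its endpoints in order of preference, for example `resolve_first_available(["db.primary.example", "db.secondary.example"])`. A fallback like this only helps if the second endpoint is served independently of the first; when every candidate depends on the same failing DNS records, all lookups fail and the error propagates to whatever called them.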
During the disruption, financial institutions, social media platforms, and even Amazon's own services such as Ring and Alexa reported severe disruptions, as noted by SCMP and Indiatimes. Notable companies affected included Coinbase, Duolingo, and Uber rival Lyft.
AWS announced that services had been restored around 6 p.m. ET, about 15 hours after the outage began, although reports indicated that some platforms continued to experience issues while they worked through backlogged processing. AWS said throughout the day that its mitigation steps were showing signs of recovery. The outage nonetheless exposed the risk of relying heavily on a few cloud providers for crucial technology infrastructure, according to experts quoted by Indiatimes and Channel NewsAsia.
Experts said the incident may prompt scrutiny of how businesses can build more redundancy into their infrastructure, and that it highlights a critical risk in today's cloud-dependent operating environment, according to BBC, NPR, and SCMP.