
Massive AWS Outage Disrupts Major Websites And Apps Worldwide – How A Single Cloud Glitch Broke The Internet
By Erin Whitten
One of the core pieces of infrastructure underpinning a huge part of the internet had a hiccup early Monday morning. Amazon Web Services (AWS) said in a status update that it was experiencing “increased error rates and latencies” for several of its products and features in the US-East-1 region, based in Northern Virginia.
A few minutes later, the first symptoms were reported. Apps and websites that rely on the affected services were unresponsive, slow to load, or showing error messages. More surprising was the sheer number of services that run on AWS, something many people are not aware of until an outage like this happens. Well-known apps and websites affected by the AWS outage included Snapchat, Chime Banking, Fortnite, Ring, and the UK’s tax and government systems.
AWS eventually said that the cause of the problem was the DynamoDB endpoint in US-East-1, where it was seeing “requests returning errors”. Because US-East-1 is one of the most heavily utilized AWS regions, the errors cascaded quickly across systems that relied on the region: logins failed, transactions did not go through, and apps became inaccessible.
AWS said that the incident began at around 3:11 a.m. ET, and by 7:32 a.m. ET it reported that it had mitigated the underlying DNS issue and that most operations had returned to normal. Even after the DNS issue was mitigated and AWS said its services had returned to normal, users reported that some systems were taking time to fully recover.
In terms of business impact, this is not the first incident to cause major outages, but it is another reminder that cloud infrastructure outages are rarely truly regional, that dependencies on cloud platforms are never really invisible, and that clear communication during an incident is important. For many businesses, the outage should be a prompt to re-evaluate their resiliency plans and ask themselves whether they could survive the failure of a region or a provider. When a cloud service that so many major platforms rely on has a problem, everyone feels it.