Everyone knows who Amazon is — they’re massive in cloud computing, hosting services for countless organizations globally, including schools. So when a company that big encounters a service disruption, it resonates widely. Here’s how the recent Amazon Web Services (AWS) outage was resolved:

Late on October 19, 2025, AWS began reporting elevated error rates and latency across multiple services in the US-East-1 region. The trouble started around 11:49 PM PDT.
By 12:26 AM PDT the next day, the root cause had been traced to a faulty DNS update, which left applications unable to resolve the IP addresses of the servers they needed to reach. In effect, the internet's phonebook was broken for those services.
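To make that concrete, here's a tiny Python sketch of what "resolving" looks like from a client's point of view. The hostname shown follows the pattern of the regional DynamoDB endpoint, but the snippet itself is purely illustrative and not AWS tooling:

```python
import socket

def resolve_endpoint(hostname: str) -> list[str]:
    """Return the IP addresses a hostname resolves to, or raise if DNS fails."""
    try:
        # getaddrinfo performs the DNS lookup -- the "phonebook" step.
        results = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        return sorted({entry[4][0] for entry in results})
    except socket.gaierror as exc:
        # When resolution breaks, clients never learn which servers to contact.
        raise RuntimeError(f"DNS resolution failed for {hostname}: {exc}") from exc

if __name__ == "__main__":
    print(resolve_endpoint("dynamodb.us-east-1.amazonaws.com"))
```

When that lookup fails, everything built on top of it fails too, no matter how healthy the servers behind the name actually are.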
Because of that, more than 100 AWS services were affected. The failure centered on DynamoDB, a core database service, and cascaded into everything that depends on it: EC2 instance launches stalled, Lambda functions ran into errors, and load balancer health checks failed.
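As a rough illustration of that cascading effect, here's a hedged sketch using boto3: a service health check that quietly depends on DynamoDB. The timeouts, retry settings, and the "orders" table are assumptions for the example, not anything AWS published:

```python
import boto3
from botocore.config import Config
from botocore.exceptions import BotoCoreError, ClientError

# Illustrative timeouts and retry settings; real services tune these per workload.
config = Config(
    connect_timeout=2,
    read_timeout=2,
    retries={"max_attempts": 2, "mode": "standard"},
)
dynamodb = boto3.client("dynamodb", region_name="us-east-1", config=config)

def health_check(table_name: str) -> bool:
    """A service health check that depends on DynamoDB being reachable.

    If DNS for the DynamoDB endpoint breaks, this call fails, the check
    reports unhealthy, and the load balancer pulls the instance -- one
    small example of how a single dependency cascades.
    """
    try:
        dynamodb.describe_table(TableName=table_name)
        return True
    except (BotoCoreError, ClientError):
        return False

if __name__ == "__main__":
    print(health_check("orders"))  # "orders" is a hypothetical table name
```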
For users and system administrators alike, the ripple effects were visible everywhere: gaming platforms went offline, financial apps had login failures, and even Amazon's own systems (like Prime Video and e-commerce checkout) saw disruption.
How It Was Fixed
Here’s what AWS did to bring the systems back online:
- They flushed DNS caches and applied the fix for the core DynamoDB DNS issue by about 2:24 AM PDT.
- They temporarily throttled some operations (for example, asynchronous Lambda invocations and new EC2 instance launches) to stabilize dependent subsystems while they recovered; a client-side view of coping with that throttling is sketched after this list.
- By around 3:01 PM PDT, AWS had confirmed that all services were fully restored, though some data-processing backlogs (for example in Redshift and Connect) remained to be cleared.
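To see why that throttling matters on the client side, here's a minimal Python sketch of exponential backoff with jitter, assuming a generic `call` that raises a stand-in `ThrottledError` when the provider pushes back:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for a provider throttling response (e.g. HTTP 429 / ThrottlingException)."""

def call_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Retry a throttled call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to the exponential cap,
            # so retries from many clients don't all land at the same moment.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))

# Usage (hypothetical): call_with_backoff(lambda: client.invoke(FunctionName="my-fn"))
```

Clients that retry in tight loops during an incident only make the recovery slower, which is exactly why the provider throttles in the first place.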
Final Thoughts
Contrary to what many people thought, this outage wasn’t caused by a cyberattack — rather, it appears to have been an internal update gone wrong.
Still, it’s a vivid reminder: even the biggest cloud provider can experience a disruption, and when it does, many of us feel it. Thinking proactively about architectural resilience and dependent-service risk is more important than ever.
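One concrete way to think about dependent-service risk is to spell the fallback out in code. The sketch below assumes a hypothetical table replicated to a second region (for example via DynamoDB global tables); it's an illustration of the idea, not a complete failover design:

```python
import boto3
from botocore.exceptions import BotoCoreError, ClientError

# Hypothetical setup: the table is replicated to a second region,
# so reads can fall back when the primary region misbehaves.
REGIONS = ["us-east-1", "us-west-2"]

def get_item_with_fallback(table_name: str, key: dict):
    """Try the primary region first; if the call fails, try the replica."""
    last_error = None
    for region in REGIONS:
        client = boto3.client("dynamodb", region_name=region)
        try:
            return client.get_item(TableName=table_name, Key=key).get("Item")
        except (BotoCoreError, ClientError) as exc:
            last_error = exc  # remember the failure and move on to the next region
    raise RuntimeError("All configured regions failed") from last_error

# Usage (hypothetical table and key):
# get_item_with_fallback("orders", {"order_id": {"S": "12345"}})
```

It won't save you from every failure mode, but deciding in advance what happens when a core dependency disappears is far better than discovering it at 11:49 PM.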