AWS Outage Crashes Chunk of the Web

By Steve Anderson March 01, 2017

There's an old adage about putting all your eggs in one basket. Most people advise against it. Others say you can, as long as you then watch that basket like a hawk. We recently saw what can come of the one-basket strategy, when the basket that was Amazon Web Services suffered an outage that took a chunk of the Internet down with it.

Thousands of websites that rely on Amazon Web Services went down in the outage, including Giphy, Business Insider, and Quora. Perhaps the greatest irony of all: Down For Everyone Or Just Me, a site that tells you whether the website you can't reach is having serious problems or whether the trouble is on your end, went down too. Most disturbing of all, Amazon's own status page was affected. Because it also depended on Amazon Web Services, Amazon could not even use it to report that its own services were down.

Much of Amazon proper, meanwhile, kept running, so people's ability to frantically shop for nearly anything was unaffected. It didn't take Amazon long to identify the likely cause, "high error rates" in Amazon Simple Storage Service (S3), its cloud storage service, and it rolled out a fix later that same day. In the meantime, though, plenty of sites were outright unreachable, because S3 is used by 148,213 sites, according to market research firm SimilarTech.

This underscores one crucial lesson expressed by Harvard professor Jim Waldo, who noted “A lot of people have put their stuff on Amazon, so that means when the infrastructure breaks, which doesn't happen very often, lots of things break.” Waldo further noted that Amazon was “...a victim of their own success,” as people have “...become so used to their computing services being there that it's notable when it's not.”

It's a difficult position to be in. People expect the service to be up at all times, or down only briefly during ridiculous hours of the morning when most reasonable people are asleep anyway, and then only with plenty of advance warning. Yet they fail to consider that no system can stay up constantly; many systems are still striving for "five nines," or 99.999 percent, uptime. However impressively reliable Amazon's systems are, they can still fail. And when they do, we get the results just seen: huge, unexpected outages that no one has planned for.
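For a sense of scale, "five nines" leaves very little room for failure. The short Python sketch below converts an uptime percentage into the downtime it permits per year. It assumes a 365-day year, and the function name is purely illustrative. By this arithmetic, 99.999 percent uptime allows only about five and a quarter minutes of downtime a year.

```python
# Illustrative sketch: how much downtime a given availability target allows per year.
# Assumes a 365-day (non-leap) year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum minutes of downtime per year for a given uptime percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for nines, pct in [(3, 99.9), (4, 99.99), (5, 99.999)]:
        print(f"{nines} nines ({pct}%): {allowed_downtime_minutes(pct):.2f} minutes/year")
```

Each additional nine cuts the allowed downtime by a factor of ten, from roughly 8.8 hours a year at three nines to just over five minutes at five.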

It's time to start planning for these outages. Whether that means setting up backup systems or putting something else in place altogether, we need to be ready for when the worst happens, whatever form it takes. If AWS can go out, anything can go out.




Edited by Alicia Young

Contributing Writer
