Amazon's Turn to Go Offline: 'We'd like to help you out! Which way did you come in?'

By

It is getting to be an all-too-familiar tale. One of the world’s most heavily trafficked websites becomes inaccessible, usually for unexplained reasons, for some short or even longish period of time. The world goes wild.

Amazon is today’s victim. As my colleague Ed Silverstein so ably covered as the news was breaking, at approximately 2:50 PM EDT, if you went to Amazon.com you did not get their usual home page but instead were greeted as follows:

Website Temporarily Unavailable

Our website is currently unavailable while we make some improvement to our service. We’ll be open for business again soon, please come back shortly to try again.

Thank you for your patience.

And that was just for starters. If you did not love that one, all you had to do was wait a few minutes for the replacement, which was a 500 error message that read:

Oops!

We’re very sorry, but we’re having trouble doing what you just asked us to do. Please give us another chance–click the Back button on your browser and try your request again. Or start from the beginning on our homepage.

BTW. That would be their inaccessible homepage.

As others working on this have pointed out, that was almost terse advice given the more extensive recommendation on Amazon.ca:

We’re sorry!

An error occurred when we tried to process your request. Rest assured, we’re already working on the problem and expect to resolve it shortly.

In the meantime, please note that if you were trying to make a purchase, your order has not been placed.

We apologize for the inconvenience.

Amazon.com was not inaccessible for long. I have not seen the precise timing, but it was fine when I checked back at 3:20 PM EDT, and I gather from reports that the problem lasted about 10 minutes. If you ever wish to check on what is happening on the Amazon network, bookmark Amazon’s Service Health Dashboard. It may not be scintillating viewing, but at least you can find out in real time how they are or are not doing. As of 5:05 PM EDT, the dashboard was showing the following services as resolved, i.e., they had been having issues but were now under control:

  • Amazon Elastic Compute Cloud (N. Virginia)
  • Amazon Mechanical Turk (Requester)
  • Amazon Mechanical Turk (Worker)
  • Amazon Relational Database Service (N. Virginia)
  • Amazon Simple Email Service (N. Virginia)
  • Amazon CloudFormation (N. Virginia)
  • Amazon Elastic Beanstalk (N. Virginia)
  • AWS Management Console

In short, their Northern Virginia facility was having some challenges. 



This follows on the heels of Google going down this past Friday for five minutes. That little incident caused Internet traffic to drop 40 percent, according to estimates, as access was denied to most of Google’s services, including search, Gmail and Talk. Google did a nice job of apologizing in a statement it issued:

"Between 15:51 and 15:52 PDT, 50 percent to 70 percent of requests to Google received errors; service was mostly restored one minute later, and entirely restored after 4 minutes," read the statement. "We apologise for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better."

They still have not said what happened, but at least their mean time to restoration was impressive given the extent of the outage. In fact, compared to the August 14 seven-hour outage that hit Microsoft’s Outlook.com webmail service and the four hours it took to restore its SkyDrive cloud storage service, this was practically the speed of light.

In fact, the Microsoft v. Google response rates might have created a new key performance indicator (KPI), or it might just be the actual difference between having five-nines protection and four-nines. A more sinister analysis has been that it just took the NSA that much longer to reboot some kind of enhanced snooping capabilities.
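For readers who want the arithmetic behind "five-nines" and "four-nines," here is a minimal sketch of the yearly downtime each target allows. The function name and the 365-day year are assumptions made for illustration; this is not any vendor's official service-level formula.

```python
# Yearly downtime budget for an availability target expressed in "nines".
# Assumption: a 365-day year, with leap years ignored.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(nines: int) -> float:
    """Return the downtime allowed per year, in minutes, for N nines."""
    unavailability = 10 ** -nines  # 4 nines -> 0.0001, 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (4, 5):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: about {minutes:.2f} minutes of downtime per year")
```

By this measure, a single five-minute outage uses up nearly the whole five-nines budget for the year (about 5.26 minutes), while a seven-hour outage (420 minutes) is roughly eight times the four-nines budget (about 52.56 minutes).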

What all three of the big outages have in common is the lack of specificity as to what took them down, and an apology that amounted to “have a nice day!”

What they also all had in common was an inability to get a company executive to comment, although all three have executives known for giving the press the silent treatment except when they have a major announcement.

And, if that was not annoying enough as we in the media try to tell customers what happened, they also have in common a flooding of my inbox with pitches from subject matter experts looking to explain things and grab some early mindshare. This has become standard operating procedure as well, although the angles are getting better and more contextual, particularly when first reports point to internal human error with no evidence of bad guys scaling the walls in a major cyber attack.

The twist in the last three has been a focus not on security, but on money. The pitches have ranged from how much money each second of downtime costs, to how much getting to perfection in uptime would cost and why companies do not invest in it, to what might have happened if this had been an attack by bad guys and how much could have been lost, to best practices for disaster recovery and business continuity. There is always an ample dash of suggestions about the benefits of each inquiring mind’s solution, which they want me to spend time discussing with their spokespeople.
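The "cost per second of downtime" figures that land in my inbox usually come from back-of-the-envelope math along these lines. This is only a sketch: the revenue number is hypothetical (picked so the result is round, not taken from any company's financials), and it assumes sales are spread evenly across the year, which real retail traffic is not.

```python
# Back-of-the-envelope estimate of revenue lost during an outage.
# Assumption: revenue accrues evenly over a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

# Hypothetical figure for illustration only; not any company's real revenue.
HYPOTHETICAL_ANNUAL_REVENUE = 31_536_000_000  # dollars


def revenue_lost(annual_revenue: float, outage_seconds: float) -> float:
    """Return the estimated revenue lost over an outage of the given length."""
    revenue_per_second = annual_revenue / SECONDS_PER_YEAR
    return revenue_per_second * outage_seconds


if __name__ == "__main__":
    ten_minutes = 10 * 60  # the reported length of the Amazon.com outage
    loss = revenue_lost(HYPOTHETICAL_ANNUAL_REVENUE, ten_minutes)
    print(f"Estimated loss for a 10-minute outage: ${loss:,.0f}")
```

The even-spread assumption is the weak point of every such estimate: an outage at peak shopping hours costs more than the average suggests, and some customers simply come back later and buy anyway.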

From a technical perspective, this means hearing about all of the risk mitigation techniques, backups (network, storage and physical space), the beneficial role of better visibility and analytics so that IT departments can be proactive rather than just reactive, and a host of other “helpful” tips, usually in the form of Top 10 lists.

Where I come out on all of this is two-fold. 

First, we live in a digital age and as the old saying goes “sh#@ happens!” Five-nines is not perfection, but in reality, it should be good enough. As a consumer, I would like more transparency. This means tell me what happened and tell me what to do if it were to happen again. 

Second, as kind of a corollary to the first point, the marketing person in me would be telling companies that the real cost is to their brand’s reputation. The longer the silence, the worse things will get. And heaven forbid there is a cover-up of what went on; it is always worse than telling the facts at the outset. Again, transparency is the answer, and in weighing costs and benefits this is a no-brainer.

Let’s just say when it comes to Microsoft, Google and Amazon, “we’d like to help you out! Which way did you come in?” is not a helpful approach. Apology acknowledged but hardly accepted. That is a lesson they can bank on.    




Edited by Rich Steeves