Amazon's Turn to Go Offline: 'We'd like to help you out! Which way did you come in?'

By

It is getting to be an all-too-familiar tale. One of the world’s most heavily trafficked websites becomes inaccessible, usually for unexplained reasons, for a short or even longish period of time. The world goes wild.

Amazon is today’s victim. As my colleague Ed Silverstein so ably covered as the news was breaking, at approximately 2:50 PM EDT, if you went to Amazon.com you did not get their usual home page but instead were greeted as follows:

Website Temporarily Unavailable

Our website is currently unavailable while we make some improvement to our service. We’ll be open for business again soon, please come back shortly to try again.

Thank you for your patience.

And that was just for starters. If you did not love that one, all you had to do was wait a few minutes for the replacement, a 500 error message that read:

Oops!

We’re very sorry, but we’re having trouble doing what you just asked us to do. Please give us another chance–click the Back button on your browser and try your request again. Or start from the beginning on our homepage.

BTW, that would be their inaccessible homepage.

As others covering the story have pointed out, that was almost terse advice compared with the more extensive message on Amazon.ca:

We’re sorry!

An error occurred when we tried to process your request. Rest assured, we’re already working on the problem and expect to resolve it shortly.

In the meantime, please note that if you were trying to make a purchase, your order has not been placed.

We apologize for the inconvenience.

Amazon.com was not inaccessible for long. I have not seen the precise timing, but it was fine when I checked back at 3:20 PM EDT, and I gather from reports that the problem lasted about 10 minutes. If you ever want to check on what is happening on the Amazon network, bookmark Amazon’s Service Health Dashboard. It may not be scintillating viewing, but at least you can find out in real time how they are or are not doing. As of 5:05 PM EDT, the dashboard was showing the following services as resolved, i.e., they had been having issues but were now under control:

  • Amazon Elastic Compute Cloud (N. Virginia)       
  • Amazon Mechanical Turk (Requester)                                   
  • Amazon Mechanical Turk (Worker)                                        
  • Amazon Relational Database Service (N. Virginia)             
  • Amazon Simple Email Service (N. Virginia)
  • Amazon CloudFormation (N. Virginia)                          
  • Amazon Elastic Beanstalk (N. Virginia)
  • AWS Management Console                                                                                                                                  

In short, their Northern Virginia facility was having some challenges. 


This follows on the heels of Google going down this past Friday for five minutes. By some estimates, that little incident caused Internet traffic to drop 40 percent, as access was denied to most of Google’s services, including Search, Gmail and Talk. Google did a nice job of apologizing in a statement it issued:

"Between 15:51 and 15:52 PDT, 50 percent to 70 percent of requests to Google received errors; service was mostly restored one minute later, and entirely restored after 4 minutes," read the statement. "We apologise for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better."

They still have not said what happened, but at least their mean time to restoration was impressive given the extent of the outage. Compared with the August 14 seven-hour outage that hit Microsoft’s Outlook.com webmail service, and the four hours it took to restore its SkyDrive cloud storage service, this was practically the speed of light.

In fact, the Microsoft v. Google restoration times might have created a new key performance indicator (KPI), or they might just reflect the actual difference between having five nines of availability and four nines. A more sinister analysis has been that it simply took the NSA that much longer to reboot some kind of enhanced snooping capability.
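For readers unfamiliar with the shorthand, each additional "nine" of availability cuts the allowable downtime by a factor of ten. The sketch below is a rough illustration only: it assumes a 365-day year and availability measured over a full year, both simplifications, since real service-level agreements define their own measurement windows and exclusions. The function name `downtime_budget_minutes` is hypothetical.

```python
# Rough downtime budget per "nines" of availability.
# Assumption: a 365-day year (leap years ignored), measured over the whole year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Return the maximum yearly downtime, in minutes, allowed by `nines` nines.

    Three nines = 99.9%, four nines = 99.99%, five nines = 99.999%.
    The unavailable fraction is therefore 10 ** -nines.
    """
    unavailable_fraction = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailable_fraction


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes of downtime per year")
```

By this arithmetic, five nines leaves roughly five minutes of downtime a year, about the length of Google's outage, while four nines allows a little under an hour.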

What all three of the big outages have in common is the lack of specificity as to what took them down, and an apology that amounted to “have a nice day!”

What they also had in common was that the press could not get a company executive to comment, although all three companies have executives known for giving the press the silent treatment except when they have a major announcement.

And, if that was not annoying enough as we in the media try to tell customers what happened, all three also led to my inbox being flooded by subject matter experts looking to explain things and grab some early mindshare. This has become standard operating procedure as well, although the angles are getting better and more contextual when first reports point to internal human error rather than evidence of bad guys scaling the walls in a major cyber attack.

The twist in the last three has been a focus not on security but on money. The pitches have ranged from how much each second of downtime costs, to how much achieving perfect uptime would cost and why companies do not invest in it, to what might have happened, and how much could have been lost, had this been an attack by bad guys, to best practices for disaster recovery and business continuity. There is always an ample dash of suggestions about the benefits of each inquiring mind’s own solutions, which they wanted me to spend time discussing with their spokespeople.

From a technical perspective, this means hearing about all of the risk mitigation techniques, backups (network, storage and physical space), the beneficial role of better visibility and analytics so that IT departments can be proactive rather than just reactive, and a host of other “helpful” tips, usually in the form of Top 10 lists.

Where I come out on all of this is two-fold. 

First, we live in a digital age, and as the old saying goes, “sh#@ happens!” Five nines is not perfection, but in reality it should be good enough. As a consumer, I would like more transparency. This means tell me what happened and tell me what to do if it were to happen again.

Second, as a kind of corollary to the first point, the marketing person in me would be telling companies that the real cost is to their brand’s reputation. The longer the silence, the worse things will get. And heaven forbid there is a cover-up of what went on; that is always worse than telling the facts at the outset. Again, transparency is the answer, and in weighing costs and benefits this is a no-brainer.

Let’s just say when it comes to Microsoft, Google and Amazon, “we’d like to help you out! Which way did you come in?” is not a helpful approach. Apology acknowledged but hardly accepted. That is a lesson they can bank on.    




Edited by Rich Steeves