Amazon formally apologized on Friday for last week's Web services outage that crippled a number of major websites, including Foursquare, Reddit, Quora, Hootsuite and Moby.
The online retailer, which rents out computing and storage capacity to companies through its cloud-based Elastic Compute Cloud (EC2) service, also explained what caused the massive failure and how it plans to prevent a repeat.
In a nearly 6,000-word document, Amazon details the widely scrutinized outage, which began just after midnight on April 21. The highly technical explanation boils down to this: a human error set off a chain of events that is likely to cost Amazon dearly.
The outage began in a Virginia data center, where a network configuration change was made incorrectly during an upgrade to the primary network. Instead of being shifted to the other router on the primary network, traffic was moved onto the lower-capacity redundant EBS (Elastic Block Store) network. As a result, many EBS storage volumes became "stuck," as Amazon puts it.
"Unlike a normal network interruption, this change disconnected both the primary and secondary network simultaneously, leaving the affected nodes completely isolated from one another," Amazon said.
The human error and the technical glitches that followed brought down several major websites and caused permanent data loss for a small number of customers.
To keep anything similar from happening again, Amazon said it will audit its traffic-change process and increase the automation involved in that procedure. The company will also revise its capacity planning, provide more frequent updates during incidents and improve its communication channels with customers.
Amazon is offering customers that were using the affected zone a 10-day service credit, even if their resources and applications were not affected by the errors.
The lengthy apology and the 10-day credit may not be enough to win back all of Amazon's customers. Some clients took to Twitter to respond quickly to Amazon's announcement.
"10 days of credit? That's it?" tweeted Stan Olson, operator of StansWeather.net. "I'm still dealing with issues because of this, pretty frustrating."
Amazon's Web services division accounts for only a small part of the company's current revenue. Still, the retailer hopes to make the business a major part of its future, according to the AP.
Beecher Tuttle is a TechZone360 contributor. He has extensive experience writing and editing for print publications and online news websites. He has specialized in a variety of industries, including health care technology, politics and education.
Edited by Jennifer Russell