“The Morning After,” sung by Maureen McGovern, best sums up yesterday’s computer “glitches,” which grounded the entire United Airlines (UAL) fleet for two hours, halted trading on the New York Stock Exchange (NYSE) for almost four hours, and took down the Wall Street Journal (WSJ) for about an hour.
There's got to be a morning after
If we can hold on through the night
We have a chance to find the sunshine
Let's keep on lookin' for the light
Oh, can't you see the morning after
It's waiting right outside the storm
Why don't we cross the bridge together
And find a place that's safe and warm
Here is what we know the morning after.
First, the good news: it appears that the three events were unrelated and were not part of some type of coordinated attack by bad actors such as hacktivists, criminal organizations or rogue nations. In other words, this was not “the big one” security experts have been predicting, where bad guys bring the U.S. economy to a halt and destroy trust in government and other institutions. Whew!
As for the still-sketchy details:
UAL had an internal router malfunction that caused its reservations system to fail. That put the airline out of compliance with FAA rules requiring the ability to match customers against the “do not fly” list, and hence the fleet was grounded. This is not the first time UAL has had computing and communications issues; the integration of systems following United Airlines’ acquisition of Continental Airlines has been a constant source of problems. What is surprising is the lack of redundancy at UAL that allowed a router malfunction to wreak such havoc.
The NYSE apparently had to halt trading because a system-wide software upgrade did not take. While forensic work is still under way to document what actually happened, Wall Street has breathed a collective sigh of relief. Some are even claiming that the incident proves the system works, since trading continued on other exchanges and clients did not suffer economic consequences. That is hardly heartening given the 2010 “flash crash,” blamed on a malicious trader in Britain whom it took years to apprehend. NYSE spokespeople have been saying that they have to do better so this type of thing does not happen again. One would hope so!
The Wall Street Journal (WSJ) outage, which redirected readers to a slimmed-down version of the online paper, remains a mystery, although it clearly was another technical “glitch.” Other than its now infamous posting during the outage (“Ooops, 504! something did not respond fast enough, that’s all we know”), WSJ has been mum as to the details. In fact, today’s paper barely makes note of it. This seems like a case where the cobbler’s children have no shoes.
The risks of living in a connected world
Surprise, surprise! My inbox has been on fire for the last 24 hours. In fact, it almost looks like I am under a DDoS attack from industry experts. Putting aside the really early comments regarding coincidences and conspiracy theories, the ones that resonated with me had to do with the fragility of living in a connected world, where so much of what is important, or what we simply take for granted, is software-based and thus subject to glitches.
As anyone of a certain age knows, going back to when programs ran on thousands of punch cards and one bad card could make the program unusable, the integrity of the code is what it is all about. It truly is garbage in, garbage out. Whether the cause is a bad actor of any kind or just human error, when code is altered, incorrect, or entered incorrectly, the consequences, as we have seen, are enormous. This includes the destruction of trust, which once undermined is very hard to restore.
With that said, a few of my favorite observations follow.
Tim Erlin, director of IT security and risk strategy at Tripwire:
“There is virtually no part of our global economy that isn't dependent on interconnected technology today, and the level of interdependence continues to increase steadily. That means that any failure, malicious or not, has the potential to create economic repercussions.
There are many layers of technology between the consumer and the services we depend on, from the individual smartphone that you use to access a service, to the vendor who provides the networking equipment used by the telecommunications company to provide connectivity to the company providing the service. The level of complexity can be staggering, and this means an error made by a developer half-way around the world somewhere in the supply chain of a service can impact the operations of major businesses like United.
Collectively our goal should not be to eliminate errors; instead we should focus on providing resiliency in the face of known instability.
From a cybersecurity perspective, the obvious disruptions to service might not be what we need to worry about. Instead, we should be detecting the nearly invisible infiltration of valuable systems.”
Igor Baikalov, chief scientist at Securonix:
“What every enterprise should plan for is business continuity and disaster recovery – remember ‘A’ in CIA (Confidentiality, Integrity and Availability) attributes of Information Security? The problem is that High Availability (minimum downtime) and especially Fault Tolerance (no single point of failure) is very expensive, and for as long as the cost of implementation exceeds the cost of outage, businesses are not going to do it. Something has to be said on the maturity of change management processes too: it’s not the first rodeo for NYSE, and why there were no staged rollout and rollback plans in place is hard to comprehend. Was it really that much cheaper to deploy system-wide changes right before the opening bell, and bring the whole thing down, than to execute a careful deployment overnight, with sufficient time for testing and reversing the changes if needed?
I mean, these are serious companies with smart people doing expensive stuff – it’s not some low-life “Internet of Things” – how could the basic principles of Information Security be so ignored? Perhaps, I stick with the conspiracy theory of nation-state retaliation for the market crash, or alien invasion.”
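For readers who want to see what a “staged rollout and rollback plan” can look like in practice, here is a minimal sketch in Python. It is purely illustrative and is not how NYSE, UAL or anyone else quoted here deploys software: the function name `staged_rollout`, the host names and the `is_healthy` check are all hypothetical, and a real deployment would act on live systems rather than on a dictionary of version strings.

```python
from typing import Callable, Dict, List


def staged_rollout(
    stages: List[List[str]],
    versions: Dict[str, str],
    new_version: str,
    is_healthy: Callable[[str, str], bool],
) -> bool:
    """Upgrade hosts one stage at a time; restore every touched host if a check fails.

    `versions` maps host -> running version and is updated in place.
    Returns True if the rollout completed, False if it was rolled back.
    """
    previous: Dict[str, str] = {}  # host -> version before the upgrade (the rollback plan)
    for stage in stages:
        for host in stage:
            previous[host] = versions[host]
            versions[host] = new_version
        if not all(is_healthy(host, new_version) for host in stage):
            # A failed check stops the rollout before later stages are touched,
            # then puts every upgraded host back on its earlier version.
            for host, old_version in previous.items():
                versions[host] = old_version
            return False
    return True


# Hypothetical fleet: one canary host first, then two larger groups.
fleet = {"canary-1": "v1", "east-1": "v1", "east-2": "v1", "west-1": "v1"}
plan = [["canary-1"], ["east-1", "east-2"], ["west-1"]]

# Hypothetical health check: pretend the new build misbehaves on east-2.
ok = staged_rollout(plan, fleet, "v2", lambda host, version: host != "east-2")
print(ok, fleet)
```

In this made-up run the problem surfaces in the second stage, so the third group is never upgraded and the hosts that were upgraded are returned to the old version. That is the whole argument for staging: a bad change costs you a slice of the system for a short while, not all of it at the opening bell.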
Pierluigi Stella, Chief Technology Officer of Network Box USA:
“… please, let’s not blame technology. Yes, we do depend on it, but that isn’t going to change. In fact, it’s only going to be more and more so. And the more we depend on it, the less these issues happen because technology on its own doesn’t make mistakes and can be set to be redundant (alright, technology sometimes does make mistakes, but they can always be traced back to the human who configured or built them!) to avoid issues when something breaks.
Humans make mistakes ergo it’s the human element which needs to be vetted the most in this dependence of our world upon technology.”
For all of our sakes, the latest series of breaches and “glitches” should be a wake-up call: this can and likely will happen to you unless more early detection and prevention measures are taken. If that call goes unheeded, trust will be broken and finding a place “that's safe and warm” will get a whole lot harder. And in a world where digital connectivity is becoming the lifeblood of commerce, the damages will be hard to calculate. As many have noted, even without human error, the vectors of vulnerability are expanding. Every institution, and individual for that matter, is constantly being probed by those with malice as their focus and monetization or destabilization as their goal. We need to be investing a lot more in how to be careful out there. The risks of virtual crashes are no longer hidden. They are real, and there are steps that can and must be taken to mitigate them.