How to Resolve Critical IT Incidents Fast

By Troy McAlpin

In today’s always-on, always-connected business, something as small as an operational glitch can bring an extended enterprise to its knees. So how IT communicates in the first few moments of even the smallest service outage is crucial. But manual processes, diversified IT infrastructures and dispersed workforces can complicate these communications, increasing downtime and impacting the business.

Here are the best practices for automating the IT incident management communications process to ensure IT incidents are resolved as quickly and effectively as possible:

Identify

The hallmark of a major incident is service disruption. In most cases, an operations manager identifies a major incident and escalates to a major incident manager, but some companies automate the routing of major incidents. Establish resolution processes for less critical issues as well, so they don’t go to major incident managers unnecessarily.
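As a rough illustration of automated routing, the sketch below sends only service-disrupting, high-severity incidents to the major incident manager and leaves everything else with the service desk. The `Incident` fields, the severity scale, the threshold and the queue names are all assumptions made for this example, not features of any particular product.

```python
from dataclasses import dataclass

# Hypothetical scale: 1 is most severe. The threshold is an assumption.
MAJOR_SEVERITY_THRESHOLD = 2


@dataclass
class Incident:
    summary: str
    severity: int            # 1 (critical) to 5 (minor), assumed scale
    service_disrupted: bool  # True when users cannot use the service


def route(incident: Incident) -> str:
    """Return the (hypothetical) queue that should handle the incident."""
    is_major = (
        incident.service_disrupted
        and incident.severity <= MAJOR_SEVERITY_THRESHOLD
    )
    return "major-incident-manager" if is_major else "service-desk"
```

A rule like this keeps less critical issues on their own resolution path, so major incident managers are paged only when a service is actually disrupted.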

Engage

A communication system should be prepopulated with major incident managers' contact information and on-call schedules, enabling the operations manager to locate them instantly and target notifications to them. Automating this initial engagement can have huge benefits, cutting a process that can take 20 minutes by up to 90 percent.
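A minimal sketch of what prepopulated on-call data makes possible: given a list of shifts, find who is on call right now, then work out what a 90 percent reduction means for a 20-minute process. The `OnCallShift` structure and the function names are hypothetical; a real platform would supply its own schedule data.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class OnCallShift:
    manager: str
    phone: str
    start: time  # shift start, local time
    end: time    # shift end, local time (may wrap past midnight)


def is_on_call(shift: OnCallShift, now: time) -> bool:
    """True if `now` falls inside the shift, including overnight shifts."""
    if shift.start <= shift.end:
        return shift.start <= now < shift.end
    # Overnight shift, for example 20:00 to 08:00
    return now >= shift.start or now < shift.end


def find_on_call(shifts, now: time):
    """Return every shift that covers the given time of day."""
    return [s for s in shifts if is_on_call(s, now)]


def minutes_after_reduction(baseline_minutes: float, reduction_pct: float) -> float:
    """Time remaining after cutting a process by the given percentage."""
    return baseline_minutes * (100 - reduction_pct) / 100
```

With the figures above, `minutes_after_reduction(20, 90)` gives 2.0, meaning a 20-minute engagement step would shrink to about two minutes in the best case.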

Triage

The manager who accepts the case determines whether the alert is a false alarm and what the incident actually is. The 2014 SANS survey (sponsored by AlienVault) revealed that 15 percent of organizations have issues with false positives.

Find and Assemble

Once the major incident manager understands the nature of the incident, he or she assembles the appropriate resolution team members based on the skills required. Common roles include a service desk manager, service-level agreement (SLA) manager, change manager, software developer, quality assurance, operations engineer, infrastructure engineer and problem manager. When an IT outage occurs and a notification is received, these members drop everything else to resolve the issue.
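Skill-based assembly can be pictured as a simple matching step. The sketch below picks the first available responder for each required skill and flags skills nobody can cover. The `Responder` type, the skill labels and the first-match rule are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Responder:
    name: str
    skills: Set[str]
    available: bool = True


def assemble_team(
    required_skills: Iterable[str], roster: List[Responder]
) -> Dict[str, Optional[str]]:
    """Map each required skill to the first available responder who has it.

    A value of None means no available responder covers that skill,
    which is a cue to escalate or widen the search.
    """
    team: Dict[str, Optional[str]] = {}
    for skill in required_skills:
        team[skill] = next(
            (r.name for r in roster if r.available and skill in r.skills),
            None,
        )
    return team
```

Returning `None` for an uncovered skill, rather than silently skipping it, makes gaps in the team visible before the conference bridge opens.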

Some companies choose to have a SWAT team at the ready, instead of making sure someone from each required department is free to help.

Collaborate Effectively

Just assembling people and initiating the conference call can take 45 minutes. Companies that use mass communications during this phase often get far too many people on the conference bridge. Each new person who joins interrupts the flow of the call, and repeating the background information for each one can waste an additional 10 minutes.

With a leading communication platform, the major incident manager can customize the message so resolution team members understand the basics before the call, and can join the bridge with just a button push instead of having to dial in. These messages can often be technical in nature, using IT jargon and specific server names.

Proactively and intelligently informing those affected with a more business-friendly message enables Marketing, PR, and executives to communicate responsibly, effectively and consistently.
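The two audiences described above can be served from the same incident data with different templates. The following is a sketch under assumed field names (`service`, `server`, `cause`, `bridge`, `next_update`) and made-up wording; it uses Python's standard `string.Template` class.

```python
from string import Template

# Technical message for the resolution team: jargon and server names are fine.
TECH_TEMPLATE = Template(
    "$service is degraded on $server. Suspected cause: $cause. "
    "Join the bridge: $bridge"
)

# Business-friendly message for Marketing, PR and executives.
BUSINESS_TEMPLATE = Template(
    "We are investigating an issue affecting $service. "
    "Next update by $next_update."
)


def build_messages(incident: dict) -> dict:
    """Render one message per audience from the same incident record."""
    return {
        "resolution_team": TECH_TEMPLATE.substitute(incident),
        "stakeholders": BUSINESS_TEMPLATE.substitute(incident),
    }
```

Keeping the server name and suspected cause out of the stakeholder template is a deliberate choice: business readers get a consistent, plain-language status, while the resolution team gets the detail it needs before joining the call.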

Resolve

Members of the major incident team use all available communication channels integrated with their communication platform, including chat, text, email, phone, Skype, Slack, and more, to identify and resolve the underlying cause of the issue. The communications also enable the major incident manager to keep stakeholders up to date while his or her team members focus on resolving the issue.

Restore

Once the underlying issue is resolved, the team members can restore service and end the incident.

Review (Post Mortem)

A review is a fundamental piece of the incident resolution process, and all relevant parties should attend. The incident should have been documented and recorded, so the major incident manager and the problem manager can walk the group through the incident record and assess the resolution process together.

The review can also identify improvements that can prevent a similar incident from occurring again.
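One way to make the incident record useful in a review is to compute how long each phase took. The sketch below assumes the record is a chronological list of (phase, ISO 8601 timestamp) pairs; that format and the function name are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def phase_durations(events: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return minutes spent in each phase.

    `events` is a chronological list of (phase_name, ISO 8601 timestamp)
    pairs marking when each phase started. The last entry only marks the
    end of the previous phase, so it gets no duration of its own.
    """
    parsed = [(name, datetime.fromisoformat(ts)) for name, ts in events]
    durations: Dict[str, float] = {}
    for (name, start), (_, end) in zip(parsed, parsed[1:]):
        durations[name] = (end - start).total_seconds() / 60
    return durations
```

Comparing these durations across incidents shows the review group where time is being lost, for example in assembling the team rather than in the fix itself.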

If an enterprise’s data, information and processes become compromised, the business can suffer irreparable damage. So when major incidents occur, how the communication is managed is everything. These tips will help enterprises implement the automated communications processes needed to get the business up, running and restored quickly and easily in the event of an IT outage of any size.

About the Author: Troy McAlpin brings more than 20 years of experience to his leadership role at xMatters, with expertise in process automation, strategic initiatives and corporate strategy. His domain experience includes technology strategy and vertical market expertise including high tech, banking, consumer and retail industries. Prior to xMatters he managed marketing, sales, development, M&A and financial aspects at two successful start-up companies and also worked at AT&T Solutions and Andersen.




Edited by Dominick Sorrentino