An AI-First Approach to IT Operation Analytics

By Special Guest
Akhil Sahai, Ph.D., VP Product Management, Perspica
June 03, 2016

Artificial Intelligence (AI) is finally coming of age after many false starts. Runaway robots are still the stuff of the future, but AI, Big Data and human domain knowledge are now converging, with exceptional results. AI is being applied in many domains, and IT operations is one that is ripe for an AI-first approach.

Today’s hybrid cloud environments continue to undergo a massive transformation. These IT infrastructures are increasingly dynamic and agile, but also extraordinarily complex. Humans can no longer sift through the variety, volume and velocity of Big Data streaming out of IT infrastructures in real time, which makes AI, and machine learning in particular, a powerful and necessary tool for automating analysis and decision-making. By bridging the gap between Big Data and humans, and by capturing human domain knowledge, machine learning can provide the operational intelligence needed to significantly relieve teams of the burden of informed, near real-time decision-making. Industry analysts agree: Gartner named machine learning among its top 10 strategic technologies for 2016, noting, “The explosion of data sources and complexity of information makes manual classification and analysis infeasible and uneconomical.”

In an IT operations environment, domain experts must manually gather this disparate information and apply their expertise to make informed decisions. These experts are typically IT administrators and IT operators in TechOps, and site reliability engineers (SREs) in DevOps. While these professionals are great at what they do, analyzing so much data from multiple tools leaves the door wide open to human error. That is why analytics based on machine learning are quickly becoming a necessity for ensuring the availability, reliability, performance and security of applications in today’s digital, virtualized and hybrid-cloud network environments.

Until now, IT operations teams have relied on siloed monitoring tools that report on their network, their virtual and physical infrastructure, and their application performance. Each of these tools provides a piece of the puzzle, but only a narrow view of the IT infrastructure, and monitoring as a whole is only one part of the operations toolchain. The other part is service desk tools, which manage tickets and change management. More often than not, it is humans, relying on their domain expertise, who bridge the gap between yesterday’s siloed monitoring tools and service desk applications.

The New Age of Analytics

In TechOps and DevOps environments, there is a need to automate, learn and make intelligent, informed decisions based on real-time analysis of Big Data arising from the entire application infrastructure stack. The following are the key analytics for IT operations:

  1. Topology Analysis – This is the understanding of the hierarchical, peer-to-peer and temporal relationships between hybrid cloud elements. Every IT administrator or SRE should be aware of the topology. This type of analysis should be able to self-learn how objects are interrelated and how their performance affects one another. Learning those relationships, and keeping that understanding current so that trouble can be spotted in time, is extremely important in both TechOps and DevOps environments.
  2. Behavior Profiling – This is the understanding of the behavior profile of each and every metric, how those metrics roll up into the behavior of an object, and how object behaviors relate to one another across the hybrid cloud environment. It is a multi-dimensional problem, and understanding and adapting to “normal” behavior is extremely important.
  3. Anomaly Detection – Best-of-breed machine learning algorithms should be able to examine contextual, historical and sudden changes in the behavior of objects to detect anomalies. Knowing when there is a real anomaly, and more importantly when there is not, is critical to avoid generating false alarms. This is the bedrock of what is typically referred to as diagnostic analytics.
  4. Root Cause – By pinpointing the cause and impact of an incident, root-cause analysis can fast-track resolution and substantially reduce mean time to repair (MTTR).
  5. Predictive – These analytics help operators identify early indicators of, and gain insight into, looming problems that may eventually lead to performance degradation and outages. Predictive analytics also surface anomalies early, so teams can better plan for what’s ahead.
  6. Prescriptive – These analytics provide intelligent, actionable recommendations for remediating an incident. The recommendations should capture tribal knowledge gathered in the organization over the years as well as industry best practices, and may even be crowd-sourced to reflect state-of-the-art knowledge. These analytics provide the opportunity to finally close the loop in automated IT Operations Management.
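To make behavior profiling and anomaly detection a little more concrete, here is a minimal Python sketch, assuming a single numeric metric sampled at a fixed interval. It is not Perspica’s implementation: the class name `MetricProfile`, the rolling mean/standard-deviation baseline, the 3-sigma threshold and the “persistence” rule are all illustrative assumptions chosen for simplicity. Real systems would profile many correlated metrics per object and account for effects such as daily or weekly seasonality.

```python
from collections import deque
from statistics import mean, stdev


class MetricProfile:
    """Hypothetical rolling baseline ("behavior profile") for one metric.

    'Normal' is learned as the mean and standard deviation of the last
    `window` normal-looking samples. An anomaly is raised only when a
    sample lies more than `threshold` standard deviations from that
    baseline for `persistence` consecutive samples, so one-off spikes
    do not trigger false alarms.
    """

    def __init__(self, window=30, threshold=3.0, persistence=3):
        self.history = deque(maxlen=window)  # recent normal samples
        self.threshold = threshold
        self.persistence = persistence
        self.streak = 0  # consecutive out-of-baseline samples seen so far

    def observe(self, value):
        """Score one sample; return True if an anomaly should be raised."""
        is_outlier = False
        if len(self.history) >= 2:  # stdev needs at least two points
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                is_outlier = abs(value - mu) / sigma > self.threshold
            else:
                # Perfectly flat baseline: any change counts as unusual.
                is_outlier = value != mu
        self.streak = self.streak + 1 if is_outlier else 0
        # Only learn from samples that look normal, so a sustained
        # incident does not immediately become the new baseline.
        if not is_outlier:
            self.history.append(value)
        return self.streak >= self.persistence


# Usage: a steady metric, one isolated spike, then a sustained spike.
profile = MetricProfile(window=30, threshold=3.0, persistence=3)
samples = [50, 52] * 10 + [200, 51, 200, 200, 200]
alerts = [profile.observe(v) for v in samples]
print(alerts.index(True))  # → 24: only the third consecutive spike alerts
```

Requiring several consecutive out-of-baseline samples trades a little detection latency for fewer false alarms, which is the balance the anomaly detection item describes. Because outliers are excluded from the baseline, a permanent level shift (for example, after a capacity change) would keep alerting until the profile is relearned; a production system would need an explicit policy for adapting to a new “normal.”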

Beyond Monitoring

IT operations teams have been in firefighting mode for a while now, reacting to incidents and trying to resolve them after they have spun out of control. AI offers technologies that can automate many of these tasks so that incidents are handled before they escalate. Automating operational tasks, preventing outages in the first place, and getting to the root cause quickly and automatically is the next frontier in remediating these issues.

It has become apparent that humans can no longer review all of the monitoring data to identify incidents. In fact, for DevOps and TechOps teams, AI is leapfrogging traditional monitoring solutions. These teams are finally beginning to understand how useful it is to move beyond simple monitoring tools, and how indispensable analytics is for real-time, intelligent decision-making.

Dr. Akhil Sahai is an accomplished management and technology leader with 25+ years of experience at large enterprises and startups. Akhil came to Perspica from HP Enterprise, where, as Sr. Director of Product Management, he envisaged, planned and managed the Solutions Program. At Dell, as Director of Products, Akhil led product strategy and management for Dell’s Converged Infrastructure product line. He also led Gale Technologies, as VP of Products, to its successful acquisition by Dell. Prior to that, at Cisco he undertook business development for the VCE Coalition, and at VMware he managed global product strategy and management for vCloud Software, with a focus on applications, and for the Virtual Appliances product line. He has published 80+ peer-reviewed articles, authored one book and edited another, and chaired multiple international IEEE/IFIP conferences. He has filed 20 technology patents (16 granted). He holds a Ph.D. from INRIA France and an MBA from the Wharton School.




Edited by Stefania Viscusi