An AI-First Approach to IT Operation Analytics

By Special Guest
Akhil Sahai, Ph.D., VP Product Management, Perspica
June 03, 2016

Artificial Intelligence (AI) is finally coming of age after many false starts. Runaway robots still belong to the future, but AI, Big Data and human domain knowledge are now converging, with exceptional results. AI is being applied in many domains, and IT operations is one that is ripe for an AI-first approach.

Today’s hybrid cloud environments continue to undergo a massive transformation. These IT infrastructures are increasingly dynamic and agile, but at the same time extraordinarily complex. Humans can no longer sift through the variety, volume and velocity of Big Data streaming out of IT infrastructures in real time, which makes AI, especially machine learning, a powerful and necessary tool for automating analysis and decision-making. By bridging the gap between Big Data and humans, and by capturing human domain knowledge, machine learning can provide the operational intelligence needed to relieve teams of much of the burden of making informed decisions in near real time. Industry analysts agree. In fact, Gartner named machine learning among the top 10 strategic technologies for 2016, noting, “The explosion of data sources and complexity of information makes manual classification and analysis infeasible and uneconomical.”

In an IT operations environment, domain experts must manually gather this disparate information and apply their expertise in an attempt to make informed decisions. These experts are typically IT administrators and IT operators in TechOps, and site reliability engineers (SREs) in DevOps. While these professionals are great at what they do, manually analyzing so much data from multiple tools leaves the door wide open to human error. As a result, analytics based on machine learning are quickly becoming a necessity for ensuring the availability, reliability, performance and security of applications in today’s digital, virtualized and hybrid-cloud network environments.

Until now, IT operations teams have relied on siloed monitoring tools that provide information about their network, virtual and physical infrastructure, and application performance. While these tools provide pieces of the puzzle, they offer a narrow view of the IT infrastructure and therefore represent only one aspect of the toolchain. The other aspect is service desk tools, which handle ticketing and change management. More often than not, it is humans, drawing on their domain expertise, who bridge the gap between yesterday’s siloed monitoring tools and service desk applications.

The New Age of Analytics

In TechOps and DevOps environments, there is a need to automate, to learn and to make intelligent, informed decisions based on real-time analysis of the Big Data generated across the entire application infrastructure stack. The following are the key analytics for IT operations:

  1. Topology Analysis – This is the understanding of the hierarchical, peer-to-peer and temporal relationships between hybrid cloud elements. Topology is something every IT administrator or SRE should be aware of. This type of analysis should be able to learn, on its own, how objects are interrelated and how their performance affects one another. Learning those relationships and keeping that understanding current, in order to spot trouble in time, is extremely important in both TechOps and DevOps environments.
  2. Behavior Profiling – This is the understanding of the behavior profile of each and every metric, how those metrics roll up into the behavior of an object, and how each object’s behavior relates to the behavior of other objects across the hybrid cloud environment. It is a multi-dimensional problem, and learning and adapting to “normal” behavior is extremely important.
  3. Anomaly Detection – Best-of-breed machine learning algorithms should be able to examine contextual, historical and sudden changes in the behavior of objects to detect anomalies. Recognizing when there is a real anomaly, and more importantly when there is not, is critical to avoiding false alarms. This is the bedrock of what is typically referred to as diagnostic analytics.
  4. Root Cause – By pinpointing the cause and impact of an incident, root-cause analysis can fast-track resolution and substantially reduce mean time to repair (MTTR).
  5. Predictive – These analytics help operators identify early indicators and provide insight into looming problems that may eventually lead to performance degradation and outages. Predictive analytics can also give early warning of anomalies, so teams can better plan for what’s ahead.
  6. Prescriptive – These analytics provide intelligent, actionable recommendations for remediating an incident. These recommendations should capture the tribal knowledge an organization has gathered over the years as well as industry best practices, and they may even be crowd-sourced to capture state-of-the-art knowledge. These analytics provide the opportunity to finally close the loop in automated IT Operations Management.
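The behavior profiling, anomaly detection and root-cause items above can be illustrated with a deliberately simple sketch. It is not Perspica’s method. It assumes each component’s “normal” behavior is summarized by the mean and standard deviation of one metric’s history, flags readings more than three standard deviations away (a basic z-score test standing in for the richer learning described above), and uses a hypothetical, hand-written dependency map in place of learned topology. An anomalous component is reported as a likely root cause only if nothing it depends on, directly or transitively, is also anomalous. All component names, metrics and numbers are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical topology: each component maps to the components it depends on.
# A real system would learn these relationships; here they are hand-written.
DEPENDS_ON = {
    "web-frontend": ["app-server"],
    "app-server": ["database", "cache"],
    "database": ["storage-volume"],
    "cache": [],
    "storage-volume": [],
}

# Invented response-time histories (ms) and current readings per component.
SAMPLE_HISTORY = {
    "web-frontend": [120, 118, 125, 122, 119, 121],
    "app-server": [80, 82, 79, 81, 80, 78],
    "database": [15, 16, 14, 15, 15, 16],
    "cache": [2, 2, 3, 2, 2, 3],
    "storage-volume": [5, 5, 6, 5, 5, 6],
}
SAMPLE_CURRENT = {
    "web-frontend": 300, "app-server": 210, "database": 95,
    "cache": 2, "storage-volume": 5,
}


def build_profile(history):
    """Summarize the 'normal' behavior of one metric as (mean, sample std dev)."""
    return mean(history), stdev(history)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a reading whose z-score against the profile exceeds the threshold."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu  # perfectly flat history: any change is unusual
    return abs(value - mu) / sigma > threshold


def dependencies(component, topology):
    """Return every component that `component` depends on, directly or transitively."""
    seen, stack = set(), list(topology.get(component, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(topology.get(dep, []))
    return seen


def diagnose(history, current, topology, threshold=3.0):
    """Return (set of anomalous components, sorted list of likely root causes)."""
    anomalous = {
        comp
        for comp, value in current.items()
        if is_anomalous(value, build_profile(history[comp]), threshold)
    }
    # A component is a likely root cause if none of its dependencies is anomalous;
    # otherwise it is more plausibly feeling the impact of a failing dependency.
    causes = sorted(
        comp for comp in anomalous if not (dependencies(comp, topology) & anomalous)
    )
    return anomalous, causes


if __name__ == "__main__":
    anomalous, causes = diagnose(SAMPLE_HISTORY, SAMPLE_CURRENT, DEPENDS_ON)
    print(sorted(anomalous))  # ['app-server', 'database', 'web-frontend']
    print(causes)             # ['database']
```

The transitive dependency check is what separates cause from impact in this sketch: the web tier and application server are anomalous too, but only the database has no anomalous component beneath it.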
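The predictive analytics described in the list can likewise be illustrated with a minimal sketch, assuming a metric (here, a hypothetical disk-usage percentage sampled hourly) that drifts roughly linearly. It fits an ordinary least-squares trend line and extrapolates it to estimate how much time remains before a chosen threshold is crossed. Linear extrapolation is a stand-in chosen for clarity, not the article’s technique; real predictive analytics would also need to handle seasonality, noise and non-linear growth. The function names, threshold and data are illustrative assumptions.

```python
def fit_trend(samples):
    """Fit value = slope * t + intercept by ordinary least squares.

    `samples` is a list of (t, value) pairs with at least two distinct t values.
    """
    n = len(samples)
    t_mean = sum(t for t, _ in samples) / n
    v_mean = sum(v for _, v in samples) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct timestamps")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in samples)
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def time_to_threshold(samples, threshold):
    """Estimate the time remaining, after the last sample, until the trend reaches threshold.

    Returns None when the trend is flat or falling (no crossing ahead), and 0.0
    when the rising trend line has already reached the threshold.
    """
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - samples[-1][0])


if __name__ == "__main__":
    # Hypothetical hourly disk-usage readings (percent full).
    usage = [(0, 60.0), (1, 62.0), (2, 64.0), (3, 66.0), (4, 68.0)]
    print(time_to_threshold(usage, threshold=90.0))  # 11.0 (hours)
```

In this example the fitted trend grows by 2 percentage points per hour from a baseline of 60, so it reaches 90 at hour 15, eleven hours after the last reading.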

Beyond Monitoring

IT operations teams have been in firefighting mode for some time, reacting to incidents and trying to resolve them after they have spun out of control. AI offers a different path: it provides technologies that automate many of these tasks so that incidents can be handled in advance. Automating IT operational tasks, preventing outages in the first place, and getting to the root cause quickly and automatically together make up the next frontier in remediating these issues.

It has become apparent that humans can no longer review monitoring data on their own to identify incidents. In fact, AI is leapfrogging traditional monitoring solutions for DevOps and TechOps teams, which are finally beginning to understand how useful it is to move beyond simple monitoring tools, and how indispensable analytics is for real-time, intelligent decision-making.

Dr. Akhil Sahai is an accomplished management and technology leader with 25+ years of experience at large enterprises and at startups. Akhil came to Perspica from HP Enterprise, where, as Sr. Director of Product Management, he envisioned, planned and managed the Solutions Program. At Dell, as Director of Products, Akhil led Product Strategy and Management of Dell’s Converged Infrastructure product line. He also led Gale Technologies, as VP of Products, to its successful acquisition by Dell. Prior to that, at Cisco he undertook business development for the VCE Coalition, and at VMware he managed global product strategy and management for vCloud Software, with a focus on applications, as well as the Virtual Appliances product line. He has published 80+ peer-reviewed articles, authored one book, edited another, and chaired multiple international IEEE/IFIP conferences. He has filed 20 technology patents (16 granted). He has a Ph.D. from INRIA, France, and an MBA from the Wharton School.

Edited by Stefania Viscusi