In this special guest feature, Gerry Miller, CEO at Cloudticity, takes a look at the emerging technology dubbed “AIOps.” Gerry founded Cloudticity in 2011 with a passion for helping healthcare organizations radically reshape the industry by unlocking the full potential of cloud technology.
Artificial intelligence (AI) is particularly attractive in cloud-powered managed services for IT because it’s so adept at finding patterns hidden in mountains of data: spotting one server out of hundreds that hits 95% utilization at 3 a.m. every weekday (but not on weekends); identifying weirdly inconsistent traffic patterns involving a suspicious IP address; flagging applications trying to access information that makes no sense, such as an accounting package trying to download binaries from a development server. Such massive data processing and pattern identification prowess feeds automation processes that are enormously helpful to humans in IT struggling to optimize workflow, prevent service outages, and maintain security.
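To make the first of those patterns concrete, here is a minimal sketch of how a server that runs hot at the same hour on weekdays, but not on weekends, might be picked out of raw utilization samples. It is a simple counting rule rather than a trained model, and the function name, the 95% threshold default, the `min_share` parameter, and the demo host names are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekday_only_spikes(samples, threshold=95.0, min_share=0.8):
    """Return sorted (server, hour) pairs that run hot on weekdays only.

    samples: iterable of (server, timestamp, cpu_percent) tuples.
    A pair is flagged when at least `min_share` of its weekday samples
    reach `threshold` while its weekend samples at that hour do not.
    """
    # (server, hour, is_weekday) -> [hot_count, total_count]
    counts = defaultdict(lambda: [0, 0])
    for server, ts, cpu in samples:
        key = (server, ts.hour, ts.weekday() < 5)
        counts[key][0] += cpu >= threshold
        counts[key][1] += 1

    flagged = []
    for (server, hour, is_weekday), (hot, total) in counts.items():
        if not is_weekday or hot / total < min_share:
            continue
        wk_hot, wk_total = counts.get((server, hour, False), (0, 0))
        if wk_total == 0 or wk_hot / wk_total < min_share:
            flagged.append((server, hour))
    return sorted(flagged)


def demo_samples():
    """Two weeks of hourly readings for two hypothetical servers."""
    start = datetime(2024, 1, 1)  # a Monday
    samples = []
    for h in range(14 * 24):
        ts = start + timedelta(hours=h)
        for server in ("web-01", "web-02"):
            spike = server == "web-02" and ts.hour == 3 and ts.weekday() < 5
            samples.append((server, ts, 97.0 if spike else 40.0))
    return samples


print(weekday_only_spikes(demo_samples()))
```

A production system would learn the threshold and the seasonality from history instead of hard-coding them, but the grouping by host, hour, and day type is the core of the idea.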
Over the past few years, these capabilities have led to an emerging technology dubbed “AIOps.” AIOps, according to Gartner, “combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.” The basic operating model for AIOps is Observe-Engage-Act:
- Observe: This part of AIOps is where data ingestion and monitoring of events, metrics, traces, and topologies take place. Historical analysis is also employed on data stored in data lakes to produce and tune analytical models for anomaly detection, performance inspection, and correlation mapping.
- Engage: This phase is where Information Technology Service Management (ITSM) activities interface with incidents, dependencies, and changes within an organization’s IT delivery, operations, and control infrastructure, including task automation triggers, risk analysis, and knowledge management engines.
- Act: This phase is where automation “does the work” to remediate issues via scripts, adaptive resource allocation (ARA), or runbooks on predefined self-healing and maintenance procedures.
The AIOps model is cyclical and continuous, with each phase in the loop designed to feed and refine the operation of the others. While the overall workflow may be complex, combining AI and operational intelligence to automate remediation whenever possible can dramatically decrease workload for IT operations staff, as well as simplify recommended actions on the remaining fraction of process issues that cannot be resolved automatically.
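One pass through the Observe-Engage-Act loop can be sketched in a few lines. This is an illustrative toy, not a description of any vendor's platform: the fixed limits stand in for a learned anomaly model, and the runbook functions, metric names, and host names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical runbooks: each takes a host name and reports what it did.
def restart_service(host: str) -> str:
    return f"restarted service on {host}"


def expand_disk(host: str) -> str:
    return f"expanded disk on {host}"


RUNBOOKS = {"cpu_percent": restart_service, "disk_percent": expand_disk}

# Static limits used here in place of a trained anomaly-detection model.
LIMITS = {"cpu_percent": 95.0, "disk_percent": 90.0, "error_rate": 0.05}


@dataclass
class Incident:
    host: str
    metric: str
    value: float
    runbook: Optional[Callable[[str], str]]  # None means "needs a human"


def observe(metrics):
    """Observe: yield (host, metric, value) readings above their limit."""
    for host, readings in metrics.items():
        for metric, value in readings.items():
            if value > LIMITS.get(metric, float("inf")):
                yield host, metric, value


def engage(anomaly) -> Incident:
    """Engage: record the anomaly as an incident and look up a runbook."""
    host, metric, value = anomaly
    return Incident(host, metric, value, RUNBOOKS.get(metric))


def act(incident: Incident) -> str:
    """Act: run the runbook, or escalate when none is defined."""
    if incident.runbook is None:
        return f"escalated {incident.metric} on {incident.host}"
    return incident.runbook(incident.host)


def run_cycle(metrics):
    """Run one Observe -> Engage -> Act pass and return the outcomes."""
    return [act(engage(anomaly)) for anomaly in observe(metrics)]


snapshot = {
    "db-01": {"cpu_percent": 97.0, "disk_percent": 40.0},
    "db-02": {"error_rate": 0.2},
}
print(run_cycle(snapshot))
```

In this sketch the anomaly with a matching runbook is remediated automatically, while the one without is escalated, which mirrors the split between auto-resolved issues and the remaining fraction that still needs staff attention.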
The reality behind the hype
While virtually every managed service provider claims to use AI, the devil is in the details.
Let’s suppose you build a platform that uses a third-party malware-detection application with AI inside. Technically, you can say that your platform uses AI — but it’s really AI-by-proxy. To reap the full potential of AI for operations, you need customized models to drive machine learning. Slapping an “AI” moniker on a service doesn’t automatically make it better or even useful, and there’s still a lot of confusion about AIOps capability, tooling, and implementation.
It is incumbent on reputable managed service providers to bridge that gap for their clients.
But when done correctly, a true managed service AIOps implementation can deliver some powerful benefits, including:
- Increased Agility: AIOps enables the automation of previously manual monitoring and remediation chores. This frees IT staff from burdensome and repetitive tasks such as tracking CPU utilization, monitoring backups, or managing firewalls. And since machine learning continually provides insights to drive automated configuration changes and issue management, it affords unprecedented flexibility for adapting to changing requirements with minimal human intervention.
- True Reliability: The dream of 100% availability can be achieved with AIOps’ predictive modeling and continuous monitoring for component failures or usage surges. If a failure or surge does occur, auto-remediation ensures that the system continues to operate without perceptible degradation in performance.
- Enhanced Security and Compliance: AIOps can use machine learning to identify drifts in configuration and deviations from baseline performance that represent potential compliance or security violations. Machine-learning-powered automation can then be configured to keep the system continuously in compliance.
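As a rough illustration of the configuration-drift idea in the last item, the sketch below compares a running configuration against an approved baseline and resets anything that has drifted. It is a plain rule-based comparison, not machine learning, and the setting names and function names are hypothetical.

```python
def find_drift(baseline, current):
    """Return {setting: (expected, actual)} for each setting that differs."""
    drift = {}
    for key, expected in baseline.items():
        actual = current.get(key)  # None if the setting is missing
        if actual != expected:
            drift[key] = (expected, actual)
    return drift


def remediate(baseline, current):
    """Return a copy of `current` with drifted settings reset to baseline."""
    fixed = dict(current)
    for key, (expected, _actual) in find_drift(baseline, current).items():
        fixed[key] = expected
    return fixed


# Hypothetical approved baseline and a running config that has drifted.
baseline = {
    "encryption_at_rest": True,
    "public_access": False,
    "tls_min_version": "1.2",
}
current = {
    "encryption_at_rest": True,
    "public_access": True,
    "tls_min_version": "1.2",
}

print(find_drift(baseline, current))
print(find_drift(baseline, remediate(baseline, current)))
```

A learned model would add value on top of this by flagging deviations from baseline behavior that no explicit rule anticipates.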
The power of the cloud has been both a blessing and a curse to IT efficiency. It increases the ability to leverage enormous volumes of data and affords amazing access to scalable computational capacity. But it has also overwhelmed many IT systems and the professionals who manage them with tsunamis of information, ever-increasing complexity, and escalating risk.
AIOps really can help balance the scales. Correctly delivered within managed services and integrated for organizational requirements, it’s one technology that delivers on its promise.