IT is plagued by outages, and messy environments make mitigation difficult


From the NOC to DevOps, a new study finds agreement across IT teams that outdated, messy tools are hindering service reliability, especially during the COVID-19 pandemic.

IT professionals involved in IT Ops, networking, and engineering roles are growing increasingly frustrated by difficulties posed by IT outages, with 47% of respondents in a recent survey saying outage detection, analysis, and response are their biggest challenges.

The study from IDG and AIOps provider BigPanda found several common problems in IT Operations teams, namely an overabundance of monitoring tools, siloed management software, superfluous alerts, and patchwork incident management. 

COVID-19 has only made matters worse, the report found, with 42% saying they have had to make changes “to a great extent” to support the sudden surge of remote work. The problems exacerbated by the pandemic aren’t new, the report argues. They are symptoms of the existing issues listed above, which were waiting for an opportune moment to get worse.

“The COVID-19 pandemic has largely removed any remaining doubts: IT Ops needs to transform—and transform now,” the report said.

The current state of IT Operations

Much of the way IT Ops teams manage infrastructure, the report found, is a patchwork mess. As new systems come online, new management tools are deployed alongside them. As a result, respondents’ organizations use an average of 20 different monitoring tools, and 16% use 50 or more.

“The tools’ siloed, disparate nature makes the detection and analysis of issues and outages extremely difficult for IT Ops teams. Team members can often spend hours on unproductive bridge calls and forensic efforts trying to identify and resolve problems, all while expensive resources are taken offline,” the report said. 

Because of the confusion caused by siloed tools, respondents said it took their teams an average of 12 hours to determine the root cause of an issue.

Troubleshooting is made even more difficult by the sheer volume of alerts those monitoring tools produce: more than 14,300 for the average respondent organization. Sixty-five percent also said that the number of alerts has increased in the past 12 months.

How to improve IT Ops monitoring and troubleshooting

“Disparate tools generate massive numbers of siloed alerts that must somehow be consolidated, assessed, and resolved,” the report argues. The solution it proposes is AI Ops, which it said is necessary for IT teams “to have any hope of meeting their growing list of needs and demands.”

AI Ops is the application of artificial intelligence (AI) and machine learning to IT Operations. BigPanda, which sells an AI Ops product, describes it as software that “simplifies, accelerates, and automates many of the most onerous manual detection, investigation, and remediation functions.”

Eighty percent of survey respondents said they expect their IT Ops budget to increase in the coming year. IT incident management automation is the area most expected to grow, with 64% saying they plan to invest in it.

“For maxed-out IT Ops teams and their organizations, [AI Ops] can reduce operating costs, improve application performance and availability, and accelerate business velocity,” the report concluded.
