Operational Excellence in IT Service Management
Mehmet Özgür Depren, Technical Sales Manager - IBM Middleware

The Next IT Operations Focus: Big Data
“Focus on operational objectives has seen significant uptick since 2013”

IBM Continues to Invest Heavily in Analytics
• More than $17B in acquisitions since 2005, more than any other company
• Most comprehensive portfolio, from business to IT analytics, while most other vendors offer only point solutions
• C&SI’s suite of analytics products leverages best-of-breed capabilities from across IBM’s portfolio
Capabilities shown on the 2005 to 2015 timeline: Social Analytics/Consumer Insight, Workload Optimized Systems, Advanced Case Management, Content Analytics, Decision Management, Stream Computing, Pervasive Content, pureScale, pureXML, Deep Compression, Developer Productivity, Autonomic Operations

IT Operations Analytics Solves New Challenges
Reducing and preventing outages and slowdowns for the 24/7 application world: the network, end users, devices, web servers, databases, and app servers.
IT Operations Analytics can help:
1. Never set a performance threshold manually again
2. Identify potential issues before customers are impacted
3. Isolate the problem through analysis of all your IT data

Understanding IBM Operations Analytics
Business outcomes: Proactive Outage Avoidance, Faster Problem Resolution, Optimized Performance
Capabilities:
• Predict: predict problems before they occur
• Search: search quickly across massive amounts of data
• Optimize: optimize across your IT app infrastructure
Components: Operations Analytics, IBM Big Data Platform, Streams, SPSS, Cloud Insights, InfoSphere BigInsights, Rave, Watson, Application Performance, IBM or 3rd-party solutions
Operational environment data: documentation, system and log monitoring, transactions, assets and work orders, alerts, alarms and events
Applications | Systems | Workloads | Wireless | Network | Voice | Security | Mainframe | Storage | Assets

IBM Solution for IT Operations Analytics
Our capabilities:
• Predict: predict problems before they become service impacting
• Search: diagnose application and infrastructure issues using all your operational data
• Optimize: ensure your IT infrastructure environments are operating as efficiently as possible
Why IBM?
• 60% faster creation of custom, high-impact, mobile-ready operations dashboards
• 50% faster application diagnostics
• Avoid outages while reducing threshold management costs: Consolidated Communications detects 100 percent of its major incidents, including silent failures, and eliminated the human-intensive task of managing manual thresholds, saving $300,000 annually.
• Resolve problems faster: Barclays Bank was able to search and diagnose problems 60% faster to quickly resolve application and infrastructure issues. In addition, it identified customer patterns in log data and applied them to channel intelligence.
• Improve operational efficiency: advanced event analytics has allowed Claranet to reduce the number of trouble tickets and focus more time and resources on what truly matters to its customers, with a 30% reduction in operator event load.
• 20% reduction in storage requirements over competitive offerings
• #1 leadership position in Operations Management solutions

IBM Operations Analytics – Predictive Insights (Predict)
Challenge: Reacting to performance thresholds is not enough. IT staff must become proactive to ensure mission-critical apps never go down.
• Automated threshold maintenance: no complex manual intervention to set up and maintain, with 5 times faster processing
• Anomaly detection: alerting before potential issues become service impacting, enabling IT to shift from reactive to proactive
• On-prem and SaaS: Predictive Insights is now available as a service, adding value to our Performance Management solutions
• Supports heterogeneous environments: out-of-the-box integrations with IBM APM/ITM or 3rd-party monitoring solutions

Why aren’t operations teams proactive today?
• Too much data to analyze manually
• Existing analytic techniques, such as standard thresholds, are not up to the task
• They cannot detect problems while they are emerging (before business impact)
Set the performance threshold too high and there is insufficient warning before total failure. Set it too low and there is so much noise that everything is ignored.
If there is no early detection before the outage, operations teams can only react once the outage is already in effect and already losing money.

Learn relationships between metrics without static thresholds
• Predictive Insights learns the normal historical range
• It raises an alarm if a metric falls outside this range
Watson DNA inside
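
The slides do not describe how Predictive Insights learns a metric's normal range. As a minimal sketch of the general idea, assuming the simplest possible model (a fixed band of mean plus or minus k standard deviations over a history window, with a hypothetical k = 3), the following flags new samples that fall outside the learned range:

```python
from statistics import mean, stdev


def learn_band(history, k=3.0):
    """Learn a 'normal' range as mean +/- k sample standard deviations of history."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def anomalies(history, new_samples, k=3.0):
    """Return (index, value) for every new sample outside the learned band."""
    low, high = learn_band(history, k)
    return [(i, v) for i, v in enumerate(new_samples) if v < low or v > high]


# Illustrative values only, not data from the presentation.
history = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99]
print(anomalies(history, [100, 104, 140, 60]))  # [(2, 140), (3, 60)]
```

With this history the learned band is roughly 94.5 to 105.5, so 140 and 60 are flagged while 104 is not. A production system would also retrain the band as data arrives and model daily or weekly cycles, which a single static band ignores.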

European Telco – Flatline
Stopped (crashed) application: regular load absent (targeting situation detection)
Customer relationship management system for a large telco: 100 applications monitored by a Compuware system (40 million metrics). In this example, the regular load on one of the servers has changed, indicating an application problem.
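
A flatline means the regular load disappears. One simple way to express that condition is sketched below, assuming hypothetical parameters: the recent window barely varies (population standard deviation at or below `min_spread`) and its mean has dropped below `drop_ratio` (20%) of the baseline mean. This is an illustration of the idea, not the detection logic Predictive Insights uses.

```python
from statistics import mean, pstdev


def is_flatline(baseline, recent, drop_ratio=0.2, min_spread=1e-6):
    """Flag a flatline: recent samples are (almost) constant AND far below usual load.

    baseline: samples from a period of normal activity.
    recent:   the latest window of samples for the same metric.
    """
    usual = mean(baseline)
    flat = pstdev(recent) <= min_spread          # the signal has stopped moving
    collapsed = mean(recent) <= drop_ratio * usual  # and sits well below normal load
    return flat and collapsed


# Illustrative requests-per-minute values, not data from the case study.
baseline = [120, 135, 128, 140, 131]
print(is_flatline(baseline, [0, 0, 0, 0]))        # True: load has vanished
print(is_flatline(baseline, [125, 0, 130, 128]))  # False: a single dip, not flat
```

Requiring both conditions keeps a metric that is merely steady at its normal level from being reported as a crash.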

European Gambling Website – Adaptive Threshold: High Disk Latency
Automated dynamic thresholds and early detection
A gambling website application monitored by HP. In the run-up to a busy sporting event, traffic increased, stressing the system and degrading the customer experience. With Predictive Insights (PI) early detection, the latency issue could have been tackled before it had this impact.
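
An adaptive threshold moves with the data instead of staying fixed. A common way to build one, swapped in here purely for illustration, is an exponentially weighted moving average (EWMA) and variance: each sample is compared with `mean + k * std` before the estimates are updated. The smoothing factor `alpha`, multiplier `k`, and `warmup` length are hypothetical choices, not product settings.

```python
import math


def adaptive_alerts(samples, alpha=0.1, k=3.0, warmup=5):
    """Return indices of samples above an exponentially weighted mean + k * std.

    The threshold drifts with the data, so slow growth is gradually absorbed
    while sudden jumps (such as a latency spike) are flagged.
    """
    if not samples:
        return []
    mean, var = float(samples[0]), 0.0
    alerts = []
    for i, x in enumerate(samples[1:], start=1):
        threshold = mean + k * math.sqrt(var)
        if i >= warmup and x > threshold:  # ignore the first few samples
            alerts.append(i)
        # Incremental exponentially weighted mean and variance update.
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return alerts


# Illustrative disk latency in milliseconds, not data from the case study.
latency_ms = [10, 11, 10, 12, 11, 10, 11, 12, 11, 40]
print(adaptive_alerts(latency_ms))  # [9]
```

Because the threshold is recomputed from recent behavior, nobody has to retune it by hand as traffic patterns change; the trade-off is that a very slow degradation can be absorbed into the baseline rather than flagged.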

Large US Bank – Adaptive Threshold: Connection Leak
Automated dynamic thresholds and early detection
These are WebSphere metrics taken from the CA Wily performance management system. The number of actual connections to the WebSphere application server has increased dramatically. PoolSize and BytesInUse are also affected, indicating either increased demand or a problem with connections not being freed.
Insight: PoolSize and BytesInUse on the same node are behaving anomalously at the same time and are related to each other.
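
The slide's insight is that PoolSize and BytesInUse are both anomalous and related. The deck does not say how that relationship is learned; as a stand-in, the sketch below uses plain Pearson correlation and pairs up metrics that are strongly correlated (hypothetical cutoff 0.9) and anomalous at a shared sample index. The metric values and the CpuPct metric are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def related_anomalies(metrics, anomalous_at, min_corr=0.9):
    """Pair metrics that are strongly correlated and anomalous at the same time.

    metrics:      metric name -> list of samples
    anomalous_at: metric name -> set of sample indices already flagged as anomalous
    """
    names = sorted(metrics)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = anomalous_at.get(a, set()) & anomalous_at.get(b, set())
            if shared and pearson(metrics[a], metrics[b]) >= min_corr:
                pairs.append((a, b, sorted(shared)))
    return pairs


metrics = {
    "PoolSize": [10, 10, 11, 10, 30, 45],
    "BytesInUse": [200, 205, 210, 200, 600, 900],
    "CpuPct": [40, 42, 39, 41, 40, 43],
}
flags = {"PoolSize": {4, 5}, "BytesInUse": {4, 5}, "CpuPct": {5}}
print(related_anomalies(metrics, flags))  # [('BytesInUse', 'PoolSize', [4, 5])]
```

CpuPct shares an anomalous sample with the other two but is only weakly correlated with them, so it is not grouped.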

European Bank – Significant Trend: Disk Thrashing
Targeting situation detection
File server under stress as file control operations and bytes per second increase. This sudden change can be traced back to an applied patch.
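
A "significant trend" can be approximated by the least-squares slope of a recent window. The sketch below, assuming a hypothetical rule that a slope above 5% of the baseline mean per sample counts as significant, illustrates the idea; it is not the trend test used by the product.

```python
def slope(values):
    """Least-squares slope of values against their sample index (units per sample)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def significant_trend(baseline, recent, min_rel_slope=0.05):
    """True when recent samples grow faster than min_rel_slope * baseline mean per sample."""
    usual = sum(baseline) / len(baseline)
    return slope(recent) > min_rel_slope * usual


# Illustrative file-operation rates, not data from the case study.
baseline = [100, 102, 99, 101, 98]
print(significant_trend(baseline, [100, 110, 125, 140, 160]))  # True (slope 15 per sample)
print(significant_trend(baseline, [100, 101, 99, 100, 100]))   # False
```

Scaling the cutoff by the baseline level lets the same rule apply to metrics measured in very different units.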

A sample of technologies Predictive Insights integrates with:
• IBM ITM/TDD and IBM APM
• IBM OMEGAMON
• HP BAC, Topaz
• IBM TNPM
• Aircom Optima

Predictive Insights as a Service
Performance Management + Predictive Insights
• Integrated threshold automation and maintenance
• Anomaly detection
• Get ahead of potential application and resource outages
• Learn, Explore, and Try
• Continuous Delivery

IBM Operations Analytics – Log Analysis (Search)
Challenge: Diagnosing service problems in applications and the infrastructure supporting them involves quickly analyzing huge amounts of both structured and unstructured data.
• Breadth of searchable data: search across all of your IT operational data to quickly resolve issues
• Expert advice: any competitor can isolate problems; IBM helps clients quickly resolve them
• Mainframe support: search System z (zLinux and z/OS) logs in addition to all your other data
• Embedded analytics: out-of-the-box integrations with IBM APM/ITM or 3rd-party monitoring solutions

Search: IBM Operations Analytics – Log Analysis
Collects large volumes of structured and semi-structured data and transforms it through analytics into actionable intelligence.
• Collect: logs, metrics, events, documentation
• Normalize and consolidate
• Insight Packs
• Search and visualize, for IT Operations, App Support, and Service Desk
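
The deck does not describe Log Analysis's indexing engine. To illustrate why consolidated, normalized records make fast search possible, here is a minimal inverted index with AND queries; the tokenizer and the sample records are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(records):
    """Map each lowercase token to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in enumerate(records):
        for token in TOKEN.findall(text.lower()):
            index[token].add(rec_id)
    return index


def search(index, query):
    """Return sorted ids of records containing every token in the query (AND search)."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical records already normalized from different sources.
records = [
    "2013-07-23 ERROR DayTrader servlet SRVE0068E uncaught exception",
    "2013-07-23 INFO db2 connection pool resized",
    "2013-07-23 ERROR db2 connection refused",
]
idx = build_index(records)
print(search(idx, "error db2"))  # [2]
```

Looking up each token once and intersecting the id sets avoids rescanning every record per query, which is what makes searching large log volumes practical.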

Application owner: “I got a trouble ticket on my application. I want to quickly find the root cause, fix it, and restore the app/service ASAP.”
Current challenge: a large volume of data to collect and analyze, and manual correlation taking hours or days to find the root cause of the problem. Logs for the problem window cannot be found. Highly dependent on SME skills: it’s an art.
Data involved: core files, logs, traces, events, metrics, transactions, config
Sample log entry:
[10/9/12 5:51:38:295 GMT+05:30] 0000006a servlet E com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E:
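
Manual correlation starts with splitting raw lines into fields. The sketch below parses a line shaped like the sample entry, assuming the WebSphere basic log layout of timestamp, thread ID, short name, event type, class name, method name, and message; lines in other layouts simply return `None`. The message-ID pattern (4 or 5 letters, 4 digits, a severity letter) is also an assumption.

```python
import re

# Assumed layout: [timestamp] threadId shortName eventType className methodName message
LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<thread_id>[0-9a-fA-F]+)\s+"
    r"(?P<short_name>\S+)\s+"
    r"(?P<event_type>[A-Z])\s+"
    r"(?P<class_name>\S+)\s+"
    r"(?P<method>\S+)\s+"
    r"(?P<message>.*)$"
)
# Assumed message-ID shape such as SRVE0068E at the start of the message.
MESSAGE_ID = re.compile(r"^(?P<msg_id>[A-Z]{4,5}\d{4}[A-Z]):")


def parse_line(line):
    """Split one log line into named fields, or return None if it does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    mid = MESSAGE_ID.match(fields["message"])
    fields["msg_id"] = mid.group("msg_id") if mid else None
    return fields


sample = ("[10/9/12 5:51:38:295 GMT+05:30] 0000006a servlet E "
          "com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E: "
          "Uncaught exception created in one of the service methods")
fields = parse_line(sample)
print(fields["event_type"], fields["msg_id"])  # E SRVE0068E
```

Once lines are structured like this, filtering on `event_type == "E"` or grouping by `msg_id` replaces eyeballing raw log files.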

Application owner: “I got a trouble ticket on my app. I want to quickly find the root cause, fix it, and restore service ASAP.”
Solution: IBM Operations Analytics – Log Analysis can provide insights from all data in a few clicks. The app owner can search through the data and use dashboards to find the root cause in minutes.
Inputs to IBM Operations Analytics – Log Analysis: logs, metrics, events, tickets, expert knowledge
Sample log entry:
[10/9/12 5:51:38:295 GMT+05:30] 0000006a servlet E com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E: Uncaught exception created in one of the service methods of the servlet TradeAppServlet in application DayTrader2-EE5. Exception created : javax.servlet.ServletException:
Transaction details from the app DB:
Tx#     Date         Status
108978  23-Jul-2013  started
108978  23-Jul-2013  To IN

Out-of-the-Box Insight Packs
• IBM provided:
  • IBM WebSphere Application Server
  • IBM DB2
  • Web access logs
  • Windows Events
  • Syslog
  • Java Core
  • IBM MQ Series
  • IBM Integration Bus (Message Broker)
  • Delimiter Separated Value (DSV) log files
• Partner provided:
  • Microsoft SharePoint, Microsoft Exchange, Microsoft SQL Server, Microsoft Active Directory
  • Tivoli Storage Manager
  • IBM Systems Disk Storage 8000
  • IBM AIX errpt
  • IBM HTTP Server
  • HP LiveSite, HP TeamSite
  • Oracle Database
  • VMware ESXi
  • Oracle Siebel
https://developer.ibm.com/itoa/
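
The DSV Insight Pack's configuration is not shown in the deck. To illustrate what "delimiter separated value" log files are, here is a minimal parser using Python's standard `csv` module; the `|` delimiter and the field names are assumptions chosen for the example.

```python
import csv
import io


def parse_dsv(text, field_names, delimiter="|"):
    """Split delimiter-separated log lines into dicts keyed by field_names.

    Blank lines are skipped; csv handles quoted fields that contain the delimiter.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(field_names, row)) for row in reader if row]


# Hypothetical log content and field layout.
log_text = (
    "2013-07-23 10:01:02|server1|ERROR|Disk full\n"
    "2013-07-23 10:01:05|server2|INFO|Backup done\n"
)
rows = parse_dsv(log_text, ["timestamp", "host", "severity", "message"])
print(rows[0]["severity"], rows[1]["host"])  # ERROR server2
```

Using `csv` rather than `str.split` means a message containing a quoted delimiter is not cut into extra fields.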

IBM Netcool Operations Insight (Optimize)
Modern dashboards, fully mobile: visualize the performance and health of your entire operations environment.
Out-of-the-box integration:
• 98% reduction in critical events: ~22 critical and ~100 major events per week
• Improved focus and utilization of first- and second-line staff
Analytics to increase event value:
• v1.1: 30% reduction in events to operations
• v1.2: almost 50% reduction in repeating events
• v1.3: 90% reduction for known event classes
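
One basic mechanism behind reducing repeating events is deduplication: repeats of the same event are folded into one row with a count, similar in spirit to how Netcool/OMNIbus tracks repeats of an event with the same Identifier in a Tally. The sketch below is a simplified, hypothetical illustration of that idea, not the ObjectServer's implementation; the identifiers and timestamps are made up.

```python
from dataclasses import dataclass


@dataclass
class Event:
    identifier: str  # key that defines "the same event" (for example node + alert)
    first: int       # timestamp of the first occurrence seen
    last: int        # timestamp of the most recent occurrence seen
    tally: int = 1   # number of raw occurrences folded into this row


def deduplicate(raw):
    """Fold raw (identifier, timestamp) occurrences into one Event per identifier."""
    table = {}
    for identifier, ts in raw:
        ev = table.get(identifier)
        if ev is None:
            table[identifier] = Event(identifier, ts, ts)
        else:
            ev.tally += 1
            ev.first = min(ev.first, ts)
            ev.last = max(ev.last, ts)
    return list(table.values())


raw = [("node1:LinkDown", 100), ("node1:LinkDown", 160),
       ("node2:DiskFull", 120), ("node1:LinkDown", 220)]
for ev in deduplicate(raw):
    print(ev.identifier, ev.tally, ev.first, ev.last)
# node1:LinkDown 3 100 220
# node2:DiskFull 1 120 120
```

Four raw occurrences become two rows, so the operator sees each distinct problem once, with its frequency preserved in the count.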

Event Analytics – Seasonal Event Identification
Improve efficiency by identifying and resolving recurring problems.
Large bank: 7% of Priority 1 tickets and 30% of lower-severity tickets were raised by events that were highly seasonal.
• A report on event history identifies seasonal events, sorted by confidence level and frequency
• Drill-down shows the time distribution of events, so peaks can be investigated
• Thresholds can be better aligned to seasonal peaks, reducing events
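
The slide mentions ranking seasonal events by confidence level and frequency without defining either. As a minimal sketch, assuming "confidence" means the share of an event's occurrences that fall in its busiest hour of day (and, separately, its busiest weekday), the following summarizes and ranks event types. All timestamps are illustrative.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def peak_share(values):
    """Most common value and the fraction of observations that fall on it."""
    value, hits = Counter(values).most_common(1)[0]
    return value, hits / len(values)


def seasonality(timestamps):
    """Summarize one event type: peak hour and peak weekday, each with a confidence."""
    hour, hour_conf = peak_share([ts.hour for ts in timestamps])
    day, day_conf = peak_share([DAYS[ts.weekday()] for ts in timestamps])
    return {"count": len(timestamps), "hour": hour, "hour_conf": hour_conf,
            "day": day, "day_conf": day_conf}


def rank_seasonal(events):
    """Order event names by hour-of-day confidence, then by frequency, highest first."""
    return sorted(events,
                  key=lambda name: (seasonality(events[name])["hour_conf"], len(events[name])),
                  reverse=True)


# Illustrative history: a weekly Sunday 06:00 failure, a daily 21:00 outage, random noise.
heartbeat = [datetime(2024, 1, d, 6, 0) for d in (7, 14, 21, 28)]
oracle = [datetime(2024, 1, d, 21, 0) for d in range(1, 11)]
other = [datetime(2024, 1, 2, 3), datetime(2024, 1, 5, 14), datetime(2024, 1, 9, 22)]
print(seasonality(heartbeat))  # hour 6 and day 'Sun', both with confidence 1.0
print(rank_seasonal({"Heartbeat": heartbeat, "Oracle": oracle, "Other": other}))
```

Keeping hour-of-day and weekday confidence separate distinguishes a daily pattern (high hour confidence, low weekday confidence) from a weekly one (both high).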

Seasonality analysis of events
1. MS SCOM Health Service Heartbeat failures often happen on Sunday at 06:00, probably due to regular maintenance
2. A specific Oracle database is not accessible every day at 21:00, probably due to a daily restart or backup
3. A node gives file system alerts every day around 01:00, probably due to a daily batch job

Related Events Grouping – Relationships I know about
Known event analysis: grouping and correlation providing powerful situation management of active events
• Out-of-the-box domain expertise for known event relationships
• Vendor and technology dependent
• Significant reduction of incidents presented to the operator
• Extendable by business partners and clients with no coding required
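
Known relationships can be expressed as rules that link a symptom alert to its root-cause alert. The sketch below assumes a hypothetical rule table and a simplifying rule that a child is grouped only when its root event is active on the same node; real relationship rules are vendor and technology dependent, as the slide notes.

```python
# Hypothetical known-relationship rules: symptom alert type -> root-cause alert type.
KNOWN_RULES = {
    "InterfaceDown": "LinkDown",
    "BGPPeerDown": "LinkDown",
    "DBConnectionFailed": "DatabaseDown",
}


def group_known(active_events, rules=KNOWN_RULES):
    """Group active (node, alert_type) events under a root event on the same node.

    Returns {root_event: [child events]}; each key is one incident for the operator.
    Events with no active root become their own group with no children.
    """
    present = set(active_events)
    groups = {}
    for node, alert in active_events:
        root_type = rules.get(alert)
        root = (node, root_type)
        if root_type is not None and root in present:
            groups.setdefault(root, []).append((node, alert))
        else:
            groups.setdefault((node, alert), [])
    return groups


events = [("r1", "InterfaceDown"), ("r1", "LinkDown"),
          ("r1", "BGPPeerDown"), ("db7", "DBConnectionFailed")]
incidents = group_known(events)
print(len(events), "events ->", len(incidents), "incidents")  # 4 events -> 2 incidents
```

Because the rules live in a plain table, new relationships can be added as data rather than code, which mirrors the "no coding required" extension point on the slide.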

Event Analytics – Related Event Analytics: Relationships I don’t know about
Improve efficiency: reduce actionable events by grouping events that always occur together.
Automatic detection of event clusters:
• Leverages machine learning to analyze the historical event archive and identify groups of events that always occur together
• Presents identified relationships to the administrator
• Presents proposed automated actions: Watch, Deploy, Archive, or Do nothing
• Groups events in the Event Viewer
“It is very beneficial to have a tool that can turn historical event data into an event group with a single root event. It helps us turn the data into logic.”
Increase operator efficiency by up to 90% with out-of-the-box alert reduction and advanced alert analytics.

Future of Service Management
Stages: Visibility, Control, Automation
Elements: real-time analytics and visualization, problem isolation, data correlation, outage avoidance, integration, optimization, Insight & Care, predictive analytics

Thank You