Machine Learning 101: For Advanced Network Monitoring

Machine Learning 101: For Advanced Network Monitoring
Northrop Grumman Technology Services (NGTS)
October 13, 2016
Calvin Smith, Technologist/Strategist

Northrop Grumman Technology Services Snapshot
At a Glance
  • $4.3 B annual revenue
  • 14,000 employees
  • 50 states, 16 countries
Focus Areas
  • Logistics and modernization
  • Mission readiness and training
  • Land forces sustainment and modernization
  • Systems security and operations
  • Health technology
  • Fraud detection and compliance
  • Data analysis and decision support tools
  • Network operations and management
  • Cyber
  • Integrated air and missile defense
  • Postal systems and services
  • Command and control
Approved for Public Release #15-0906; Unlimited Distribution
Approved for Public Release, #15-0507; Unlimited Distribution

About Me
The End-to-End Monitoring team supports federal, state and local government programs, specializing in cyber and performance monitoring.
  • Cal: 28+ years in networking and cyber
  • Currently supports US-CERT as Cyber Technologist and Strategist for tech roadmap and business transformation
  • Currently supports Texas state agencies as Cyber and Network Solutions Architect
  • Previously worked for the U.S. Department of State, Department of Homeland Security, Department of Justice, and Patent and Trademark Office as Cyber Solutions Architect
  • In his spare time, he is an avid Texas music collector, a fan of all things cloud, and a road warrior

Current IT Customer Challenges
Large Complex Landscapes
  • Limited end-to-end monitoring, nominal visibility
  • Multi-domain networks, systems, applications, devices
  • Multiple fiefdoms, regions, sites, offices, enclaves, mobile
  • Big data, little usable information, lost awareness
  • Many end-users, little or no user behavior insight
Data Difficulties
  • Many data sources, highly complex, cluttered environments
  • Data silos, hidden, largely under-utilized, not data-driven
  • Not multidimensional, predictive, prescriptive or real-time
Business Impact
  • Reactive event tracking, incident and problem management
  • Frequent disruption of critical service delivery to customers
  • Lost data interaction, lost value to business
  • Data not available for use by business decision makers
Network Complexity, Data Complexity, Business Services

IT Solution Challenges
  • Alert: Implement data analytics to alert when anomalous behavior is out of bounds
  • Data Sources: Automate data aggregation and event correlation using advanced data and visual analytics
  • Interoperate: Interoperate with existing legacy tools and sensors end-to-end across the network infrastructure
  • Machine Learning: Generate accurate and reliable predictive capabilities using machine learning and dynamic trending
  • Visualization: Generate interactive visual analytics to display contextual geographic or organizational maps
  • Actionable Insight: Implement KPIs and health scores based on immediate context

Concept of Operations
Actionable IT Operations Intel & Business Insight
  • Big Data Platform: Integrate a big data management platform as an enterprise central event and syslog aggregation point
  • Data Fusion: Unify, integrate and interoperate with a machine data-driven focus
  • Dashboards: Real-time dashboards for business users and decision makers throughout the organization
  • Collaborative Environment: Unified reporting for decision support for proactive monitoring, operational intelligence and business insight

What is Big Data: “3 Vs”

Data Science Approach
Machine learning is a component of an overall data science approach.
Data Transformation

What is Machine Learning: Pattern Recognition
  • Machine learning is a method of data analysis
  • Automates building analytical data models
  • Uses algorithms that iteratively learn from data
  • Allows computers to find hidden insights without being explicitly programmed where to look
  • Closely related to concepts such as artificial intelligence, expert systems, neural networks, complex event processing and deep learning
  • Many use cases for data mining, pattern classification and pattern matching
  • Big use cases today: self-driving cars, fraud detection, Twitter trending
Data Models & Analysis, Data Learning, Hidden Insights, Pattern Matching, Autonomous Systems & Self Learning

Machine Learning is Smart Data

Machine Learning for Advanced Network Monitoring
  • Develop machine learning techniques for characterizing normal network behavior and detecting anomalies
  • Use machine learning rather than rule-based methods to be more proactive, actionable and insightful (predictive)
  • Data analysis is performed in data streams at scale for real-time results
  • Machine learning helps customers detect anomalies by providing greater visibility into the normal operation of their networks
  • It provides real-time data-driven reporting and alerting on violations of those norms
Data Mining, Pattern ID, Analytics, Data Streams, E2E Monitoring, Anomaly Detection
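One way to picture "characterizing normal behavior and alerting on violations of those norms" is a rolling statistical baseline over a data stream. The sketch below is illustrative only and is not the method used in the presentation; the class name `RollingBaseline`, the window size, the three-standard-deviation cutoff and the latency figures are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Learns 'normal' from a sliding window of recent samples (hypothetical helper)."""

    def __init__(self, window=60, threshold=3.0, min_samples=10):
        self.samples = deque(maxlen=window)
        self.threshold = threshold      # assumed cutoff, in standard deviations
        self.min_samples = min_samples  # warm-up period before any alerting

    def observe(self, value):
        """Return True if value violates the learned norm; otherwise learn from it."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu
        # Only normal samples update the baseline, so an outage does not become "normal".
        if not anomalous:
            self.samples.append(value)
        return anomalous


def detect(stream, **kwargs):
    """Yield (index, value) for each anomalous sample in an iterable of numbers."""
    baseline = RollingBaseline(**kwargs)
    for i, value in enumerate(stream):
        if baseline.observe(value):
            yield i, value


# Made-up latency samples in milliseconds, with one obvious spike.
latency_ms = [20, 21, 19, 22, 20, 21, 20, 19, 21, 20, 22, 20, 95, 21, 20]
alerts = list(detect(latency_ms, window=30, threshold=3.0, min_samples=10))
```

Because the baseline is learned from the data rather than configured, the same code adapts to a link that normally runs at 20 ms or at 200 ms, which is the contrast with rule-based methods drawn above.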

Machine Learning Methodology
  • Log: Log network, application, resource performance and utilization traffic flows
  • Model: Build a predictive model based on past values
  • Test: Refine until predictions are accurate (match the real world)
  • Forecast: Forecast anomalous conditions or demand
  • Act: Act / Mitigate
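The Log, Model, Test, Forecast and Act steps can be sketched end to end with a deliberately simple model. Simple exponential smoothing is used here purely as a stand-in; the slides do not say which model is used. The function names, candidate smoothing factors, capacity limit and utilization numbers are assumptions for illustration.

```python
def smooth_forecast(history, alpha):
    """Model: one-step-ahead forecast from simple exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def mean_abs_error(series, alpha, start):
    """Test: replay the series from `start`, predicting each point from its past."""
    errors = [abs(series[t] - smooth_forecast(series[:t], alpha))
              for t in range(start, len(series))]
    return sum(errors) / len(errors)


def fit_alpha(series, start, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Refine: keep the smoothing factor with the lowest replay error."""
    return min(candidates, key=lambda a: mean_abs_error(series, a, start))


def act(forecast, capacity):
    """Act: hypothetical mitigation hook; here it only returns a decision string."""
    return "mitigate" if forecast > capacity else "ok"


# Log: made-up link utilization samples (percent), trending upward.
utilization = [40, 42, 41, 45, 47, 50, 52, 55, 59, 62, 66, 70]

alpha = fit_alpha(utilization, start=4)          # Model + Test
forecast = smooth_forecast(utilization, alpha)   # Forecast the next sample
decision = act(forecast, capacity=65)            # Act
```

In practice the Test step would be repeated as new logs arrive, so the model keeps matching real-world values rather than being fitted once.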

Key Ninja Capabilities
Predictive Analytics:
  • A combination of real-time metrics, time, historical baselines, local and seasonal level values, and different interval parameters, evaluated 24x7 minute by minute, uniquely builds predictive models for metrics
Visual Analytics:
  • Dynamic Color Coding: display schema for green, yellow and red (stoplight colors) tied directly to predictive analytics
  • Acceptable Performance Range: a dynamic color display map based on predictive analytics, minute by minute
  • Enhanced Mapping: real-time data paired with predictive analytics to display geo-location with critical metrics
  • KPI with Trending: a real-time metric paired with a dynamic color map, comparing the current bucket of time to the previous one, per metric
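Two of the visual analytics above lend themselves to a small sketch: stoplight color coding driven by a predicted value, and a KPI trend that compares the current time bucket with the previous one. The function names, the tolerance, and the rule that yellow means "within twice the tolerance" are assumptions made for this example, not details taken from the product.

```python
def stoplight(value, predicted, tolerance, warn_factor=2.0):
    """Map the deviation from the predicted value to green, yellow or red."""
    deviation = abs(value - predicted)
    if deviation <= tolerance:
        return "green"
    if deviation <= warn_factor * tolerance:
        return "yellow"
    return "red"


def kpi_trend(current_bucket, previous_bucket):
    """Return the current bucket's mean and its direction versus the previous bucket."""
    current = sum(current_bucket) / len(current_bucket)
    previous = sum(previous_bucket) / len(previous_bucket)
    if current > previous:
        direction = "up"
    elif current < previous:
        direction = "down"
    else:
        direction = "flat"
    return current, direction


# Made-up response times (ms) for two consecutive buckets of time.
previous_bucket = [100, 110, 90]
current_bucket = [120, 130, 125]

kpi, direction = kpi_trend(current_bucket, previous_bucket)
# The predicted value would come from the predictive model; 105 is assumed here.
color = stoplight(kpi, predicted=105, tolerance=10)
```

The color depends on the prediction for that minute rather than on a fixed limit, so the same reading can be green during peak hours and red overnight.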

A Typical End-to-End Network Environment

A Typical End-to-End Network Environment

Use Case: Network Performance Monitoring
KPI w/ Trending, Enhanced Mapping, Dynamic Color Coding
Note: Information presented here is representative only
Application, Web Portal #1, Web Portal #2

Use Case: Advanced Network Monitoring
Region, City, Location, Device
Note: Information presented here is representative only

Use Case: Acceptable Performance Range
Note: Information presented here is representative only
  • Based on predictive analytics, minute by minute
  • No static or predefined thresholds
  • A secondary metric can be added for context
Legend: Primary Metric Baseline, Primary Metric APR, Secondary Metric
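A rough sketch of how a range with no static thresholds might be derived: group past observations by minute of day and build a band around each minute's own history. The mean plus or minus two standard deviations rule, the function names and the sample values are assumptions for illustration; the slides do not describe how the actual APR is calculated.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_apr(history, width=2.0):
    """Build {minute_of_day: (low, high)} from (minute_of_day, value) observations."""
    by_minute = defaultdict(list)
    for minute, value in history:
        by_minute[minute].append(value)
    apr = {}
    for minute, values in by_minute.items():
        center = mean(values)    # baseline for this minute
        spread = pstdev(values)  # how much this minute normally varies
        apr[minute] = (center - width * spread, center + width * spread)
    return apr


def in_range(apr, minute, value):
    """True if value falls inside the acceptable range learned for that minute."""
    low, high = apr[minute]
    return low <= value <= high


# Made-up request rates seen on four previous days at 03:00 (minute 180)
# and at 09:00 (minute 540).
history = [
    (180, 10), (180, 12), (180, 8), (180, 10),
    (540, 100), (540, 110), (540, 90), (540, 100),
]
apr = build_apr(history)

busy_morning = in_range(apr, 540, 95)  # typical for 09:00
busy_night = in_range(apr, 180, 95)    # far outside the 03:00 norm
```

The same reading of 95 is acceptable at 09:00 and out of range at 03:00, which a single predefined threshold could not express.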

Use Case: Data Center Network Cyber Monitoring
Interactive drill-down to stacks, racks and virtual machines
“Advanced visual analytics allowing customers to interact with their data for better situational awareness”

Use Case: Instrumented Data Center for Continuous System Diagnostics
“Ability to dynamically investigate, correlate and mitigate data center assets, applications and infrastructure issues”

Use Case: Dynamic Data Center Risk Management
“Correlate threat and risk information allowing customers to quickly respond, repair and report before compromise”

Use Case: Real-time Data Center Cyber Posture
“Real-time cyber threat analytics reflecting dynamic cyber posture and the ever-changing threat landscape”

Use Case: Dynamic KPI, Summary KPI and Health Score Tracking & Trending
Executive Dashboard (Health Score Metrics)
  - Alerts
  - Availability
  - Health/Performance
  - Volume
  - …
Performance (Summary KPIs)
  - Alerts
  - Availability
  - Health/Performance, ART
  - Volume
  - …
Cyber (Summary KPIs)
  - Malware
  - Alerts
  - Threats
  - User Behavior
  - HW/SW, Vulns, Configurations
  - …
Application (KPIs)
  - # Responsive Applications
  - Dependency Tracking (MQ)
  - Transactions (Traffic)
  - Total Transaction Errors
  - …
Network (Summary KPIs)
  - Alerts
  - Availability
  - Health/Performance, ART
  - Volume
  - …
Hardware & OS (KPIs)
  - Uptime (Availability)
  - CPU Load
  - Disk Used / Total
  - Memory Used / Total
  - …
Systems/Facilities (Summary KPIs)
  - Alerts
  - Availability
  - Health/Performance
  - Volume
  - …
Tickets
  - Tickets opened/closed
  - Root Cause Breakdown
  - Type of Ticket
  - Report Times
  - …
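A health score that rolls several KPIs up into one number could be computed as a weighted average of normalized values. The sketch below uses the Hardware & OS KPIs as an example; the normalization bounds, the weights and the readings are assumptions for illustration, and the slides do not specify the actual scoring formula.

```python
def normalize(value, worst, best):
    """Scale a raw KPI onto 0..100, where `best` maps to 100 and `worst` to 0."""
    if best == worst:
        raise ValueError("best and worst must differ")
    score = 100.0 * (value - worst) / (best - worst)
    return max(0.0, min(100.0, score))


def health_score(kpis, weights):
    """Weighted average of 0..100 KPI scores; weights need not sum to 1."""
    total_weight = sum(weights[name] for name in kpis)
    return sum(kpis[name] * weights[name] for name in kpis) / total_weight


# Made-up readings for one server, each scaled so that higher is healthier.
hardware = {
    "uptime": normalize(99.5, worst=95.0, best=100.0),      # percent available
    "cpu_load": normalize(70.0, worst=100.0, best=0.0),     # percent busy, lower is better
    "disk_used": normalize(50.0, worst=100.0, best=0.0),    # percent of total
    "memory_used": normalize(80.0, worst=100.0, best=0.0),  # percent of total
}
# Assumed weights: availability counts most toward the score.
weights = {"uptime": 4, "cpu_load": 2, "disk_used": 1, "memory_used": 1}

score = health_score(hardware, weights)
```

The same roll-up can be applied again one level higher, combining the Hardware & OS, Network, Application and Cyber scores into the executive dashboard's overall health score.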

Summary
  • An increasing number of customers want end-to-end, real-time visibility into their networks and business processes
  • Leverage existing investments and the data you already have to answer new business questions
  • Enrich network machine data with operations data to deliver business context
  • Design for continuous insights

Q&A
Questions?