MONITORING AND DIAGNOSTIC OF MIDDLEWARE SERVICES
Felix Ehm, 28th June 2012


Introduction
Problem: how can I detect a failure in my systems?
  • What is the reason? Host, network? → Add machine monitoring
  • Is my program running correctly? → ?

Introduction
Problem: how can I detect a failure in my systems?
Gain control by exposing process-internal information to enable constant monitoring for pre-failure recognition.
  • JMX for Java processes
  • CMWAdmin for CMW servers
  • CMX for general C/C++ services
  • Tracing/Central Logging System
Gina Gorgogianni, CMX Feedback

The Java Management Extensions (JMX)
  • Java standard to expose process-internal information
  • Inspect data (remotely) via JConsole/jvisualvm
  • Many monitoring systems support this
  • Example for JMS Broker

The CMWAdmin GUI
  • Java GUI to inspect CMW-enabled processes
  • Browse and watch information from one server
  • Uses CMW middleware to access data
  • CMW server list comes from the Directory Server

The CMX Library
A general solution to allow exposure of internal metrics for C/C++ programs.
  • Idea originates from JMX: why can't we have something like this for C/C++?
  • Requirements
      ○ Small memory footprint
      ○ Non-blocking calls
      ○ Metrics: floats & strings
  • Project started in 2012

Architecture
High level
  • 2 lightweight APIs with non-blocking operations to
      ○ Update: registers, exposes & updates metrics
      ○ Read: retrieves information for metrics / process
  • No dependencies
Low level (shared memory)
  • Main Segment: table containing the registered processes
  • Process Segments: structures containing information on metrics
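The two-level layout above can be sketched roughly as follows. This is only an illustration in Python, not the CMX API: the class and method names (`MainSegment`, `ProcessSegment`, `register_process`, `update_metric`, `read_metric`) are hypothetical, and plain in-process dictionaries stand in for the shared-memory segments the slide describes.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# The slides mention only float and string metrics.
MetricValue = Union[float, str]


@dataclass
class ProcessSegment:
    """Stand-in for one per-process segment holding that process's metrics."""
    pid: int
    name: str
    metrics: Dict[str, MetricValue] = field(default_factory=dict)


class MainSegment:
    """Stand-in for the main segment: a table of registered processes.

    Hypothetical names; real CMX keeps these structures in shared memory.
    """

    def __init__(self) -> None:
        self._processes: Dict[int, ProcessSegment] = {}

    # "Update" side: register a process, expose and update its metrics.
    def register_process(self, pid: int, name: str) -> ProcessSegment:
        return self._processes.setdefault(pid, ProcessSegment(pid, name))

    def update_metric(self, pid: int, metric: str, value: MetricValue) -> None:
        if not isinstance(value, (float, str)):
            raise TypeError("only float and str metrics are supported")
        self._processes[pid].metrics[metric] = value

    # "Read" side: retrieve information for a process or a single metric.
    def list_processes(self) -> Dict[int, str]:
        return {pid: seg.name for pid, seg in self._processes.items()}

    def read_metric(self, pid: int, metric: str) -> MetricValue:
        return self._processes[pid].metrics[metric]


main = MainSegment()
main.register_process(4711, "rda-server")          # hypothetical process
main.update_metric(4711, "requests_per_s", 12.5)
main.update_metric(4711, "state", "RUNNING")
```

A reader (for example a monitoring agent) would only call `list_processes` and `read_metric`, which mirrors the split between the Update and Read APIs.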

CMX Library Characteristics
  • Very small footprint: 140 KB of memory usage
  • Easy non-blocking API: 10 core functions in total
  • Supports float and string data types
  • Incorporated input from real-time experts
  • CMX Library is ready for preproduction
  • No dependencies on external libraries
  • Future: deployment for all CMW servers
      ○ But also applicable to other C/C++ projects

Constant Monitoring
  • Host health
  • Process up/down
  • Process service endpoint OK?
      ○ E.g. HTTP server: is wget successful?
  • Process does what it is supposed to do

Constant Monitoring
  • DIAMON as CO in-house solution
      ○ Reads metrics and applies rules
      ○ Easy to extend through pluggable architecture
      ○ Provides history of metrics
      ○ Provides replay functionality
      ○ Controls config
  • In case of problem detection
      ○ Displays it to operators
      ○ Sends notification via SMS/mail
[Diagram labels: DIAMON, DAQs, JMX, CMW, CMX]
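"Reads metrics and applies rules" can be pictured with a small sketch like the one below. It is not DIAMON code: the `Rule` type, the `evaluate` and `notify` functions, the metric names and the thresholds are all assumptions made for illustration, and `notify` simply prints where a real system would display the problem or send SMS/mail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule: a metric name, a health check, and a message."""
    metric: str
    is_ok: Callable[[float], bool]
    message: str


def evaluate(rules: List[Rule], metrics: Dict[str, float]) -> List[str]:
    """Apply every rule to the latest metric values and collect problems."""
    problems = []
    for rule in rules:
        if rule.metric not in metrics:
            # A missing metric is itself worth reporting.
            problems.append(f"{rule.metric}: no data")
        elif not rule.is_ok(metrics[rule.metric]):
            problems.append(f"{rule.metric}: {rule.message}")
    return problems


def notify(problems: List[str]) -> None:
    """Placeholder for operator display or SMS/mail notification."""
    for problem in problems:
        print("ALERT", problem)


# Illustrative rules and sample values (assumed, not from the slides).
rules = [
    Rule("cpu_load", lambda v: v < 0.9, "CPU load too high"),
    Rule("queue_depth", lambda v: v < 1000, "queue is backing up"),
]
sample = {"cpu_load": 0.95, "queue_depth": 12.0}
problems = evaluate(rules, sample)
notify(problems)
```

Keeping rules as data separate from the evaluation loop is one way to get the pluggable behaviour the slide mentions: new checks are added without touching `evaluate`.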

The DIAMON Synoptic Viewer

The DIAMON Console

DIAMON
  • View history data on metrics

The Central Tracing/Logging System
I need more information than just numbers to diagnose a problem!
  • Log events are helpful
      ○ Find the point where the program crashes/fails
      ○ Access to (past) events is required
  • Problems
      ○ Frontends are diskless
      ○ Multi-layer systems imply watching many sources at the same time
      ○ You quickly drown in the amount of information
  • CMW project was initiated in June 2011
      ○ Target: collect log events from CMW servers for better diagnostics (n.b. log events = info, debug, error, warning, etc.)
      ○ Replace previous system

The Central Tracing/Logging System
  • Finding/debugging a problem becomes cumbersome!
  • Collecting and unifying tracing messages in one central place
  • Easy correlation of events among many services
[Diagram labels: Tracing Server, DB, Equipment Specialist / Developer]

The Tracing/Log GUI
[Screenshot labels: Record to File, Filter, Available Log Instances, Incoming log events, Message Panel, Tracing Server, DB]

The CMW Tracing Package
  • C++ client library
      ○ Very lightweight
      ○ Supports TCP + UDP
      ○ File + syslog + STOMP appenders
      ○ Integrated with CMW components
      ○ Log level can be changed at runtime
  • Java client library
      ○ Based on log4j
      ○ Very easy to integrate with existing Java services

The CMW Tracing Package
The Server
  • Modules
      ○ Converters to accept messages
      ○ Broker to distribute data
      ○ FileWriter and Database Writer
      ○ Registry keeping discovered sources
  • Can be deployed as an all-in-one process or as separate processes
      ○ Scales horizontally and vertically

The CMW Tracing Package
  • C/C++ & Java libraries for log events
  • C/C++ library for config messages
  • Server
      ○ Accepts events coming via UDP or TCP
      ○ Stores events in database and files
      ○ Sends events to multiple receivers
  • User interface(s)
      ○ “online”: Java GUI, Linux console (web console)
      ○ “offline”: database viewer
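As a rough picture of a client sending a log event to a server over UDP, consider the sketch below. The wire format of the CMW tracing package is not given in the slides, so the JSON event with `source`, `level`, `text` and `ts` fields is purely an assumption, as are the function names. A UDP socket bound to an ephemeral loopback port plays the role of the server.

```python
import json
import socket
import time


def encode_event(source: str, level: str, text: str) -> bytes:
    """Serialize one log event. The JSON layout is an assumed format."""
    event = {"source": source, "level": level, "text": text, "ts": time.time()}
    return json.dumps(event).encode("utf-8")


def send_event(sock: socket.socket, address, source: str, level: str, text: str) -> None:
    """Fire-and-forget send of a single event as one UDP datagram."""
    sock.sendto(encode_event(source, level, text), address)


# Stand-in "server": a UDP socket on an ephemeral loopback port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)

# Stand-in "client library": sends one WARNING event to the server.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_event(client, server.getsockname(), "rda-server", "WARNING", "slow reply")

payload, _sender = server.recvfrom(65535)
received = json.loads(payload.decode("utf-8"))

client.close()
server.close()
```

UDP keeps the sender from blocking on a slow or absent server, at the cost of possible message loss; TCP, which the slides also list, trades that for delivery guarantees.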

The CMW Tracing Service
  • Nearly all CMW services send log events
      ○ Proxies, RDA servers, JMS, …
  • Great help for identifying problems
  • Easy to extend to other protocols
  • Performance
      ○ ~100 M messages/day
      ○ 6% stored in the DB
      ○ 100% stored additionally in files
      ○ System does very well: low network and CPU load
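To put the performance figures in perspective, the short calculation below converts them into an average rate and a daily database volume. It assumes the load is spread evenly over the day, which the slides do not state; real traffic is likely bursty.

```python
# Figures taken from the slide.
messages_per_day = 100_000_000
db_percent = 6

seconds_per_day = 24 * 60 * 60

# Average rate, assuming an even spread over the day (an assumption).
avg_rate_per_s = messages_per_day / seconds_per_day

# Messages written to the database per day (integer arithmetic).
db_messages_per_day = messages_per_day * db_percent // 100

print(f"average rate: {avg_rate_per_s:.0f} messages/s")
print(f"stored in DB: {db_messages_per_day:,} messages/day")
```

So the service handles on the order of a thousand messages per second on average, of which about 6 million per day reach the database.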

The CMW Tracing Service
  • Also collects information other than log events
  • What is done, where, when and by whom?
      ○ Software upgrades / installations
      ○ Process restart events
      ○ Configuration changes

Summary
Gain control by exposing process-internal information to enable constant monitoring for pre-failure recognition.
  • JMX for Java processes
  • CMWAdmin for CMW servers
  • CMX for general C/C++ services
  • Tracing/Central Logging System
  • DIAMON
But: also try to monitor the system as the user sees it
  • JMS: send a test message and measure speed; > 100 ms = WARNING
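The "monitor the system as the user sees it" idea can be sketched as a timed probe. Only the 100 ms warning threshold comes from the slide; the `probe` and `classify` functions are hypothetical, and `fake_round_trip` is a placeholder for sending a real JMS test message and waiting for it to arrive.

```python
import time
from typing import Callable, Tuple

WARNING_THRESHOLD_S = 0.100  # 100 ms, as stated on the slide


def classify(elapsed_s: float) -> str:
    """Map a measured round-trip time to a status."""
    return "WARNING" if elapsed_s > WARNING_THRESHOLD_S else "OK"


def probe(round_trip: Callable[[], None]) -> Tuple[float, str]:
    """Time one end-to-end round trip and classify the result."""
    start = time.perf_counter()
    round_trip()
    elapsed = time.perf_counter() - start
    return elapsed, classify(elapsed)


def fake_round_trip() -> None:
    """Placeholder for 'send a test message and wait for it to come back'."""
    time.sleep(0.01)


elapsed, status = probe(fake_round_trip)
print(f"round trip took {elapsed * 1000:.1f} ms -> {status}")
```

Run periodically, such a probe catches cases where every process reports healthy metrics but the service is still slow from the user's point of view.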
