Big Data Challenges in Application Performance Management
Tilmann Rabl, Hans-Arno Jacobsen, Serge Mankovskii
XLDB Conference 2011
Middleware Systems Research Group, msrg.org

Abstract

Modern Web Data Platforms (WDPs) handle large amounts of data and activity through massively distributed infrastructures. To achieve performance and availability at Internet scale, WDPs restrict querying capability and provide weaker consistency guarantees than traditional ACID transactions. This reduced functionality is sufficient for many web applications. High data and query rates also appear in application performance management (APM). APM shares requirements with current web-based information systems, such as weaker consistency needs, geographical distribution, and asynchronous processing. At the same time, APM has some unique features and requirements that make previously published research and existing architectures inapplicable.

XLDB'11 - (C) 2011, Middleware Systems Research Group, msrg.org

Application Performance Management
• Enterprise system architectures
  ▫ Very complex distributed systems
  ▫ Need for detailed monitoring
  ▫ Service level agreements
• Application performance management
  ▫ How many transactions fail?
  ▫ Where is the root cause of failure?
  ▫ What is the end-to-end response time?
  ▫ Which component is the bottleneck?
  ▫ Which and how many transactions are there?

Enterprise System Architecture
[Figure: diagram of an enterprise system. Labels as extracted: SAP Identity Manager, Application Server, Message Queue, Database, Web Server, Application Server, Message Broker, Web Service, Client, Application Server, Main Frame, 3rd Party Database]

Java Byte Code Instrumentation
• JSR-163
• JVM is augmented with an agent
• Agent can run additional code
  ▫ No change of code base
  ▫ Trace transactions
  ▫ Measure response times
  ▫ Other types of measurements
• Huge number of events
  ▫ Potentially for every method invocation
[Figure labels: Program, JVM, Agent, Additional Code, Events]
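The idea of an agent that adds measurement code without touching the code base can be illustrated with a rough Python analogue. This is only a sketch of the concept, not JSR-163 or JVM byte code instrumentation: `EVENTS`, `traced`, `instrument`, and `OrderService` are hypothetical names introduced here, and a real agent would send its events out in bulk rather than keep them in a list.

```python
import functools
import inspect
import time

# Hypothetical in-process event buffer (assumption); a real agent would ship events elsewhere.
EVENTS = []

def traced(func):
    """Wrap func so that every invocation records one timing event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            EVENTS.append({
                "method": func.__qualname__,
                "duration_s": time.perf_counter() - start,
            })
    return wrapper

def instrument(cls):
    """Wrap every public plain method of cls without editing its source."""
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and not name.startswith("_"):
            setattr(cls, name, traced(attr))
    return cls

class OrderService:  # hypothetical application class, left unmodified
    def place_order(self, item):
        return f"ordered {item}"

instrument(OrderService)
result = OrderService().place_order("book")
```

Each call now produces one event, which is why instrumenting every method invocation can generate a very large event stream.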

APM Performance Requirements
• High insert rates
  ▫ Millions of inserts / sec
• High query rates
  ▫ Thousands of queries / sec
• Write ratio: > 99%
• Agents send data in bulk
  ▫ Different reporting periods (seconds to minutes)
• Big data
  ▫ 250 bytes per record
  ▫ ~ 250 MB / sec
  ▫ ~ 600 TB / month
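The data volume figures follow from the record size and insert rate. A quick back-of-the-envelope check, assuming one million inserts per second (the lower end of "millions") and a 30-day month, both of which are assumptions not stated on the slide:

```python
RECORD_BYTES = 250                      # from the slide
INSERTS_PER_SEC = 1_000_000             # assumption: lower end of "millions of inserts / sec"
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: 30-day month

bytes_per_sec = RECORD_BYTES * INSERTS_PER_SEC
mb_per_sec = bytes_per_sec / 1e6
tb_per_month = bytes_per_sec * SECONDS_PER_MONTH / 1e12

print(f"{mb_per_sec:.0f} MB/sec, {tb_per_month:.0f} TB/month")
```

Under these assumptions the result is 250 MB/sec and 648 TB/month, in line with the rounded figures of roughly 250 MB/sec and 600 TB/month.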

MADRID Project
• Current system's performance
  ▫ YCSB results < 15 K ops / sec
  ▫ TPC-C results ~ 500 K transactions / sec
• Need for a new architecture
  ▫ Massive Asynchronous DistRIbuted Data
  ▫ Highly scalable
  ▫ High write throughput
  ▫ Apart from measurements, data is mostly static
  ▫ Static queries
→ Hybrid key-value store

MADRID Architecture
• Materialized views
  ▫ Static queries
  ▫ Filters
  ▫ Notifications
• Hybrid data store
  ▫ All nodes are equal
  ▫ DHT style inserts
  ▫ Replication for static data
• Asynchronous processing
[Figure labels: View Manager, Message Broker, Entry, Log, In-Memory Storage, Disk Storage]
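One common way to get DHT style inserts across equal nodes is consistent hashing. The sketch below illustrates that general technique only; the slides do not say how MADRID routes inserts, and `HashRing`, `insert`, the node names, and the choice of `metric_id` as the routing key are all assumptions made for illustration.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring; every node plays the same role (no master)."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points to even out the key distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

NODES = ["node-a", "node-b", "node-c"]      # hypothetical node names
ring = HashRing(NODES)
stores = {node: [] for node in NODES}       # stand-in for per-node storage

def insert(metric_id: int, record: dict) -> str:
    """Route a measurement to the node that owns its key and store it there."""
    owner = ring.node_for(str(metric_id))
    stores[owner].append(record)
    return owner
```

Any node can compute the owner of a key locally, so inserts need no central coordinator.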

Schema Excerpt
• Transaction types
  ▫ No instances
  ▫ Graph structure
• Metric per transaction
  ▫ Type of measurement
• Measurements
  ▫ Per transaction type
  ▫ Per metric type
  ▫ Can be aggregations

Tables:
  Transaction(transaction_id, transaction_name)
  Transition(transaction_id, head_component, tail_component)
  Component(component_id, machine, description)
  Metric(metric_id, metric_type, transaction_id)
  Measurement(value, min_value, max_value, no_points, start_time, end_time, metric_id)
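Because a measurement can be an aggregation, two measurement records for the same metric can be combined into one. The sketch below shows one plausible way to do this. It assumes that `value` holds the mean over `no_points` samples and that times are epoch seconds; the slides state neither, and the `merge` helper is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    value: float        # assumption: mean over no_points raw samples
    min_value: float
    max_value: float
    no_points: int
    start_time: int     # assumption: epoch seconds
    end_time: int
    metric_id: int

def merge(a: Measurement, b: Measurement) -> Measurement:
    """Combine two aggregated measurements of the same metric into one."""
    if a.metric_id != b.metric_id:
        raise ValueError("cannot merge measurements of different metrics")
    total = a.no_points + b.no_points
    return Measurement(
        value=(a.value * a.no_points + b.value * b.no_points) / total,  # weighted mean
        min_value=min(a.min_value, b.min_value),
        max_value=max(a.max_value, b.max_value),
        no_points=total,
        start_time=min(a.start_time, b.start_time),
        end_time=max(a.end_time, b.end_time),
        metric_id=a.metric_id,
    )
```

Keeping `min_value`, `max_value`, and `no_points` alongside the mean is what makes this kind of merge possible without access to the raw samples.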

Materialized Views I
• What is the average runtime of transaction XY?

    SELECT t.transaction_name, AVG(ms.end_time - ms.start_time)
    FROM   Measurement ms, Metric mt, Transaction t
    WHERE  ms.metric_id = mt.metric_id
      AND  mt.transaction_id = t.transaction_id
      AND  mt.metric_type = 'runtime_metric'
      AND  ms.start_time BETWEEN '18/10/2011' AND '19/10/2011'
      AND  t.transaction_name = 'XY'
    GROUP BY t.transaction_name
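To see the query on this slide run end to end, here is a small sketch using Python's built-in `sqlite3` module. Several details are assumptions made for the example: times are stored as integer epoch seconds instead of date strings, the toy rows are invented, only the columns the query needs are created, and a `GROUP BY` is added for portability. `Transaction` is quoted because it is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Transaction" (transaction_id INTEGER PRIMARY KEY, transaction_name TEXT);
CREATE TABLE Metric (metric_id INTEGER PRIMARY KEY, metric_type TEXT, transaction_id INTEGER);
CREATE TABLE Measurement (value REAL, start_time INTEGER, end_time INTEGER, metric_id INTEGER);

-- Invented toy data: two runs of XY (4 s and 10 s) and one run of another transaction.
INSERT INTO "Transaction" VALUES (1, 'XY'), (2, 'other');
INSERT INTO Metric VALUES (10, 'runtime_metric', 1), (20, 'runtime_metric', 2);
INSERT INTO Measurement VALUES (0, 100, 104, 10), (0, 200, 210, 10), (0, 300, 399, 20);
""")

QUERY = """
SELECT t.transaction_name, AVG(ms.end_time - ms.start_time)
FROM   Measurement ms, Metric mt, "Transaction" t
WHERE  ms.metric_id = mt.metric_id
  AND  mt.transaction_id = t.transaction_id
  AND  mt.metric_type = 'runtime_metric'
  AND  ms.start_time BETWEEN ? AND ?
  AND  t.transaction_name = ?
GROUP BY t.transaction_name
"""

# Time window and transaction name are passed as parameters.
row = conn.execute(QUERY, (0, 1000, "XY")).fetchone()
print(row)
```

The query joins three tables on every execution, which is the cost a materialized view is meant to avoid for static queries.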

Materialized Views II
• What is the average runtime of transaction XY?

Tables:
  Transaction(transaction_id, transaction_name)
  Transition(transaction_id, head_component, tail_component)
  Component(component_id, machine, description)
  Metric(metric_id, metric_type, transaction_id)
  Measurement(value, min_value, max_value, no_points, start_time, end_time, metric_id)

Materialized view:
  AVG_Runtime(transaction_id, transaction_name, avg_value, time_frame)
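A materialized view such as AVG_Runtime can be kept current incrementally as measurements arrive, so the average is never recomputed from scratch. The following is a minimal sketch of that idea, not the MADRID implementation: `AvgRuntimeView`, its method names, the one-hour `time_frame` bucketing, and the use of epoch seconds are assumptions.

```python
from collections import defaultdict

class AvgRuntimeView:
    """Keeps a running sum and count per (transaction_name, time_frame)."""

    def __init__(self, frame_seconds: int = 3600):
        self.frame_seconds = frame_seconds      # assumption: fixed-width time frames
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def on_insert(self, transaction_name: str, start_time: int, end_time: int) -> None:
        """Update the view for one newly inserted runtime measurement."""
        key = (transaction_name, start_time // self.frame_seconds)
        self._sums[key] += end_time - start_time
        self._counts[key] += 1

    def avg_value(self, transaction_name: str, time_frame: int):
        """Return the current average runtime, or None if no data exists."""
        key = (transaction_name, time_frame)
        count = self._counts.get(key, 0)
        if count == 0:
            return None
        return self._sums[key] / count

view = AvgRuntimeView()
view.on_insert("XY", 100, 104)
view.on_insert("XY", 200, 210)
view.on_insert("XY", 3700, 3701)
```

Answering the question on the slide then becomes a single lookup such as `view.avg_value("XY", 0)` instead of a three-way join.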

Contact
• Tilmann Rabl
  ▫ University of Toronto
  ▫ tilmann@msrg.utoronto.ca
• Hans-Arno Jacobsen
  ▫ University of Toronto
  ▫ www.msrg.org
• Serge Mankovskii
  ▫ CA Labs
  ▫ mankovskii@ca.com