Warehousing Massive Performance Data on Commodity Servers
Charles Loboz, Slawek Smyl, Suman Nath
Microsoft Corporation

Monitoring Large Data Centers
• Management tasks: monitoring, planning, historical analysis
• Performance data
  – CPU, memory, disk utilization, …
  – Response time, queue length, …

Monitoring Data Management
• 100K servers = 1 TB data per day!
• Storage challenge
  – Store data over many months, years
  – Petabytes of data
• Query challenge
  – Hours to run simple queries
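The "100K servers = 1 TB per day" figure can be checked with a rough calculation. The sketch below assumes 10 MB per server per day (a value inside the 5-100 MB/server/day range quoted later in the deck) and decimal units; the three-year retention period is a hypothetical chosen only to show how the total reaches the petabyte range.

```python
# Back-of-the-envelope check of the "100K servers = 1 TB per day" figure.
# ASSUMPTIONS (not from the slides): 10 MB/server/day, decimal units
# (1 TB = 10**6 MB, 1 PB = 10**3 TB), and a 3-year retention period.
SERVERS = 100_000
MB_PER_SERVER_PER_DAY = 10


def daily_volume_tb(servers, mb_per_server_per_day):
    """Total data produced per day, in TB."""
    return servers * mb_per_server_per_day / 1_000_000


def retained_volume_pb(servers, mb_per_server_per_day, days):
    """Total data kept after `days` days, in PB."""
    return daily_volume_tb(servers, mb_per_server_per_day) * days / 1_000


print(daily_volume_tb(SERVERS, MB_PER_SERVER_PER_DAY))              # TB per day
print(retained_volume_pb(SERVERS, MB_PER_SERVER_PER_DAY, 3 * 365))  # PB after ~3 years
```

At the upper end of the quoted range (100 MB/server/day) the same fleet would produce ten times as much.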

DataGarage
• Performance data warehousing system
• Performance data
  – CPU, memory, disk utilization, …
  – Response time, queue length, …
• Storage, query processing
• Efficient, scalable, cheap

Outline
• Context
• Performance data characteristics
• Design goals
• DataGarage design
• Query processing
• Evaluation
• Conclusion

Performance Data Collection
Time   CPU  Mem  Jobs  Disk  …
10:00  48   37   3     134   …
10:01  52   39   3     342   …
10:02  58   45   2     324   …
…      …    …    …
• Our deployment
  – Monitoring process: CPU utilization, memory usage, disk space, SQL queue length, app response time, cache hit rate, network bandwidth, …
  – Sampling period: 15 seconds
  – 100-1000 counters/server
  – 5-100 MB/server/day
  – 0.01% CPU time

Performance Data Characteristics
• Heterogeneous counter sets
  – 30K different counters, 100-1000 per server
• Numeric, read-only, possibly dirty
  – Dirty data retained, may be ignored for query
• Hierarchical queries
  – Selection, projection, aggregation, data mining
    • Fraction of hotmail.com servers in a given rack with CPU utilization > 50%
    • Average memory utilization trend of hotmail servers

DataGarage Design Goals
• Small storage footprint
  – Reduces storage and communication cost
  – Small pay-as-you-go cost for Cloud systems
• Cheap
  – Commodity hardware and off-the-shelf software
• Fast and robust query processing
  – Allows fast decisions
  – Tolerates faulty and slow hardware
• Simple and flexible query interface (SQL + UDF)
  – Fast query writing

Options
• TableStore: relational table
  – DB engine: single-node DBMS, parallel DBMS
  – MapReduce: HadoopDB [Abouzeid et al., VLDB'09]
• FileStore: files
  – MapReduce: Hadoop, Dryad [Isard et al., EuroSys'07]

Trade-offs
[Figure: systems compared on performance, fault tolerance, cost, and storage footprint]
• TableStore + parallel DB engine (DBMS-X)
• TableStore + MR + single-node DB (HadoopDB)
• FileStore + MapReduce (Hadoop, Dryad)
• TableStore in files + MapReduce (DataGarage)

Storage Inefficiency: TableStore
• Key problem: heterogeneous counter sets
  – Total 30,000 unique counters, <1000/server
• Wide table (Machine id, Timestamps, Counter 1, Counter 2, …, Counter n): all possible counters
  – Too many columns
  – >95% sparse
• Narrow table (Machine id, Timestamps, Counter id, Value): key-value store
  – Redundant keys (4x more expensive than raw data)
  – Expensive joins needed
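The two TableStore layouts can be compared on a toy data set. The sketch below is illustrative only: the server names, counter names, and values are invented, and timestamps are left out to keep it short. It shows how a single global wide table fills up with empty cells when servers report different counters, and how a narrow table repeats its keys on every row.

```python
# Toy comparison of the two TableStore layouts for heterogeneous counter sets.
# All server names, counter names and values are made up for illustration.
samples = {
    "srv1": {"cpu": 48, "mem": 37, "jobs": 3},
    "srv2": {"cpu": 52, "sql_queue": 7},
    "srv3": {"disk": 134, "cache_hit": 91, "mem": 45},
}


def wide_rows(samples):
    """One row per server, one column for every counter seen on any server."""
    columns = sorted({c for counters in samples.values() for c in counters})
    rows = [[server] + [counters.get(c) for c in columns]
            for server, counters in samples.items()]
    return columns, rows


def narrow_rows(samples):
    """One (server, counter, value) row per measurement; keys repeat on every row."""
    return [(server, c, v)
            for server, counters in samples.items()
            for c, v in counters.items()]


def sparsity(rows):
    """Fraction of counter cells in the wide table that are empty (None)."""
    cells = [v for row in rows for v in row[1:]]
    return sum(v is None for v in cells) / len(cells)


columns, wide = wide_rows(samples)
narrow = narrow_rows(samples)
print(columns)
print(f"wide table: {sparsity(wide):.0%} of counter cells are empty")
print(f"narrow table: {len(narrow)} rows, each repeating its keys")
```

With only three servers and six counters more than half of the wide table is already empty; with 30,000 counters and fewer than 1,000 per server the empty fraction approaches the >95% quoted on the slide.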

Storage Inefficiency: FileStore
• Heterogeneous counter sets
  – Files need to maintain schema for each server
• No structure in data
  – Compression cannot exploit data correlation

Our Solution
• One wide table per server
  – Benefits of TableStore, without sparseness/redundancy
• Each wide table in an embedded database file
  – SQLite, MS SQL Server Compact Edition (.sdf file)
  – Benefits of FileStore
[Figure: per-server tables with different counter columns (c1 c2 c3; c1 c4 c6 c7 c8; c2 c4 c5 c8) stored as files in the file system and accessed through the Microsoft SQL Server Compact Edition library]
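A minimal sketch of the one-wide-table-per-server layout follows. It uses Python's built-in sqlite3 module as a stand-in for SQL Server Compact Edition; the table name `perf`, the `.db` extension, the helper names, and the sample values are all assumptions made for illustration. Each file carries only the columns its own server reports, so there is no global schema and no sparse padding.

```python
import os
import sqlite3
import tempfile


def write_server_file(directory, server, counters, rows):
    """Create one SQLite file holding a single wide table for one server.

    counters: column names for this server only (no global schema).
    rows: list of (timestamp, value_1, ..., value_n) tuples.
    """
    path = os.path.join(directory, f"{server}.db")
    cols = ", ".join(f'"{c}" REAL' for c in counters)
    marks = ", ".join("?" for _ in range(len(counters) + 1))
    conn = sqlite3.connect(path)
    try:
        conn.execute(f"CREATE TABLE perf (ts TEXT, {cols})")
        conn.executemany(f"INSERT INTO perf VALUES ({marks})", rows)
        conn.commit()
    finally:
        conn.close()
    return path


def column_names(path):
    """Read the schema back from the file itself."""
    conn = sqlite3.connect(path)
    try:
        return [row[1] for row in conn.execute("PRAGMA table_info(perf)")]
    finally:
        conn.close()


workdir = tempfile.mkdtemp()
p1 = write_server_file(workdir, "srv1", ["c1", "c2", "c3"],
                       [("10:00", 48, 37, 3), ("10:01", 52, 39, 3)])
p2 = write_server_file(workdir, "srv2", ["c1", "c4", "c6", "c7", "c8"],
                       [("10:00", 58, 45, 2, 324, 7)])
print(column_names(p1))
print(column_names(p2))
```

Because the schema travels inside each file, files can be copied, moved, or backed up independently.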

DataGarage Architecture
[Figure: architecture diagram with the following components]
• Data collector
• Embedded database
• Distributed file system
• Query controller (query dissemination)
• Summary database
• Data analysis tools

Data Compression
• Zipping files with PKZip is not effective
• Compress one column at a time
  – Exploit strong correlation
  – RLE, delta encoding not very effective
• Our idea: bit-truncation + byte-interleaving
[Figure: hex byte values of a column (e.g. 42 42 AE AE 91 83 2B 39 and A0 E4 38 C4) shown before and after truncation (if lossy, <1%) and interleaving]
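The slide names the two techniques but does not spell out the encoding, so the following is only one plausible reading, not the authors' implementation. It assumes values are stored as 32-bit floats, drops the 16 lowest mantissa bits (which keeps the relative error below 1%), regroups the bytes so that byte 0 of every value is stored together, then byte 1, and so on, and finally hands the result to zlib. The function names, the number of dropped bits, and the synthetic data are all assumptions.

```python
import math
import struct
import zlib


def truncate_bits(values, drop_bits=16):
    """Zero the lowest `drop_bits` mantissa bits of each float32 value (lossy)."""
    mask = 0xFFFFFFFF & ~((1 << drop_bits) - 1)
    out = []
    for v in values:
        (bits,) = struct.unpack("<I", struct.pack("<f", v))
        out.append(struct.unpack("<f", struct.pack("<I", bits & mask))[0])
    return out


def interleave_bytes(values):
    """Pack as little-endian float32, then group byte 0 of every value, byte 1, ..."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return b"".join(raw[i::4] for i in range(4))


def deinterleave_bytes(data):
    """Inverse of interleave_bytes."""
    n = len(data) // 4
    planes = [data[i * n:(i + 1) * n] for i in range(4)]
    raw = bytes(b for group in zip(*planes) for b in group)
    return list(struct.unpack(f"<{n}f", raw))


# Synthetic, slowly varying counter (made-up data standing in for CPU %).
values = [50 + 10 * math.sin(i / 50) for i in range(2000)]
truncated = truncate_bits(values)

plain = zlib.compress(struct.pack(f"<{len(values)}f", *values))
packed = zlib.compress(interleave_bytes(truncated))
print(len(plain), len(packed))  # compressed sizes in bytes
```

Truncation turns the low-order bytes into zeros, and interleaving places those zeros, and the nearly constant high-order bytes, in long runs that a general-purpose compressor handles well.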

Storage Efficiency
[Figure: storage efficiency results]

DataGarage Query
• DataGarage query: three components
  – On: filesystem path: /hotmail/dc1/*.10-.-2009.sdf
  – Apply: a SQL query run on individual database files
  – Combine: a SQL query to compute final result
• Enables map-reduce style execution
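The On / Apply / Combine structure can be sketched in a few lines. This is a single-process illustration under stated assumptions, not DataGarage's code: SQLite files stand in for .sdf files, `glob` stands in for the distributed file system, and the function `run_query`, the intermediate table name `partial`, the table `perf`, and the sample data are hypothetical. The example query computes the fraction of servers whose average CPU utilization exceeds 50%.

```python
import glob
import os
import sqlite3
import tempfile


def run_query(on, apply_sql, combine_sql):
    """Run apply_sql on every file matching `on`, then combine_sql on the partial results.

    Partial results are collected in an in-memory table named `partial`, whose
    columns take the names of the Apply result columns. The sketch assumes that
    at least one file matches the `on` pattern.
    """
    combined = sqlite3.connect(":memory:")
    created = False
    for path in sorted(glob.glob(on)):
        conn = sqlite3.connect(path)
        try:
            cur = conn.execute(apply_sql)
            names = [d[0] for d in cur.description]
            rows = cur.fetchall()
        finally:
            conn.close()
        if not created:
            combined.execute(f"CREATE TABLE partial ({', '.join(names)})")
            created = True
        marks = ", ".join("?" for _ in names)
        combined.executemany(f"INSERT INTO partial VALUES ({marks})", rows)
    result = combined.execute(combine_sql).fetchall()
    combined.close()
    return result


# Build three small per-server files under <tmp>/hotmail/dc1/ (made-up data).
root = tempfile.mkdtemp()
dc1 = os.path.join(root, "hotmail", "dc1")
os.makedirs(dc1)
cpu_by_server = {"srv1": [40, 60, 80], "srv2": [10, 20, 30], "srv3": [55, 65, 75]}
for server, cpu_samples in cpu_by_server.items():
    conn = sqlite3.connect(os.path.join(dc1, f"{server}.db"))
    conn.execute("CREATE TABLE perf (ts INTEGER, cpu REAL)")
    conn.executemany("INSERT INTO perf VALUES (?, ?)", list(enumerate(cpu_samples)))
    conn.commit()
    conn.close()

result = run_query(
    on=os.path.join(dc1, "*.db"),                         # On: which files
    apply_sql="SELECT AVG(cpu) AS avg_cpu FROM perf",     # Apply: per file
    combine_sql="SELECT AVG(avg_cpu > 50) FROM partial",  # Combine: final result
)
print(result)
```

Apply plays the role of the map step and Combine the reduce step; in a distributed setting the loop over files would be spread across execution nodes.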

Query Execution
[Figure: query execution flow with the following elements]
• Controller node: dissemination of On / Apply / Combine, temporary results, Combine, result
• Execution nodes: Apply
• Distributed file system

Query Execution Time
[Figure: query execution time results]

Fault Tolerance
• DataGarage key technology:
  – Decoupling of execution and storage
  – Fine-grained data partitioning
• Data is replicated by the file system
• Slow execution nodes
  – Assigned smaller jobs
  – Faster nodes take additional load after finished
• Execution node failures
  – New nodes work on remaining job of failed nodes
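One common way to get this behaviour from fine-grained partitions is a shared work queue, sketched below. The slides do not describe the actual scheduling mechanism, so this is an assumption: threads stand in for execution nodes, and the names `process_all`, `fast`, `slow`, and `flaky` are hypothetical. Workers pull one small partition at a time, so slower workers simply complete fewer of them, and a partition held by a failed worker goes back on the queue for the others.

```python
import queue
import threading
import time


def process_all(partitions, workers):
    """Workers pull small partitions from a shared queue until all are done.

    workers: dict of name -> function(partition) returning a result or raising.
    If a worker raises, its partition is re-queued and that worker stops,
    so the remaining workers finish its work.
    """
    todo = queue.Queue()
    for p in partitions:
        todo.put(p)
    results = {}
    done_by = {name: 0 for name in workers}
    lock = threading.Lock()

    def loop(name, fn):
        while True:
            with lock:
                if len(results) == len(partitions):
                    return                      # everything is finished
            try:
                p = todo.get(timeout=0.01)
            except queue.Empty:
                continue                        # others may still re-queue work
            try:
                value = fn(p)
            except Exception:
                todo.put(p)                     # hand the partition back
                return                          # treat this node as failed
            with lock:
                results[p] = value
                done_by[name] += 1

    threads = [threading.Thread(target=loop, args=item) for item in workers.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, done_by


def fast(p):
    return p * 2


def slow(p):
    time.sleep(0.01)                            # simulated slow hardware
    return p * 2


def flaky(p):
    raise RuntimeError("simulated node failure")


results, done_by = process_all(list(range(20)),
                               {"fast": fast, "slow": slow, "flaky": flaky})
print(done_by)
```

The sketch assumes at least one worker stays healthy; if every worker fails, the returned results are incomplete.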

Goals Revisited
• High performance: queries are pushed inside the embedded database
• Storage efficient: compression
• Fault tolerant: fine partitioning of data and query processing, aggressive restarting, speculative execution
• Hierarchical queries: file system paths
• Simple interface: SQL queries
• Cheap: off-the-shelf tools, commodity machines

Outline
• Context
• Performance data characteristics
• Design goals
• DataGarage design
• Query processing
• Experience
• Conclusion

Operational Experience
• In operation for more than 1 year
  – Warehousing data from Microsoft data centers
• Partitioning with fine granularity + compression is the key to storing massive data
  – Previous implementation with narrow table
    • 30K server-days in 1 TB disk
    • Slow queries
  – Current implementation
    • 1-3 million server-days/TB
    • Orders of magnitude faster queries
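The density figures above translate into per-server-day storage as follows. The calculation assumes decimal units (1 TB = 10^6 MB); the variable names are illustrative.

```python
# Storage per server-day implied by the quoted densities.
# ASSUMPTION: decimal units, 1 TB = 1,000,000 MB.
TB_IN_MB = 1_000_000


def mb_per_server_day(server_days_per_tb):
    """Average space one server-day occupies, in MB."""
    return TB_IN_MB / server_days_per_tb


narrow = mb_per_server_day(30_000)            # previous narrow-table implementation
current_low = mb_per_server_day(1_000_000)    # current, lower bound of 1-3 million
current_high = mb_per_server_day(3_000_000)   # current, upper bound of 1-3 million

print(round(narrow, 1), current_low, round(current_high, 2))   # MB per server-day
print(round(narrow / current_low, 1), round(narrow / current_high))  # improvement factors
```

Under these assumptions the current implementation stores roughly 33 to 100 times more server-days in the same space.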

Operational Experience
• Embedded database files give flexibility
  – Placement, backup simplified
  – Scavenge available storage on the fly
• Simple design helps
  – Several thousand lines of C# code to glue together existing tools (FS, embedded DB, R, …)
• Defer features until necessary: parallel Combine
• Good fit with Cloud computing model
  – Data and/or computation can be on the Cloud
  – Cheap: only file storage needed, small footprint

Conclusion
• Existing solutions are not efficient for warehousing performance data
• DataGarage: performance data warehouse
• Cheap, scalable, fault tolerant
  – Combines benefits of DB, MapReduce, file systems
• Operational experience shows the benefits
Questions?

Compression Overhead
[Figure: compression overhead results]

Related Work
• HadoopDB
  – DataGarage has finer data partitioning
    • Improves fault tolerance and storage efficiency
  – DataGarage uses embedded databases
    • Cheap, enables using hierarchical file system
  – DataGarage uses data compression

Query Processing
[Figure: query processing flow with the following elements]
• Controller (query dissemination): <target>, <apply_script>, <combine_script>, temporary table, result
• Embedded databases: <apply_script> run on each
• Distributed file system