To Share or Not to Share?
Ryan Johnson, Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki, Babak Falsafi
PARALLEL DATA LABORATORY, Carnegie Mellon University
**HP LABS

Motivation For Work Sharing
Query: What is the average GPA in the ECE dept.?
Query: What is the highest undergraduate GPA?
[Figure: query plan with operators output, aggregate, join, scan Dept, scan Student]
http://www.pdl.cmu.edu/ Ryan Johnson ©

Motivation For Work Sharing
• Many queries in system
  • Similar requests
  • Redundant work
• Work Sharing
  • Detect redundant work
  • Compute results once and share
• Big win for I/O, uniprocessors
  • 2x speedup for TPC-H queries [hariz05]
[Figure: query plan with operators output, aggregate, join, scan Dept, scan Student]

Work Sharing on Modern Hardware
[Figure: speedup due to work sharing (WS) vs. number of shared queries, for 1 CPU and 8 CPUs; inset hardware diagram showing cores, L1, L2, and memory]
• Work sharing can hurt performance!

Contributions
• Observation
  • Work sharing can hurt performance on parallel hardware
• Analysis
  • Develop intuitive analytical model of work sharing
  • Identify trade-off between total work, critical path
• Application
  • Model-based policy outperforms static ones by up to 6x

Outline
• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments

Challenges of Exploiting Work Sharing
• Independent execution only?
  • Load reduction from work sharing can be useful
• Work sharing only?
  • Indiscriminate application can hurt performance
• To share or not to share?
  • System and workload dependent
  • Adapt decisions at runtime
• Must understand work sharing to exploit it fully

Work Sharing vs. Parallelism
[Figure: independent execution of Query 1 and Query 2 (scan, join, aggregate), marking each query's critical path and response time; label P = 4.33]

Work Sharing vs. Parallelism
[Figure: shared execution of Query 1 and Query 2 (scan, join, aggregate), marking each query's response time; labels P = 4.33, P = 2.75, "Critical path now longer", "Penalty"]
• Total work and critical path both important

Understanding Work Sharing
• Performance depends on two factors: total work and critical path
• Work sharing presents a trade-off
  • Reduces total work
  • Potentially lengthens critical path
• Balance both factors or performance suffers

Basis for a Model
• “Closed” system
  • Consistent high load
  • Throughput computing
  • Assumed in most benchmarks
  • Fixed number of clients
• Little’s Law governs throughput
  • Higher response time = lower throughput
  • Total work not a direct factor!
• Load reduction secondary to response time
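
The closed-system argument can be made concrete with Little's Law. The sketch below assumes zero client think time, so every one of the N clients always has a query in flight and throughput is X = N / R. The function name and the example numbers are illustrative assumptions, not values from the slides.

```python
def closed_system_throughput(clients: int, response_time: float) -> float:
    """Little's Law for a closed system with zero think time: X = N / R."""
    if clients <= 0 or response_time <= 0:
        raise ValueError("clients and response_time must be positive")
    return clients / response_time


# With a fixed client count, throughput depends only on response time:
# doubling the response time halves the throughput, regardless of how
# much total work was saved along the way.
fast = closed_system_throughput(clients=20, response_time=2.0)
slow = closed_system_throughput(clients=20, response_time=4.0)
print(fast, slow)
```

This is why the slide treats load reduction as secondary: saving work only helps if it also shortens the response time each client observes.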

Predicting Response Time
• Case 1: Compute-bound
• Case 2: Critical path-bound
• Larger bottleneck determines response time
• Model provides u and pmax

An Analytical Model of Work Sharing
Throughput for m queries and n processors
• U = requested utilization (improved by work sharing)
• Pmax = longest pipe stage (potentially worsened by work sharing)
• Sharing helpful when Xshared > Xalone
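
One way to read this model is as a two-bottleneck bound. The sketch below is a paraphrase under stated assumptions, not the formula from the slide, which is not preserved in this transcript. It treats `u` as the total processor time a group of m queries requests per round, treats `pmax` as the time taken by the longest pipeline stage per round, and takes the larger of `u / n` and `pmax` as the response time. All function names and example numbers are hypothetical.

```python
def response_time(u: float, pmax: float, n: int) -> float:
    """Larger of the compute bound (u / n) and the critical-path bound (pmax)."""
    return max(u / n, pmax)


def throughput(m: int, u: float, pmax: float, n: int) -> float:
    """Queries completed per unit time when m queries finish every round."""
    return m / response_time(u, pmax, n)


def sharing_helps(m: int, n: int,
                  u_alone: float, pmax_alone: float,
                  u_shared: float, pmax_shared: float) -> bool:
    """Share only when the predicted shared throughput is higher."""
    x_alone = throughput(m, u_alone, pmax_alone, n)
    x_shared = throughput(m, u_shared, pmax_shared, n)
    return x_shared > x_alone


# Hypothetical workload: sharing halves the requested work (8.0 -> 4.0)
# but lengthens the longest stage (0.6 -> 1.5).
for cpus in (1, 32):
    print(cpus, sharing_helps(8, cpus, 8.0, 0.6, 4.0, 1.5))
```

With one processor the compute bound dominates, so the reduced work wins. With many processors the longer stage dominates, and the same sharing decision lowers throughput.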

Outline
• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments

Experimental Setup
• Hardware
  • Sun T2000 “Niagara” with 16 GB RAM
  • 8 cores (32 threads)
  • Solaris processor sets vary effective CPU count
• Cordoba
  • Staged DBMS
  • Naturally exposes work sharing
  • Flexible work sharing policies
• 1 GB TPC-H dataset
• Fixed client and CPU counts per run

Model Validation: TPC-H Q1
[Figure: predicted vs. measured performance; speedup due to work sharing vs. number of shared queries, with model curves for 1, 2, 8, and 32 CPUs]
• Avg/max error: 5.7% / 22%

Model Validation: TPC-H Q4
• Behavior varies with both system and workload

Exploring WS vs. Parallelism
• Work sharing splits query into three parts
  • Independent work
    – Per-query, parallel
    – Total work
  • Serial work
    – Per-query, serial
    – Critical path
  • Shared work
    – Computed once
    – “Free” after first query
Example: Independent 37%, Serial 4%, Shared 59%
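
To see how the three parts interact, the toy calculation below plugs the example split (37% independent, 4% serial, 59% shared) into a simple bottleneck model. The stage layout is an assumption made for illustration: one shared stage performs the shared work once plus the serial work for each of its consumers, while independent work runs in parallel on the remaining processors. It is a sketch of the trade-off, not the authors' model, and the numbers it prints are not measurements.

```python
# Example split from the slide, as fractions of one query's work.
INDEPENDENT, SERIAL, SHARED = 0.37, 0.04, 0.59


def throughput_alone(m: int, n: int) -> float:
    """m queries run independently on n processors."""
    total_work = m * (INDEPENDENT + SERIAL + SHARED)
    # Assumed longest stage of a single unshared query.
    critical_path = max(SHARED + SERIAL, INDEPENDENT)
    return m / max(total_work / n, critical_path)


def throughput_shared(m: int, n: int) -> float:
    """m queries share one instance of the shared work."""
    total_work = SHARED + m * (SERIAL + INDEPENDENT)
    # The shared stage serves every consumer serially.
    critical_path = max(SHARED + m * SERIAL, INDEPENDENT)
    return m / max(total_work / n, critical_path)


def potential_speedup(m: int, n: int) -> float:
    return throughput_shared(m, n) / throughput_alone(m, n)


for cpus in (1, 8, 32):
    row = [round(potential_speedup(m, cpus), 2) for m in (1, 8, 16, 32)]
    print(f"{cpus:>2} CPUs: {row}")
```

Under these assumptions the shared stage's serial work grows with the number of consumers, so with enough processors the critical path, not the total work, limits throughput.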

Exploring WS vs. Parallelism
[Figure: potential speedup; benefit from work sharing vs. number of shared queries (0 to 32), one curve per CPU count]
• Behavior matches previously published results

Exploring WS vs. Parallelism
[Figure: potential speedup; benefit from work sharing vs. number of shared queries (0 to 32), one curve per CPU count, with a region marked "Saturated"]

Exploring WS vs. Parallelism
[Figure: potential speedup; benefit from work sharing vs. number of shared queries (0 to 32), one curve per CPU count, with a region marked "Saturated"]
• More processors shift bottleneck to critical path

Performance Impact of Serial Work (32 CPU)
[Figure: benefit from work sharing vs. number of shared queries (0 to 40), for serial work fractions of 0%, 1%, 2%, and 7%]
• Critical path quickly becomes major bottleneck

Model-guided Work Sharing
• Integrate predictive model into Cordoba
  • Predict benefit of work sharing for each new query
  • Consider multiple groups of queries at once
  • Shorter critical path, increased parallelism
• Experimental setup
  • Profile run with 2 clients, 2 CPUs
  • Extract model parameters with profiling tools
  • 20 clients submit mix of TPC-H Q1 and Q4
  • Compare against always-, never-share policies
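
The idea of considering multiple groups of queries can be sketched as a search over group counts. The code below is a minimal illustration, not Cordoba's actual policy: it assumes m identical queries with the example 37% / 4% / 59% split, a toy bottleneck model in which each sharing group runs its shared work once and serves its members serially, and a hypothetical helper `best_group_count`. One group corresponds to always-share and m groups to never-share.

```python
import math

# Assumed per-query work fractions (independent, serial, shared).
INDEPENDENT, SERIAL, SHARED = 0.37, 0.04, 0.59


def predicted_throughput(m: int, n: int, groups: int) -> float:
    """Toy prediction for m queries on n processors split into sharing groups."""
    largest_group = math.ceil(m / groups)
    # Each group repeats the shared work once; other work is per query.
    total_work = groups * SHARED + m * (SERIAL + INDEPENDENT)
    # The longest stage is the shared stage of the largest group.
    critical_path = max(SHARED + largest_group * SERIAL, INDEPENDENT)
    return m / max(total_work / n, critical_path)


def best_group_count(m: int, n: int) -> int:
    """Pick the group count with the highest predicted throughput."""
    return max(range(1, m + 1),
               key=lambda g: predicted_throughput(m, n, g))


for cpus in (1, 8, 32):
    print(cpus, "CPUs ->", best_group_count(20, cpus), "group(s)")
```

In this toy setting a single processor favors one large group, many processors favor no sharing, and intermediate processor counts favor a few smaller groups, which shortens the critical path while keeping some of the load reduction.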

Comparison of Work Sharing Strategies
[Figure: throughput (queries/min) of the always share, model guided, and never share policies at query ratios All Q1, 50/50, and All Q4; one panel for 2 CPU and one for 32 CPU]
• Model-based policy balances critical path and load

Related Work
• Many existing work sharing schemes
  • Identification occurs at different stages in the query’s lifetime
  • All allow pipelined query execution
• Schemes by stage of identification, early to late:
  • Schema design: Materialized Views [rouss82]
  • Query compilation: Multiple Query Optimization [roy00]
  • Query execution: Staged DBMS [hariz05]
  • Buffer pool access: Synchronized Scanning [lang07]
• Model describes all types of work sharing

Conclusions
• Work sharing can hurt performance
  • Highly parallel, memory resident machines
• Intuitive analytical model captures behavior
  • Trade-off between load reduction and critical path
• Model-guided work sharing highly effective
  • Outperforms static policies by up to 6x
http://www.cs.cmu.edu/~StagedDB/

References
• [hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. “QPipe: A Simultaneously Pipelined Relational Query Engine.” In Proc. SIGMOD, 2005.
• [lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. “Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling.” In Proc. ICDE, 2007.
• [rouss82] N. Roussopoulos. “View Indexing in Relational Databases.” ACM TODS, 7(2):258-290, 1982.
• [roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. “Efficient and Extensible Algorithms for Multi Query Optimization.” In Proc. SIGMOD, 2000.