Technische Universität München Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems Martina Albutiu, Alfons Kemper, and Thomas Neumann Technische Universität München
Hardware trends …
• Huge main memory
• Massive processing parallelism
• Non-uniform Memory Access (NUMA)
• Our server:
  – 4 CPUs, 32 cores
  – 1 TB RAM
  – 4 NUMA partitions
Main memory database systems
• VoltDB, Hana, MonetDB
• HyPer: real-time business intelligence queries on transactional data*
* http://www-db.in.tum.de/research/projects/HyPer/
How to exploit these hardware trends?
• Parallelize algorithms
• Exploit fast main memory access
  – Kim, Sedlar, Chhugani: Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. VLDB '09
  – Blanas, Li, Patel: Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. SIGMOD '11
• AND be aware of fast local vs. slow remote NUMA access
Ignoring NUMA
[Figure: cores 1–8, spread over NUMA partitions 1–4, all writing into one shared hash table across partition boundaries]
How much difference does NUMA make?
[Figure: scaled execution times (local/sequential = 100%) for sort (local vs. remote), partitioning (sequential vs. synchronized), and merge join with sequential reads (local vs. remote); reported times include 837 ms, 1000 ms, 7440 ms, 12946 ms, 22756 ms, and 417344 ms]
The three NUMA commandments
C1: Thou shalt not write thy neighbor's memory randomly -- chunk the data, redistribute, and then sort/work on your data locally.
C2: Thou shalt read thy neighbor's memory only sequentially -- let the prefetcher hide the remote access latency.
C3: Thou shalt not wait for thy neighbors -- don't use fine-grained latching or locking, and avoid synchronization points of parallel threads.
Basic idea of MPSM
[Figure: R and S are each split into one chunk per worker]
Basic idea of MPSM
• C1: Work locally: sort
• C3: Work independently: sort and merge join
• C2: Access neighbor's data only sequentially
[Figure: each worker sorts its R chunk and its S chunk locally, then merge-joins its sorted R chunk against all sorted S chunks]
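The phases on this slide can be sketched in a few lines. Below is a minimal single-process sketch (function names and the `(key, payload)` tuple layout are illustrative assumptions, not from the paper); in MPSM each iteration over an R run would be an independent worker thread, and each S run would be scanned only sequentially:

```python
def merge_join(r_run, s_run):
    """Merge-join two sorted runs of (key, payload) tuples on key."""
    out, i, j = [], 0, 0
    while i < len(r_run) and j < len(s_run):
        if r_run[i][0] < s_run[j][0]:
            i += 1
        elif r_run[i][0] > s_run[j][0]:
            j += 1
        else:
            k = r_run[i][0]
            # gather the runs of equal keys on both sides
            i_end = i
            while i_end < len(r_run) and r_run[i_end][0] == k:
                i_end += 1
            j_end = j
            while j_end < len(s_run) and s_run[j_end][0] == k:
                j_end += 1
            for r in r_run[i:i_end]:
                for s in s_run[j:j_end]:
                    out.append((r, s))
            i, j = i_end, j_end
    return out

def mpsm_join(r_chunks, s_chunks):
    """Each 'worker' sorts its own chunks (C1), then merge-joins its sorted
    R run against every sorted S run with purely sequential scans (C2) and
    no coordination between workers (C3)."""
    sorted_r = [sorted(c) for c in r_chunks]   # phase 1: sort R chunks locally
    sorted_s = [sorted(c) for c in s_chunks]   # phase 2: sort S chunks locally
    out = []
    for r_run in sorted_r:                     # phase 3: each worker joins its
        for s_run in sorted_s:                 # R run against all S runs
            out.extend(merge_join(r_run, s_run))
    return out
```

Note that without the range partitioning of the following slides, every worker must scan every S run; partitioning is what shrinks this to the relevant parts only.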
Range partitioning of private input R
• To constrain merge join work
• To provide scalability in the number of parallel workers
[Figure: the R chunks are range-partitioned and sorted; S is implicitly partitioned by sorting its chunks, so each worker merge-joins only the relevant parts of the S chunks]
Range partitioning of private input
• Time efficient: branch-free, comparison-free, and synchronization-free
• Space efficient: densely packed and in-place
by using radix-clustering and precomputed target partitions to scatter the data to
Range partitioning of private input (example)
Chunk of worker W1: 19 9 7 3 21 1 17; chunk of worker W2: 2 23 4 31 8 20 26
Keys are radix-clustered on the most significant bit of the 5-bit key (19 = 10011, 7 = 00111, 17 = 10001, 2 = 00010)
Histogram of W1: 4 keys < 16, 3 keys ≥ 16; histogram of W2: 3 keys < 16, 4 keys ≥ 16
Prefix sums give each worker its write positions: W1 starts at offsets 0 and 0, W2 at offsets 4 and 3
Result: partition < 16 holds 9 7 3 1 (from W1) and 2 4 8 (from W2); partition ≥ 16 holds 19 21 17 (from W1) and 23 31 20 26 (from W2)
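The histogram/prefix-sum scheme above can be sketched as follows (a simplified sequential sketch; the function name, `bits`, and `width` parameters are illustrative assumptions, and in MPSM each pass would run in parallel, one worker per chunk):

```python
def range_partition(chunks, bits=1, width=5):
    """Radix-cluster each worker's chunk on the top `bits` of a `width`-bit
    key.  Histograms plus prefix sums precompute every worker's exclusive
    write positions, so the scatter pass needs no comparisons and no
    synchronization between workers."""
    shift = width - bits
    n_parts = 1 << bits
    # pass 1: each worker builds a local histogram of its chunk
    hists = [[0] * n_parts for _ in chunks]
    for w, chunk in enumerate(chunks):
        for key in chunk:
            hists[w][key >> shift] += 1
    # prefix sums: worker w writes partition p starting at offsets[w][p]
    part_sizes = [sum(h[p] for h in hists) for p in range(n_parts)]
    offsets = [[0] * n_parts for _ in chunks]
    for p in range(n_parts):
        run = 0
        for w in range(len(chunks)):
            offsets[w][p] = run
            run += hists[w][p]
    # pass 2: scatter every key to its precomputed slot
    partitions = [[None] * part_sizes[p] for p in range(n_parts)]
    for w, chunk in enumerate(chunks):
        cursor = offsets[w][:]
        for key in chunk:
            p = key >> shift
            partitions[p][cursor[p]] = key
            cursor[p] += 1
    return partitions
```

With the slide's data, `range_partition([[19, 9, 7, 3, 21, 1, 17], [2, 23, 4, 31, 8, 20, 26]])` reproduces the two target partitions shown above.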
Real C hacker at work …
Skew resilience of MPSM
• Location skew is implicitly handled
• Distribution skew:
  – Dynamically computed partition bounds
  – Determined based on the global data distributions of R and S
  – Cost balancing for sorting R and joining R and S
Skew resilience
1. Global S data distribution
  – Local equi-height histograms (for free)
  – Combined to a CDF
[Figure: sorted runs S1 = 1 7 10 15 22 31 66 81 and S2 = 2 12 17 25 33 42 78 90 are combined into a CDF of # tuples over key values]
Skew resilience
2. Global R data distribution
  – Local equi-width histograms as before
  – More fine-grained histograms
[Figure: chunk R1 = 2 13 4 31 8 20 6 (e.g., 2 = 00010, 8 = 01000) yields the histogram 3 | 2 | 1 | 1 over the ranges < 8, [8, 16), [16, 24), ≥ 24]
Skew resilience
3. Compute splitters so that the overall workloads are balanced*: greedily combine histogram buckets such that the cost of each thread for sorting R and joining R and S is balanced
[Figure: the R histogram 3 | 2 | 1 | 1 is combined with the S CDF to place the splitters]
* Ross and Cieslewicz: Optimal Splitters for Database Partitioning with Size Bounds. ICDT '09
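The greedy bucket-combining step might be sketched as below. This is a simplified model under stated assumptions: the cost of a key range is taken as its R tuple count (sort work) plus its S tuple count read off the CDF (join work), and all names (`compute_splitters`, `s_cdf`, `bucket_bounds`) are illustrative, not from the paper:

```python
def compute_splitters(r_hist, bucket_bounds, s_cdf, n_workers):
    """Greedily merge adjacent R histogram buckets into n_workers key ranges.

    r_hist[i] counts R keys in [bucket_bounds[i], bucket_bounds[i+1]);
    s_cdf(k) returns the number of S tuples with key < k.  A splitter is
    emitted once the accumulated range's cost (R tuples to sort plus S
    tuples to scan) reaches the per-worker average."""
    total_r = sum(r_hist)
    total_s = s_cdf(bucket_bounds[-1])
    target = (total_r + total_s) / n_workers
    splitters, acc_r, last_split = [], 0, bucket_bounds[0]
    for count, hi in zip(r_hist, bucket_bounds[1:]):
        acc_r += count
        cost = acc_r + (s_cdf(hi) - s_cdf(last_split))
        if cost >= target and len(splitters) < n_workers - 1:
            splitters.append(hi)
            acc_r, last_split = 0, hi
    return splitters
```

For instance, with the R histogram from the previous slide, a uniform S (CDF ≈ identity), and two workers, `compute_splitters([3, 2, 1, 1], [0, 8, 16, 24, 32], lambda k: k, 2)` places a single splitter at key 16.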
Performance evaluation
• MPSM performance in a nutshell:
  – 160 million tuples joined per second
  – 27 billion tuples joined in less than 3 minutes
  – scales linearly with the number of cores
• Platform HyPer1:
  – Linux server
  – 1 TB RAM
  – 4 CPUs with 8 physical cores each
• Benchmark:
  – Join tables R and S with schema {[joinkey: 64 bit, payload: 64 bit]}
  – Dataset sizes ranging from 50 GB to 400 GB
Execution time comparison
• MPSM, Vectorwise (VW), and the Blanas hash join*
• 32 workers
• |R| = 1600 million tuples (25 GB), varying size of S
* S. Blanas, Y. Li, and J. M. Patel: Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. SIGMOD 2011
Scalability in the number of cores
• MPSM and Vectorwise (VW)
• |R| = 1600 million tuples (25 GB), |S| = 4·|R|
Location skew
• Location skew in R has no effect because of repartitioning
• Location skew in S: in the extreme case, all join partners of Ri are found in only one Sj (either local or remote)
Distribution skew: anti-correlated data
[Figure: results without balanced partitioning vs. with balanced partitioning]
[Figure: execution times on anti-correlated data]
Conclusions
• MPSM is a sort-based parallel join algorithm
• MPSM is NUMA-aware & NUMA-oblivious
• MPSM is space efficient (works in-place)
• MPSM scales linearly in the number of cores
• MPSM is skew resilient
• MPSM outperforms Vectorwise (4×) and Blanas et al.'s hash join (18×)
• MPSM is adaptable for disk-based processing
  – See details in paper
Massively Parallel Sort-Merge Joins (MPSM) in Main Memory Multi-Core Database Systems
Martina Albutiu, Alfons Kemper, and Thomas Neumann, Technische Universität München
THANK YOU FOR YOUR ATTENTION!