Pyramid Sketch: a Sketch Framework for Frequency Estimation

Pyramid Sketch: a Sketch Framework for Frequency Estimation of Data Streams. Tong Yang, Yang Zhou, Hao Jin, Peking University; Shigang Chen, University of Florida, USA; Xiaoming Li, Peking University, China

Outline
1. Background
   • Problem to address
   • Prior art
2. Pyramid Techniques
   • Counter-pair sharing
   • Word acceleration: a. word constraint, b. word sharing, c. one hashing
   • Ostrich policy
3. Evaluation
   • Experiment setup
   • Effects of techniques
   • Accuracy
   • Speed
4. Conclusion

Background. Problem: a high-speed data stream containing hot items and cold items continuously updates a data structure, which must also answer frequency queries. Hash tables are memory-inefficient and slow for this task.

Background. Typical sketches:
• CM sketch (Journal of Algorithms, 2005; cited 976 times)
• CU sketch (SIGCOMM 2002; cited 949 times)
• Count sketch (Automata, Languages and Programming, 2002; cited 715 times)
• Augmented sketch (SIGMOD 2016)
• Slim-Fat sketch (ICDE 2017)

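Background. Prior art: CM sketch.
• Insertion: when a new item e arrives, increment each of the counters it is hashed to by 1.
• Deletion: to delete item e, decrement each of its mapped counters by 1.
• Query: to query the frequency of item e, report the minimum of its mapped counters.
Example: e's mapped counters hold 5, 7, and 10, so the reported value is 5.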
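A minimal Python sketch of the CM operations above, assuming d rows of w counters with one hash per row. Salted BLAKE2b from `hashlib` stands in for the hash functions, and the class name `CMSketch` and its parameters are illustrative, not the authors' code.

```python
import hashlib


class CMSketch:
    """Count-Min sketch: d rows of w counters, one hash function per row."""

    def __init__(self, d: int = 3, w: int = 1024) -> None:
        self.d, self.w = d, w
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, item: str, row: int) -> int:
        # Stand-in hash (assumption): BLAKE2b over "row:item", one per row.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.w

    def insert(self, item: str) -> None:
        for r in range(self.d):
            self.rows[r][self._index(item, r)] += 1   # every mapped counter +1

    def delete(self, item: str) -> None:
        for r in range(self.d):
            self.rows[r][self._index(item, r)] -= 1   # every mapped counter -1

    def query(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum is the tightest estimate.
        return min(self.rows[r][self._index(item, r)] for r in range(self.d))


cm = CMSketch()
for _ in range(5):
    cm.insert("e")
print(cm.query("e"))  # 5
```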

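Background. Prior art: CU sketch.
• Insertion: when a new item e arrives, increment only the smallest of its mapped counters (conservative update).
• Query: to query the frequency of item e, report the minimum of its mapped counters.
Example: with mapped counters 5, 7, and 10, an insertion increments only the counter holding 5, and a query on these counters reports 5.
CU sketch achieves higher accuracy than CM sketch: with the same hash functions, no CU counter ever exceeds the corresponding CM counter, and neither sketch underestimates. The trade-off is that CU sketch does not support deletions.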
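The conservative update can be sketched the same way. Raising only the counters equal to the current minimum is equivalent to setting each mapped counter to max(counter, minimum + 1). The hash stand-in and the class name `CUSketch` are assumptions for illustration; there is deliberately no `delete`, matching the point that CU does not support deletions.

```python
import hashlib


class CUSketch:
    """Conservative-update (CU) sketch with the same layout as a CM sketch."""

    def __init__(self, d: int = 3, w: int = 1024) -> None:
        self.d, self.w = d, w
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, item: str, row: int) -> int:
        # Stand-in hash (assumption): BLAKE2b over "row:item".
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.w

    def insert(self, item: str) -> None:
        cells = [(r, self._index(item, r)) for r in range(self.d)]
        smallest = min(self.rows[r][i] for r, i in cells)
        for r, i in cells:
            # Only counters at the current minimum grow; larger ones already
            # over-count e, so raising them would only add error.
            if self.rows[r][i] == smallest:
                self.rows[r][i] += 1

    def query(self, item: str) -> int:
        return min(self.rows[r][self._index(item, r)] for r in range(self.d))
```

Every insertion of e raises the minimum over e's counters by at least 1 and no counter ever decreases, so `query` never reports less than the true count.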

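Background. Item frequencies are highly skewed: a hot item may occur about 30k times, while a cold item may occur only 2 times, so counters sized for the hot item waste most of their bits on cold items. Design goals: high memory efficiency, high update speed, and high accuracy.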

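Techniques I: 1. Counter-pair Sharing. [Figure: the Pyramid structure, with pure counters in the bottom layer and hybrid counters in the layers above; item e is mapped to counters in the bottom layer.]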

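Techniques I: 1. Counter-pair Sharing. A hybrid counter consists of a left flag, a counting part, and a right flag. Two adjacent counters (a left child and a right child) share one parent counter in the layer above; the parent's left or right flag records whether the corresponding child has overflowed into it.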
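A hedged illustration of the three fields, assuming a 4-bit counter: two bits go to the flags, leaving a 2-bit counting part. The bit order (left flag highest, right flag lowest) and the class name `HybridCounter` are assumptions, not the authors' layout.

```python
from dataclasses import dataclass


@dataclass
class HybridCounter:
    """A 4-bit hybrid counter: left flag | 2-bit counting part | right flag."""

    left_flag: int   # 1 once the left child has overflowed into this counter
    count: int       # counting part, 0..3
    right_flag: int  # 1 once the right child has overflowed into this counter

    def pack(self) -> int:
        if self.left_flag not in (0, 1) or self.right_flag not in (0, 1):
            raise ValueError("flags must be 0 or 1")
        if not 0 <= self.count <= 3:
            raise ValueError("counting part must fit in 2 bits")
        return (self.left_flag << 3) | (self.count << 1) | self.right_flag

    @staticmethod
    def unpack(bits: int) -> "HybridCounter":
        return HybridCounter((bits >> 3) & 1, (bits >> 1) & 0b11, bits & 1)


print(bin(HybridCounter(0, 1, 1).pack()))  # 0b11
```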

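Techniques I: 1. Counter-pair Sharing. Insertion example (counter size set to 4 bits): an item e arrives, and the right child counter in layer L1 that e maps to is supposed to be incremented. It already holds 15, the largest value a 4-bit counter can store, so incrementing it to 16 overflows. A carry operation is performed: the right child wraps to 0, the parent counter in layer L2 is incremented by 1, and the parent's right flag is set to 1.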
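A minimal Python sketch of counter-pair sharing with 4-bit counters that reproduces this carry. Assumptions: a single array with no hashing (the caller passes the layer-1 index j that e maps to), three layers, 2-bit counting parts, and counter j's parent being counter j // 2 in the next layer, so an even j is a left child. The class `PyramidArray` is hypothetical.

```python
class PyramidArray:
    """Counter-pair sharing over one array of 4-bit counters (no hashing).

    Layer 1 holds pure counters; higher layers hold hybrid counters stored
    as [left_flag, counting_part, right_flag]. Counter j's parent is j // 2.
    """

    PURE_MAX = 15  # largest value of a 4-bit pure counter
    PART_MAX = 3   # largest value of a 2-bit counting part

    def __init__(self, width: int = 8, layers: int = 3) -> None:
        self.pure = [0] * width
        self.hybrid = [[[0, 0, 0] for _ in range(max(1, width >> level))]
                       for level in range(1, layers)]

    def insert(self, j: int) -> None:
        """Add 1 to layer-1 counter j, carrying upward on overflow."""
        if self.pure[j] < self.PURE_MAX:
            self.pure[j] += 1
            return
        self.pure[j] = 0                           # 15 + 1 wraps to 0 ...
        child = j
        for layer in self.hybrid:                  # ... and carries into the parent
            parent = layer[child // 2]
            parent[0 if child % 2 == 0 else 2] = 1  # flag the overflowing side
            if parent[1] < self.PART_MAX:
                parent[1] += 1
                return
            parent[1] = 0                          # counting part wraps too; keep carrying
            child //= 2
        raise OverflowError("carry past the top layer")

    def query(self, j: int) -> int:
        """Rebuild the value seen from layer-1 counter j."""
        value, weight, child = self.pure[j], self.PURE_MAX + 1, j
        for layer in self.hybrid:
            parent = layer[child // 2]
            if not parent[0 if child % 2 == 0 else 2]:
                break                              # this path never overflowed higher
            value += parent[1] * weight
            weight *= self.PART_MAX + 1
            child //= 2
        return value


p = PyramidArray()
for _ in range(16):
    p.insert(1)  # index 1 is a right child
print(p.pure[1], p.hybrid[0][0], p.query(1))  # 0 [0, 1, 1] 16
```

If the left sibling also overflows, both children read the same parent counting part and both are overestimated; in this sketch, 16 insertions at index 0 and 16 at index 1 make each query report 32. That shared parent is where the scheme trades some accuracy for memory.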

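Techniques I: 1. Counter-pair Sharing. Query example (counter size set to 4 bits): we want to query item e, whose counter is the right child in layer L1. The query value from the right child is rebuilt by walking up through layers L2 and L3 and adding each ancestor's counting part, scaled by its layer weight, only if the flag on e's path is set. A 4-bit hybrid counter spends 2 bits on flags, leaving a 2-bit counting part, so the counting parts in L2 and L3 are worth 16 (2^4) and 64 (2^4 × 2^2). Each term below is flag × counting part × weight, and the L1 term has no flag:
0 × 1 + 1 × 1 × 16 + 0 × 2 × 64 = 16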
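The arithmetic above can be checked with a small function. Here `hybrid_path` (a hypothetical input format) lists, for layers L2, L3, and so on, the flag on e's side and the counting part. The running product of flags is an assumption that encodes "stop at the first unset flag"; it matches the slide's terms whenever the lower flags are all 1.

```python
def query_value(pure_count: int, hybrid_path: list[tuple[int, int]],
                counter_bits: int = 4) -> int:
    """Sum flag * counting_part * weight along an item's path up the pyramid."""
    value = pure_count                 # layer-1 term: weight 1, no flag
    weight = 1 << counter_bits         # first hybrid layer: 2^4 = 16
    part_bits = counter_bits - 2       # counting part after the 2 flag bits
    on_path = 1
    for flag, count in hybrid_path:
        on_path &= flag                # once a flag is 0, higher layers add nothing
        value += on_path * count * weight
        weight <<= part_bits           # each layer up is 2^2 = 4 times heavier
    return value


print(query_value(0, [(1, 1), (0, 2)]))  # 16
```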

Techniques I: 1. Counter-pair Sharing. Memory efficiency:
1) Counter size is kept small.
2) The framework automatically assigns an appropriate number of small counters to store the frequency of each item.
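To see how the number of counters adapts to frequency, the sketch below counts how many layers (one counter per layer) a value reaches, using the hot (about 30k) and cold (2) counts from the background slide. It assumes 4-bit counters with 2 flag bits per hybrid counter and that no sibling shares the item's parents; `layers_spanned` is a hypothetical helper.

```python
def layers_spanned(freq: int, counter_bits: int = 4) -> int:
    """Number of layers an item's value reaches, assuming no sibling collisions."""
    if freq < 0:
        raise ValueError("frequency must be non-negative")
    capacity = (1 << counter_bits) - 1      # layer 1 alone: a pure counter, 0..15
    weight = 1 << counter_bits              # value of one unit in layer 2
    part_bits = counter_bits - 2            # hybrid counting part after 2 flag bits
    layers = 1
    while freq > capacity:
        capacity += ((1 << part_bits) - 1) * weight  # add the next layer's range
        weight <<= part_bits
        layers += 1
    return layers


print(layers_spanned(2), layers_spanned(16), layers_spanned(30_000))  # 1 2 7
```

Under these assumptions a cold item stays in a single 4-bit counter, while only the hot item pays for seven.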

Techniques II: 2. Word Acceleration

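Techniques II: 2.1 Word Constraint. Assume we hash an item e to k counters in layer L1. Without the word constraint, each insertion needs k memory accesses and k hash computations at layer 1. With the word constraint, all k counters lie within a single machine word, so each insertion needs 1 memory access and k + 1 hash computations at layer 1 (one hash to locate the word and k hashes to locate the counters inside it).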
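A minimal sketch of the word constraint at layer 1, assuming 64-bit words of sixteen 4-bit counters and k = 3. Python cannot show real memory traffic, so each word is modeled as an int that is read once and written back once per insertion. The hash family `stand_in_hash`, the class `WordConstrainedLayer`, and the choice to bump a repeated slot only once are hypothetical; carrying into layer 2 is omitted.

```python
import hashlib

COUNTER_BITS = 4
COUNTERS_PER_WORD = 64 // COUNTER_BITS   # 16 four-bit counters per 64-bit word
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def stand_in_hash(item: str, seed: int) -> int:
    # Hypothetical hash family: BLAKE2b over "seed:item".
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class WordConstrainedLayer:
    """Layer-1 counters packed into 64-bit words; each item touches one word."""

    def __init__(self, num_words: int, k: int = 3) -> None:
        self.words = [0] * num_words     # each int models one 64-bit machine word
        self.k = k

    def locate(self, item: str) -> tuple[int, list[int]]:
        word = stand_in_hash(item, 0) % len(self.words)     # 1 hash: which word
        slots = [stand_in_hash(item, i) % COUNTERS_PER_WORD  # k hashes: which counters
                 for i in range(1, self.k + 1)]
        return word, slots

    def insert(self, item: str) -> None:
        word, slots = self.locate(item)
        w = self.words[word]                     # one word read
        for s in set(slots):                     # a repeated slot is bumped once
            shift = s * COUNTER_BITS
            if ((w >> shift) & COUNTER_MASK) == COUNTER_MASK:
                # A real Pyramid would carry into layer 2; omitted here.
                raise OverflowError("4-bit counter full")
            w += 1 << shift
        self.words[word] = w                     # one word write

    def query(self, item: str) -> int:
        word, slots = self.locate(item)
        w = self.words[word]
        return min((w >> (s * COUNTER_BITS)) & COUNTER_MASK for s in slots)
```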

Techniques II: 2.2 Word Sharing. [Figure: word sharing for item e across layers L1, L2, and L3, with each block representing a machine word.]

Techniques II: 2.3 One Hashing. Use one hash function to compute a 32-bit hash value: the first 16 bits locate a word (64 bits), and the remaining 4 × 4 bits locate 4 counters in that word. A 64-bit word holds sixteen 4-bit counters, so 4 bits are enough to index one counter.
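A sketch of splitting one 32-bit hash as described. `zlib.crc32` is only a convenient 32-bit hash here, not necessarily what the authors used, and "first 16 bits" is read as the high half; `one_hash_locate` is a hypothetical helper.

```python
import zlib


def one_hash_locate(item: bytes, num_words: int) -> tuple[int, list[int]]:
    """Split one 32-bit hash into a word index and four 4-bit counter slots."""
    h = zlib.crc32(item)                               # stand-in 32-bit hash
    word = (h >> 16) % num_words                       # high 16 bits: the word
    slots = [(h >> (4 * i)) & 0xF for i in range(4)]   # low 16 bits: 4 slots of 4 bits
    return word, slots


word, slots = one_hash_locate(b"e", num_words=1 << 16)
print(word, slots)
```

One hash call replaces the k + 1 separate computations of the word-constrained layout, at the cost of limiting the table to 2^16 words and 4 counters per item in this sketch.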

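Techniques III: 3. Ostrich Policy. When an item e arrives, the strict CU insertion strategy requires PCU (the CU sketch with Pyramid) to first obtain the full value of each of e's mapped counters, which may span several layers, before deciding which counters to increment. Without the Ostrich policy, PCU insertion is therefore slow. The Ostrich policy can only be applied to PCU.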

Techniques III: 3. Ostrich Policy. Using the Ostrich policy, PCU inserts e by querying only the three colored counters (e's mapped counters in layer L1) to get three values, ignoring the layers above, and decides which counters to increment from those values alone.
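A hedged sketch of one Ostrich-policy insertion at layer 1. It assumes the CU rule (raise the smallest values) is applied to the k layer-1 values alone and that a wrapped counter reports a carry for its parent to absorb; the function `ostrich_insert` and the list-based layer are hypothetical.

```python
def ostrich_insert(layer1: list[int], slots: list[int],
                   counter_bits: int = 4) -> list[int]:
    """Insert one item into PCU's layer 1 under the Ostrich policy.

    Only the item's k layer-1 counters are read; higher layers are ignored
    when choosing what to increment. Returns the slots that wrapped around
    and therefore owe a carry to their parent counters.
    """
    limit = (1 << counter_bits) - 1
    smallest = min(layer1[s] for s in slots)   # the only reads: k layer-1 counters
    carries = []
    for s in sorted(set(slots)):
        if layer1[s] != smallest:
            continue
        if layer1[s] == limit:
            layer1[s] = 0                      # wrap; the parent must absorb a carry
            carries.append(s)
        else:
            layer1[s] += 1
    return carries


counters = [5, 7, 10, 0]
print(ostrich_insert(counters, [0, 1, 2]), counters)  # [] [6, 7, 10, 0]
```

In this sketch a counter that has just wrapped to 0 looks small even though its parents hold most of its value; that upper-layer information is what the policy chooses not to read.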

Techniques III: 3. Ostrich Policy. Using the Ostrich policy, PCU achieves:
1) Speed acceleration: around one memory access per insertion.
2) Surprisingly, an accuracy improvement as well.

Evaluation: Experiment setup.
• Datasets: we use three kinds of datasets: 1) real IP-trace streams; 2) a real-life transaction dataset; 3) synthetic datasets.
• Implementation: we applied Pyramid to 4 typical sketches.
• Computation platform: a machine with 12-core CPUs and 62 GB DRAM. The CPU has three levels of cache: two 32 KB L1 caches per core, one 256 KB L2 cache per core, and one 15 MB L3 cache shared by all cores.

Evaluation: Accuracy [figure]

Evaluation: Effects of techniques [figure]

Evaluation: Accuracy [figure]

Evaluation: Speed [figures]

Conclusion
• Sketches have been applied to various fields. In this paper, we propose a sketch framework, the Pyramid sketch, to significantly improve update speed and accuracy.
• We applied our framework to four typical sketches: the CM, CU, Count, and Augmented sketches.
• Experimental results show that our framework significantly improves both accuracy and speed.
• We believe our framework can be applied to many more sketches.

Thanks! Pyramid Sketch: a Sketch Framework for Frequency Estimation of Data Streams. Source code: http://net.pku.edu.cn/~yangtong/
20 January 2022, IWQoS 2015