Pyramid Sketch: a Sketch Framework for Frequency Estimation
- Slides: 33
Pyramid Sketch: a Sketch Framework for Frequency Estimation of Data Streams. Tong Yang, Yang Zhou, Hao Jin (Peking University); Shigang Chen (University of Florida, USA); Xiaoming Li (Peking University, China)
Outline 1. Background (problem to address; prior art) 2. Pyramid Techniques (counter-pair sharing; word acceleration: a. word constraint, b. word sharing, c. one hashing; Ostrich policy) 3. Evaluation (experiment setup; effects of techniques; accuracy; speed) 4. Conclusion
Background Problem: a high-speed data stream of hot items and cold items keeps updating a data structure, which must answer frequency queries. Hash tables: memory inefficient, and slow.
Background Typical sketches: • CM sketch (Journal of Algorithms 2005, cited 976 times) • CU sketch (SIGCOMM 2002, cited 949 times) • Count sketch (Automata, Languages and Programming, 2002, cited 715 times) • Augmented sketch (SIGMOD 2016) • Slim-Fat sketch (ICDE 2017)
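Background Prior art: CM Sketch. Insertion: when a new item e comes, add 1 to each counter that e hashes to. Deletion: to delete item e, subtract 1 from each of those counters. Query: to query for the frequency of the item e, report the smallest of those counters. Example: e hashes to counters holding 5, 7, and 10. Reported value: 5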
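The three CM sketch operations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the class name `CMSketch`, the `width` and `depth` parameters, and the use of salted SHA-256 as a stand-in for independent hash functions are all assumptions made for the example.

```python
import hashlib


class CMSketch:
    """Count-Min sketch: `depth` counter arrays of `width` counters each."""

    def __init__(self, width=1024, depth=3):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: SHA-256 salted with the row number stands in for
        # the row's hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def insert(self, item):
        # Add 1 to every counter the item hashes to.
        for r, row in enumerate(self.rows):
            row[self._index(item, r)] += 1

    def delete(self, item):
        # Subtract 1 from every counter the item hashes to.
        for r, row in enumerate(self.rows):
            row[self._index(item, r)] -= 1

    def query(self, item):
        # Report the smallest of the item's counters.
        return min(row[self._index(item, r)] for r, row in enumerate(self.rows))
```

Because other items can only add to the counters of e, the reported minimum never falls below the true frequency of e as long as deletions match earlier insertions.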
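Background Prior art: CU Sketch. Insertion: when a new item e comes, add 1 only to the smallest of the counters that e hashes to. Query: to query for the frequency of the item e, report the smallest of those counters. Example: e hashes to counters holding 5, 7, and 10, and only the counter holding 5 is incremented. Reported value: 5. Because it increments fewer counters per insertion, CU sketch achieves higher accuracy than CM sketch.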
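The difference from the CM sketch is confined to insertion, as the following minimal Python sketch shows. The class name `CUSketch`, its parameters, and the salted SHA-256 hashing are assumptions for illustration only. When several counters tie for the smallest value, this sketch increments all of them.

```python
import hashlib


class CUSketch:
    """Conservative-update sketch: only the smallest counter(s) grow."""

    def __init__(self, width=1024, depth=3):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: salted SHA-256 stands in for the row's hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def insert(self, item):
        cells = [(row, self._index(item, r)) for r, row in enumerate(self.rows)]
        smallest = min(row[i] for row, i in cells)
        for row, i in cells:
            # Counters already above the minimum are left untouched.
            if row[i] == smallest:
                row[i] += 1

    def query(self, item):
        return min(row[self._index(item, r)] for r, row in enumerate(self.rows))
```

The slide shows no deletion for the CU sketch, so this sketch does not define one either.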
Background Hot item: 30 k. Cold item: 2. • Design goal: high memory efficiency, high update speed, high accuracy.
Techniques I 1 Counter-pair Sharing [Figure: the layered counter structure, labeled "Hybrid Counter" and "Pure Counter"; item e enters at the bottom.]
Techniques I 1 Counter-pair Sharing: a parent counter consists of a left flag, a counting part, and a right flag, and it is shared by its left child and its right child.
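Techniques I 1 Counter-pair Sharing Insertion Example: The counter size is set to 4 bits. An item e comes in, and the right child counter at L1 is supposed to be incremented. The left child holds 10 and the right child holds 15, so incrementing the right child to 16 overflows its 4 bits. Perform a carry operation: the right child is reset to 0 and the parent at L2 goes from 0 to 1.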
Techniques I 1 Counter-pair Sharing Query Example: The counter size is set to 4 bits. We want to query the item e, which maps to the right child at L1. Values in the figure: L1: left child 10, right child 0; L2 (parent): 1, 1; L3: 0, 2. Query value from the right child can be obtained as shown: 0*1 + 1*1*16 + 0*2*64 = 16
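The insertion and query examples can be reproduced with a small model of counter-pair sharing. This is a simplified sketch under stated assumptions, not the authors' implementation: the class name `PyramidCounters` is hypothetical; each 4-bit hybrid counter is assumed to split into two flag bits and a 2-bit counting part, which gives the layer weights 16 and 64 seen in the query example; and a query is assumed to stop at the first layer whose flag for this child is off.

```python
class PyramidCounters:
    """Simplified model: pure counters at layer 1, hybrid counters above."""

    def __init__(self, width=8, delta=4):
        self.delta = delta
        self.layer1 = [0] * width
        # Each hybrid counter is [left_flag, counting_part, right_flag].
        self.upper = []
        w = width // 2
        while w >= 1:
            self.upper.append([[0, 0, 0] for _ in range(w)])
            w //= 2

    def insert(self, j):
        self.layer1[j] += 1
        if self.layer1[j] < 2 ** self.delta:
            return
        self.layer1[j] = 0  # overflow: reset and carry upward
        child = j
        for layer in self.upper:
            parent = layer[child // 2]
            parent[0 if child % 2 == 0 else 2] = 1  # mark which child carried
            parent[1] += 1
            if parent[1] < 2 ** (self.delta - 2):
                return
            parent[1] = 0  # counting part overflowed: carry again
            child //= 2
        raise OverflowError("carry past the top layer")

    def query(self, j):
        value = self.layer1[j]
        weight = 2 ** self.delta
        child = j
        for layer in self.upper:
            parent = layer[child // 2]
            if not parent[0 if child % 2 == 0 else 2]:
                break  # this child never carried into the parent
            value += parent[1] * weight
            weight *= 2 ** (self.delta - 2)
            child //= 2
        return value


p = PyramidCounters()
for _ in range(16):
    p.insert(1)  # index 1 is a right child
print(p.query(1))
```

Because a parent is shared by two children, carries from the sibling also land in the counting part, so in this model a query can overestimate but never underestimate.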
Techniques I 1 Counter-pair Sharing • Memory efficiency: 1) Counter size is kept small. 2) It automatically assigns an appropriate number of small counters to store the frequency of each item.
Techniques II 2 Word acceleration
Techniques II 2.1 Word constraint: Assume we hash an item e to k counters. Without the word constraint, each insertion needs k memory accesses and k hash computations at layer 1. With the word constraint, the k counters lie in one machine word, and each insertion needs 1 memory access and k+1 hash computations at layer 1.
Techniques II 2.2 Word sharing [Figure: layers L1, L2, and L3 before and after word sharing; item e maps into a machine word.]
Techniques II 2.3 One hashing: Use one hash function to compute a 32-bit hash value. The first 16 bits locate a word (64 bits). The remaining 4*4 bits locate 4 counters in the word.
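The bit split described above can be sketched as follows. The function names `locate` and `add_one` are hypothetical, CRC32 is only a stand-in for whatever 32-bit hash function is actually used, and taking the high 16 bits as "the first 16 bits" is an assumption. A 64-bit word holds sixteen 4-bit counters, so 4 bits are enough to pick one of them.

```python
import zlib


def locate(item: bytes, num_words: int = 1 << 16):
    """Split one 32-bit hash into a word index and 4 counter slots."""
    h = zlib.crc32(item) & 0xFFFFFFFF  # assumption: CRC32 as the hash
    word_index = (h >> 16) % num_words  # first 16 bits pick the word
    low = h & 0xFFFF  # remaining 16 bits
    # Four 4-bit fields, each naming one of the 16 counters in the word.
    slots = [(low >> (4 * i)) & 0xF for i in range(4)]
    return word_index, slots


def add_one(word: int, slot: int):
    """Increment one 4-bit counter inside a 64-bit word.

    Returns (new_word, carry); carry is True when the counter overflowed
    and was reset to 0.
    """
    shift = 4 * slot
    value = (word >> shift) & 0xF
    cleared = word & ~(0xF << shift)
    if value == 0xF:
        return cleared, True
    return cleared | ((value + 1) << shift), False
```

Two of the four fields can name the same slot; how such collisions are handled is not stated on the slide and is left open here.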
Techniques III 3 Ostrich Policy: When an item e comes, without the Ostrich policy, the strict insertion strategy of PCU will be slow. The Ostrich policy can be applied only to the CU sketch with Pyramid: PCU.
Techniques III 3 Ostrich Policy: When an item e comes, using the Ostrich policy, PCU will insert e as follows: we merely query the three colored counters to get three values.
Techniques III 3 Ostrich Policy: Using the Ostrich policy, PCU achieves: 1) Speed acceleration: around one memory access for each insertion. 2) Amazingly, accuracy improvement!
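One possible reading of the Ostrich policy slides is sketched below. It rests on assumptions the slides do not confirm: that the "three colored counters" are the layer-1 counters item e hashes to, that only their layer-1 values are compared (higher layers are ignored), and that the smallest one or ones are incremented as in the CU sketch. The function name `ostrich_insert` and the `carry` callback are hypothetical.

```python
def ostrich_insert(layer1, positions, carry, delta=4):
    """CU-style insertion that looks only at layer-1 values.

    layer1    -- list of layer-1 counter values
    positions -- layer-1 indices that the item hashes to
    carry     -- callback invoked with an index when its counter overflows
    """
    # Assumption: compare layer-1 values only, ignoring higher layers.
    smallest = min(layer1[p] for p in positions)
    for p in sorted(set(positions)):
        if layer1[p] == smallest:
            layer1[p] += 1
            if layer1[p] == 2 ** delta:
                layer1[p] = 0  # overflow: reset and hand off to the parent
                carry(p)
```

With the word techniques above, the compared counters sit in one machine word, which is consistent with the slide's claim of around one memory access per insertion.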
Evaluation Experiment setup Datasets: We use three kinds of datasets: 1) real IP-trace streams, 2) a real-life transaction dataset, 3) synthetic datasets. Implementation: We applied Pyramid to 4 typical sketches. Computation platform: A machine with 12-core CPUs and 62 GB DRAM. The CPU has three levels of cache memory: two 32 KB L1 caches for each core, one 256 KB L2 cache for each core, and one 15 MB L3 cache shared by all cores.
Evaluation Accuracy
Evaluation Effects of techniques
Evaluation Accuracy
Evaluation Speed
Conclusion • Sketches have been applied to various fields. In this paper, we propose a sketch framework, the Pyramid sketch, to significantly improve the update speed and accuracy. • We applied our framework to four typical sketches: the CM, CU, Count, and Augmented sketches. • Experimental results show that our framework significantly improves both accuracy and speed. • We believe our framework can be applied to many more sketches.
Thanks! Pyramid Sketch: a Sketch Framework for Frequency Estimation of Data Streams. Source codes: http://net.pku.edu.cn/~yangtong/ 20 January 2022, IWQoS 2015