Reversible Sketches for Efficient and Accurate Change Detection
Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams
Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen
Computer Science Department, Northwestern University
Online Change Detection
• Network anomalies are common
– Flash crowds, failures, DoS, worms, …
Online Detection over Data Streams
• Data stream: key/update pairs (k, u)
– Heavy hitters (lots of prior work)
– Heavy changes
k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
– First to detect flow-level heavy changes in massive data streams at network traffic speeds.
[Figure: H hash tables (rows 1 … j … H), each with K buckets (columns 0 … K-1)]
k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
• Update (k, u): Tj[hj(k)] += u (for all j)
• Estimate v(S, k): sum of updates for key k
[Figure: key k selects bucket hj(k) in each table j = 1 … H, i.e. h1(k), …, hj(k), …, hH(k)]
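
The update and estimate steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name, the salted use of Python's built-in `hash` as the hash family, and the default sizes are all assumptions. The estimate subtracts the average bucket load and takes the median across tables, which is how the cited k-ary sketch paper is usually summarized; treat that formula as an assumption too, since the slide only says the estimate approximates the sum of updates for key k.

```python
import random
from statistics import median


class KArySketch:
    """H hash tables with K buckets each (illustrative sketch only)."""

    def __init__(self, H=5, K=4096, seed=0):
        self.H, self.K = H, K
        rng = random.Random(seed)
        # Assumption: one random salt per table stands in for the hash family.
        self.salts = [rng.getrandbits(64) for _ in range(H)]
        self.tables = [[0] * K for _ in range(H)]
        self.total = 0  # running sum of all updates

    def _bucket(self, j, key):
        # Hypothetical hash h_j(k); integer keys are assumed.
        return hash((self.salts[j], key)) % self.K

    def update(self, key, u):
        # T_j[h_j(k)] += u for every table j
        for j in range(self.H):
            self.tables[j][self._bucket(j, key)] += u
        self.total += u

    def estimate(self, key):
        # Per-table estimate, corrected for the average load contributed
        # by other keys, then the median across the H tables.
        K = self.K
        per_table = [
            (self.tables[j][self._bucket(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        ]
        return median(per_table)


sketch = KArySketch(H=5, K=1024)
sketch.update(42, 100)
sketch.update(42, 50)
sketch.update(7, 3)
estimate_42 = sketch.estimate(42)  # close to 150, the true sum for key 42
```

Note that the sketch stores only counters, never keys, which is why reporting the keys behind heavy buckets needs the reverse hashing discussed later.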
• Requires very little space:
– E.g. 5 hash tables with 16 K buckets = 80 KB
– Fits in high-speed memory
• Main problem
– Cannot efficiently report keys with heavy change
• Our contribution
– Determine the set of keys that have “large” estimates in the sketch
Reverse Sketch Problem
• Input:
– Sketch
– Threshold
• Output: set of keys that hash to heavy buckets in a majority (or all) of the hash tables
[Figure: hash tables 1 … 5 with “Heavy” buckets marked]
Outline
[Diagram: streaming data (key, value) → recording: IP mangling, modular hashing, k-ary sketch; heavy change detection: k-ary sketch, change threshold → heavy change keys; paths labeled fast and slow]
Improve Heavy Change Detection: Reverse Hashing Algorithms
Taking Intersections
• Intersect A1, A2, A3, A4, A5 (Ai = set of keys that hash to the heavy bucket in table i)
• H = 5, K = 2^12, #keys = 2^32 (IP addresses)
• E[false positives] << 1
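
A rough worked calculation shows why the expected number of false positives is so small. It assumes one heavy bucket per table and that an unrelated key lands in each table's heavy bucket independently with probability 1/K; both are simplifying assumptions, not statements from the slides. With H = 5 tables, K = 2^12 buckets, and 2^32 possible keys:

```latex
E[\text{false positives}]
  \approx \#\text{keys} \cdot \left(\frac{1}{K}\right)^{H}
  = 2^{32} \cdot \left(2^{-12}\right)^{5}
  = 2^{32 - 60}
  = 2^{-28} \ll 1
```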
The problem with simple intersection
• Why is this difficult?
• Each set Ai can be very large!
– H = 5, K = 2^12, #keys = 2^32 (IP addresses)
– |A1| = 2^32 / 2^12 = 2^20
• Solution: Modular hashing
Modular hashing reduces the set size
• Split the 32-bit key into four 8-bit words: 10010100 10101011 10010101 10100011
• Hash each word separately to 3 bits: h1() → 010, h2() → 110, h3() → 001, h4() → 101
• Concatenate the outputs into the 12-bit bucket index: 010 110 001 101
• Each 3-bit output has only 2^8 / 2^3 = 2^5 preimages
• Greatly reduces size of reverse mapped sets
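
The word-by-word hashing described above can be sketched as follows. The per-word hash functions are not specified in the slides, so this example assumes balanced random lookup tables (every 3-bit output gets exactly 32 of the 256 possible words); the names `make_word_hash`, `modular_hash`, and `reverse_word` are hypothetical.

```python
import random

WORD_BITS, OUT_BITS, WORDS = 8, 3, 4


def make_word_hash(rng):
    # Assumption: a balanced random lookup table, so every 3-bit output
    # has exactly 2**8 / 2**3 = 32 preimages.
    table = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
    rng.shuffle(table)
    return table


def modular_hash(key, word_hashes):
    """Hash a 32-bit key word by word and concatenate the 3-bit outputs."""
    bucket = 0
    for i, table in enumerate(word_hashes):
        shift = WORD_BITS * (WORDS - 1 - i)   # most significant word first
        word = (key >> shift) & 0xFF
        bucket = (bucket << OUT_BITS) | table[word]
    return bucket


def reverse_word(table, out):
    """All 8-bit words that this word hash maps to the 3-bit value `out`."""
    return [w for w, h in enumerate(table) if h == out]


rng = random.Random(1)
word_hashes = [make_word_hash(rng) for _ in range(WORDS)]
key = 0b10010100_10101011_10010101_10100011   # the key shown on the slide
bucket = modular_hash(key, word_hashes)       # a 12-bit bucket index
first_word_candidates = reverse_word(word_hashes[0], bucket >> 9)
```

Reversing a bucket now means reversing four small tables, each yielding 32 candidate words, instead of enumerating 2^20 full keys.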
Modular hashing reduces the set size
• Intersection: only 32 elements per partition
– A1: 2^5 * 2^5
– A2: 2^5 * 2^5
[Figure: heavy buckets b1 … b5, one in each of hash tables 1 … 5]
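
For the simple case of one heavy bucket per table, the per-partition intersection can be sketched like this. It is an illustration under assumptions, not the authors' algorithm: each table uses four hypothetical balanced lookup tables as word hashes, and a key is reported only if it matches the heavy bucket in all H tables (the slides also allow a majority).

```python
import random
from itertools import product

WORD_BITS, OUT_BITS, WORDS, H = 8, 3, 4, 5


def make_table(rng):
    # Hypothetical per-table hash: four balanced 8-bit -> 3-bit lookup tables.
    def word_hash():
        t = [v % 8 for v in range(256)]
        rng.shuffle(t)
        return t
    return [word_hash() for _ in range(WORDS)]


def words_of(key):
    return [(key >> (WORD_BITS * (WORDS - 1 - i))) & 0xFF for i in range(WORDS)]


def bucket_of(key, table):
    b = 0
    for w, wh in zip(words_of(key), table):
        b = (b << OUT_BITS) | wh[w]
    return b


def reverse(heavy_buckets, tables):
    """Keys that hash to the given heavy bucket in every table."""
    per_word = []
    for i in range(WORDS):
        shift = OUT_BITS * (WORDS - 1 - i)
        candidates = set(range(256))
        for b, table in zip(heavy_buckets, tables):
            chunk = (b >> shift) & 0b111
            # Intersect small per-word sets (32 words each), not 2^20-key sets.
            candidates &= {w for w in range(256) if table[i][w] == chunk}
        per_word.append(sorted(candidates))
    keys = []
    for ws in product(*per_word):   # recombine surviving words into full keys
        key = 0
        for w in ws:
            key = (key << WORD_BITS) | w
        keys.append(key)
    return keys


rng = random.Random(7)
tables = [make_table(rng) for _ in range(H)]
secret = 0x81693817                     # 129.105.56.23
heavy = [bucket_of(secret, t) for t in tables]
recovered = reverse(heavy, tables)      # contains `secret`
```

Every key returned by `reverse` is consistent with all five heavy buckets by construction, so any extra keys are genuine hash collisions rather than artifacts of the search.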
Handling Multiple Intersections…
• 2^H different intersections
• Much more difficult: needs sophisticated reverse hashing algorithms (see tech report)
[Figure: hash tables 1 … 5 with multiple heavy buckets]
Problem: Too many collisions
• 32-bit keys: 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, …
• 12-bit bucket: 7.4.0.*
• Solution: IP Mangling
IP-mangling
Invertible Modular Linear Equation
• f(x) ≡ a·x mod n
• To be invertible: a and n must be relatively prime
• a is odd, chosen randomly
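
A minimal sketch of the invertible mapping, assuming n = 2^32 (32-bit IPv4 keys); the slides do not state n, but an odd a being sufficient for invertibility suggests a power of two. The function names are hypothetical. Python 3.8 or later is needed for `pow(a, -1, N)`, which computes the modular inverse.

```python
import random

N = 1 << 32  # assumption: keys are 32-bit IPv4 addresses, so n = 2**32


def make_mangler(rng):
    a = rng.randrange(1, N, 2)   # random odd a, hence gcd(a, N) == 1
    a_inv = pow(a, -1, N)        # modular inverse of a (Python 3.8+)
    return a, a_inv


def mangle(x, a):
    """f(x) = a * x mod n"""
    return (a * x) % N


def unmangle(y, a_inv):
    """Inverse mapping: recovers x from f(x)."""
    return (a_inv * y) % N


rng = random.Random(3)
a, a_inv = make_mangler(rng)
ip = (129 << 24) | (105 << 16) | (56 << 8) | 23   # 129.105.56.23
mangled = mangle(ip, a)
restored = unmangle(mangled, a_inv)               # equals ip
```

Because multiplication carries differences in the low-order bits into the higher-order bits, keys that share a prefix and differ only in the last byte generally no longer share their leading words after mangling, which is the property the collision problem calls for.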
[Figure panels: Modular Hashing; Optimal Hashing]
[Figure panels: Modular Hashing with IP Mangling; Optimal Hashing]
Recap:
[Diagram: streaming data (key, value) → IP mangling → modular hashing → reversible k-ary sketch (recording, value stored); heavy change detection: reversible k-ary sketch, change threshold → reverse hashing → reverse IP mangling → heavy change keys]
Evaluation
• Traffic traces from the Northwestern University edge router
– 5-minute intervals, averaging 7.5 GB of traffic per interval
• Compared with ground truth
• 6 hash tables, 4 K buckets each, 192 KB of memory in total
• Up to 140 true heavy change keys in 1.5 seconds
– Over 95% TPP
– Less than 2% FPP
• All missing changes are due to boundary effects
Conclusions / Future Work
• Sketches: efficient summary structures
• Our contribution: Reversible Sketches
– Efficient online detection of keys with heavy changes
Work in Progress (see tech report)
• Improved reverse hashing
• Statistical guarantee on detection accuracy
• More advanced applications:
– Hierarchical change detection
• E.g. 129.105.100.* shows a big change!
Thank you!
See tech report for more!
http://list.cs.northwestern.edu