EffiCuts: Optimizing Packet Classification for Memory and Throughput
Balajee Vamanan, Gwendolyn Voskuilen, and T. N. Vijaykumar
School of Electrical & Computer Engineering
SIGCOMM 2010

Packet Classification
• Packet classification: find the highest-priority rule that matches a network packet
• Classifier: a set of rules, for example:

  Source IP     | Destination IP | Source Port | Destination Port | Protocol  | Action
  120.0/24      | 198.12.130.0/2 | 0:65535     | 11:17            | 0xFF/0xFF | Accept
  138.42.83.1/0 | 174.3.18.0/8   | 50:10000    | 0:65535          | 0x06/0xFF | Deny

• Packet classification is key for security, traffic monitoring and analysis, and QoS
• Packet classification is prevalent in modern routers
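A minimal sketch of these matching semantics, assuming rules are lists of per-field (low, high) ranges kept in priority order; this is illustrative, not the paper's implementation:

```python
from typing import List, Optional, Tuple

# A rule is one (lo, hi) range per field plus an action; a packet supplies
# one value per field. Rules are stored highest priority first.
Rule = Tuple[List[Tuple[int, int]], str]

def classify(rules: List[Rule], packet: List[int]) -> Optional[str]:
    """Return the action of the highest-priority rule matching the packet."""
    for ranges, action in rules:
        if all(lo <= v <= hi for (lo, hi), v in zip(ranges, packet)):
            return action
    return None

# Example: two 3-field rules (source port, destination port, protocol).
rules = [
    ([(0, 65535), (11, 17), (6, 6)], "Accept"),    # higher priority
    ([(0, 65535), (0, 65535), (0, 255)], "Deny"),  # catch-all
]
print(classify(rules, [4321, 15, 6]))  # -> Accept
```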

Scalability of Packet Classification
• Line rates are increasing: 40 Gbps now, 160 Gbps in the future
• Classifier size (number of rules) is increasing:
  – VPNs
  – Finer-grained traffic differentiation
  – IPv6
• Power is increasing (more packets processed per second, more rules searched per packet): 30 W (30% of router power)
• Packet classification must scale well in throughput, cost, and power

Previous Approaches
• Well-studied problem
• TCAM: brute-force search of all rules
  – Provides deterministic search time
  – Scales poorly in cost and power with classifier size
    ▪ 10X more expensive than SRAM
    ▪ Tight power budget for router line cards
• Algorithmic approaches: prune the search of rules
  – E.g., bit vector, cross-producting, tuple search, decision trees
  – Decision-tree algorithms (RAM based) are one of the more effective approaches
  – All potentially scalable, but all have problems
• This work addresses the scalability of decision-tree algorithms

Existing Decision-Tree Algorithms
• HiCuts [HOTI '99]
• HyperCuts [SIGCOMM '03]
  – Improves upon HiCuts in both memory and throughput
  – The most effective decision-tree algorithm
• Despite its optimizations, HyperCuts needs large memory
  – Rules get replicated many times and consume memory
  – Each rule is replicated by factors of 2,000 to 10,000 on average
  – Rule replication → large memories → high cost and power

Our Contributions
• EffiCuts reduces memory over HyperCuts while achieving high packet throughput
  – Nearly eliminates rule replication
  – Employs four new techniques
• For similar throughput (OC-768), EffiCuts:
  – Reduces memory by 57X and power by 8X over HyperCuts
  – Consumes 6X less power than TCAM
• EffiCuts enables decision-tree approaches to be more scalable in throughput, cost, and power

Outline
• Introduction
• Background
• EffiCuts: insights and techniques
• Results
• Conclusion

Background: Decision Trees
• Rules are hypercubes in the rule space
• Build a tree by successively cutting the rule space to separate rules into smaller sub-spaces (child nodes)
• Stop when only a small number of rules remain at a node
• Many heuristics/optimizations exist
• Packets traverse the tree during classification (a lookup sketch follows below)
[Figure: rules R1–R6 in a two-dimensional (X, Y) rule space; the root node is cut into children holding {R1, R2, R5}, {R3, R5}, {R6}, and {R4}]
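A minimal sketch of a decision-tree lookup in this style, assuming equal-sized cuts along one dimension per node; real HiCuts/HyperCuts trees add many heuristics (e.g., cutting multiple dimensions at once):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    dim: int = 0        # dimension this node cuts
    lo: int = 0         # low end of this node's range in `dim`
    width: int = 1      # width of each equal-sized cut
    children: List["Node"] = field(default_factory=list)
    # Non-empty only at leaves; highest priority first.
    rules: List[Tuple[List[Tuple[int, int]], str]] = field(default_factory=list)

def lookup(node: Node, packet: List[int]) -> Optional[str]:
    """Descend to a leaf, then linearly search its small rule list."""
    while node.children:
        idx = (packet[node.dim] - node.lo) // node.width
        node = node.children[idx]
    for ranges, action in node.rules:
        if all(lo <= v <= hi for (lo, hi), v in zip(ranges, packet)):
            return action
    return None
```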

Key Observation #1: Variation in Size
• HyperCuts' memory overhead is due to (1) variation in rule size → replicated rules
  ▪ Many rules overlap, and overlapping rules vary vastly in size
  ▪ The fine cuts needed to separate small rules also cut, and thereby replicate, large rules
[Figure: overlapping small and large rules (labeled A–L); cuts fine enough to isolate the small rules slice the large rules into many pieces]

Key Observation #2: Variation in Density
• HyperCuts' memory overhead is due to (2) variation in rule-space density → ineffectual nodes
  ▪ Fine, equal-sized cuts that separate densely clustered rules create many ineffectual nodes in nearby, sparse areas
  ▪ Ineffectual nodes are nearly empty nodes or nodes with replicated rules
[Figure: rules A–G in an (X, Y) rule space with a dense cluster; cuts fine enough for the cluster produce many near-empty nodes in the sparse region]

Our Four Techniques
• To tackle variation in rule size:
  – Separable trees: significantly reduces memory (rule replication) but modestly degrades throughput
  ▪ Selective tree merging: recovers some of that throughput
• To tackle variation in rule-space density:
  – Equi-dense cuts: further reduces memory (ineffectual nodes)
  ▪ Node co-location: further improves throughput

EffiCuts Technique #1: Separable Trees
• Recall: fine cuts to separate small rules replicate large rules
[Figure: small rules A–C and large rules D–F in an (X, Y) rule space]
• Build distinct trees for small and large rules
  – Simply separating small from large is not enough: small vs. large matters per dimension
• Separable subsets: subsets of rules that are either small or large in each dimension, e.g., ({A, B, C}, {D}, {E, F})
  – E.g., large ↔ wildcard, small ↔ non-wildcard
(a sketch of the grouping appears below)
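A minimal sketch of grouping rules into separable subsets, assuming a field counts as "large" when it spans more than half of its dimension; the paper's exact largeness criterion may differ, and all names here are illustrative:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rule = Tuple[List[Tuple[int, int]], str]
LARGENESS = 0.5  # assumed threshold for calling a field "large"

def separable_subsets(rules: List[Rule],
                      dim_sizes: List[int]) -> Dict[Tuple[bool, ...], List[Rule]]:
    """Bucket rules by their per-dimension largeness pattern.

    Rules in one bucket are separable: building one tree per bucket avoids
    fine cuts (needed for small fields) replicating large fields.
    """
    buckets: Dict[Tuple[bool, ...], List[Rule]] = defaultdict(list)
    for ranges, action in rules:
        pattern = tuple(
            (hi - lo + 1) / size > LARGENESS
            for (lo, hi), size in zip(ranges, dim_sizes)
        )
        buckets[pattern].append((ranges, action))
    return buckets  # up to 2^d buckets; in practice most stay empty
```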

Separable Trees (contd.)
• A distinct tree for each set of rules separable in the 5 IP fields:
  – Rules with four large fields (at most C(5,4) = 5 trees)
  – Rules with three large fields (at most C(5,3) = 10 trees)
  – Rules with two large fields (at most C(5,2) = 10 trees)
  – and so on
• In theory, 2^5 − 1 = 31 trees
  ▪ In practice, ~12 trees (some sets are empty)

EffiCuts Technique #2: Selective Tree Merging
• Each packet must traverse all trees
• Multiple trees → many memory accesses per packet
  – Accesses eat up memory bandwidth → decreased packet throughput
• So, to reduce accesses, merge some trees
  – The merged tree's depth < the sum of the depths of the unmerged trees
• To control rule replication, merge only trees mixing rules that are small or large in at most one dimension (a sketch of this test follows below)
[Figure: two trees whose rules' wildcard (*) fields differ in only one dimension are merged]
• Merging reduces accesses (improving throughput) by 30% over no merging
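A minimal sketch of that merge test, assuming each tree carries the per-dimension largeness pattern of its rules (as in the grouping sketch above); illustrative only:

```python
from typing import Tuple

def can_merge(p1: Tuple[bool, ...], p2: Tuple[bool, ...]) -> bool:
    """True if two largeness patterns differ in at most one dimension, so a
    merged tree mixes small and large rules in at most one field."""
    return sum(a != b for a, b in zip(p1, p2)) <= 1

# Example with 5 fields: the first pair differs only in the third field,
# so those two trees are candidates for merging; the second pair is not.
print(can_merge((False, False, False, True, True),
                (False, False, True, True, True)))   # -> True
print(can_merge((False, False, False, True, True),
                (True, True, False, True, True)))    # -> False
```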

EffiCuts Technique #3: Equi-dense Cuts
• Recall: HyperCuts uses equal-sized cuts to separate dense areas, creating ineffectual nodes in nearby, sparse areas
  – Nearly empty nodes or nodes with replicated rules
[Figure: rules A–G in an (X, Y) rule space; cuts fine enough for the dense cluster leave many near-empty children elsewhere]
• Equi-dense cuts: unequal cuts that distribute rules evenly among fewer children, formed by fusing adjacent equal-sized cuts
  – Fine cuts in dense areas, coarse cuts in sparse areas (a fusion sketch follows below)
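A minimal sketch of fusing adjacent equal-sized cuts into equi-dense cuts, assuming a simple greedy pass with a per-child rule budget; the paper's fusion heuristics are more involved:

```python
from typing import List

def equi_dense(children: List[List[str]], max_rules: int) -> List[List[str]]:
    """Greedily fuse adjacent equal-sized children while the fused child's
    distinct rule set stays within max_rules, yielding fewer, unequal cuts."""
    fused: List[List[str]] = []
    for child in children:
        if fused:
            merged = sorted(set(fused[-1]) | set(child))
            if len(merged) <= max_rules:
                fused[-1] = merged  # coarse cut over a sparse stretch
                continue
        fused.append(sorted(set(child)))  # keep a fine cut at a dense spot
    return fused

# Eight equal cuts: a dense cluster on the left, near-empty space on the right.
cuts = [["A", "B"], ["B", "C"], ["C"], ["D"], [], [], [], ["E"]]
print(equi_dense(cuts, max_rules=3))
# -> [['A', 'B', 'C'], ['D', 'E']]  (two children instead of eight; the
#     replicated copies of B and C collapse into a single child)
```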

Equi-dense Cuts (contd.)
• Equi-dense cuts slightly increase lookup complexity over equal-sized cuts
  – This is manageable; details in the paper
• Fusion heuristics create the equi-dense cuts
  – Details in the paper
• Equi-dense cuts reduce memory by 40% over equal-sized cuts

EffiCuts Technique #4: Node Co-location
• Co-locate a node and its children in memory
  – Reduces two memory accesses per node to one
  – Details in the paper
• Reduces total per-packet memory accesses (improving throughput) by 50% over no co-location
(a sketch of the layout idea appears below)
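A rough sketch of one reading of the layout idea (my interpretation, not the paper's data structure): keep a node's header and its children's headers in one contiguous block, so a single wide memory access yields both the cut decision and the chosen child's header. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CoLocatedNode:
    dim: int = 0     # dimension this node cuts
    lo: int = 0      # low end of this node's range in `dim`
    width: int = 1   # width of each cut
    # Child headers stored adjacent to this header: fetching the block
    # that holds `self` also brings in the children.
    children: List["CoLocatedNode"] = field(default_factory=list)

def step(node: CoLocatedNode, packet: List[int]) -> CoLocatedNode:
    # One conceptual memory access instead of two (header, then child array).
    idx = (packet[node.dim] - node.lo) // node.width
    return node.children[idx]
```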

Outline
• Introduction
• Background
• EffiCuts: insights and techniques
• Results
• Conclusion

Experimental Methodology
• HiCuts and HyperCuts with all their heuristics, versus EffiCuts
  – All use 16 rules per leaf
  – EffiCuts' numbers include all of its trees
• Memory access width in bytes: HiCuts 13; HyperCuts and EffiCuts 22
• ClassBench classifiers
  – 3 types (ACL, FW, IPC) and 3 sizes (1K, 10K, 100K rules)
  – 36 classifiers overall; 9 typical cases presented here
• Power estimation: HP Labs CACTI 6.5 to model SRAM/TCAM power and cycle time

Important Metrics
• Memory size ≈ cost
• Memory accesses ≈ 1 / packet throughput
  – Recall: more accesses consume more memory bandwidth
• Memory size and accesses both impact power

Memory Size (bytes per rule)
[Chart: bytes per rule (log scale, 10 to 100,000) for HiCuts, HyperCuts, and EffiCuts on ACL, FW, and IPC classifiers of 1K, 10K, and 100K rules]
• HyperCuts' (and HiCuts') memory grows far more rapidly than EffiCuts'
• EffiCuts reduces replication from factors of 1000s to less than 9
• EffiCuts needs near-constant bytes per rule at all sizes → linear memory growth
• EffiCuts uses 57X less memory than HyperCuts

Memory Accesses
[Chart: per-packet memory accesses (0 to 100) for HiCuts, HyperCuts, and EffiCuts on ACL, FW, and IPC classifiers of 1K, 10K, and 100K rules]
• EffiCuts requires 50% more memory accesses on average than HyperCuts
• EffiCuts modestly increases memory accesses while drastically reducing memory

Dealing with Increased Memory Accesses
• Recall: more accesses mean lower packet throughput
• Absorb the extra accesses by adding memory copies
• Because EffiCuts' memory is so much smaller, extra copies are inexpensive

Overall: EffiCuts vs. HyperCuts (100K rules)

             HyperCuts                        EffiCuts
Classifier | Memory | Throughput | Power | Memory | Per-copy Tput | Add'l  | Power
Type       | (MB)   | (Mpps)     | (W)   | (MB)   | (Mpps)        | Copies | (W)
ACL        | 1084   | 149        | 31    | 5.33   | 73            | 1      | 6
FW         | 2433   | 101        | 40    | 3.70   | 95            | 1      | 4
IPC        |  575   | 248        | 26    | 5.49   | 318           | 0      | 3
(Mpps = 10^6 packets/s)

• HyperCuts: fewer accesses, large memory (high power)
• EffiCuts: more accesses, small memory (low power)
  – One additional copy suffices to match HyperCuts' packet throughput
• EffiCuts: 50% more accesses, 57X less memory, 8X less power

Overall: EffiCuts vs. TCAM (100K rules)

             TCAM                      EffiCuts
Classifier | Throughput | Power | Per-copy Tput | Add'l  | Power
Type       | (Mpps)     | (W)   | (Mpps)        | Copies | (W)
ACL        | 134        | 23    | 66            | 1      | 6
FW         | 134        | 23    | 95            | 1      | 4
IPC        | 134        | 23    | 318           | 0      | 3
(Mpps = 10^6 packets/s)

• TCAM: one access per packet, but high power and slow cycle time
• EffiCuts: low power and fast cycle time, but many accesses
  – One additional copy suffices to match TCAM packet throughput
• EffiCuts achieves a 6X power reduction over TCAM

Conclusion
• EffiCuts nearly eliminates rule replication, drastically reducing memory overhead
• Four techniques: separable trees, selective tree merging, equi-dense cuts, node co-location
• Compared to HyperCuts, for similar throughput, EffiCuts:
  – Reduces rule replication from factors of 1000s to less than 9
  – Reduces memory overhead by 57X
  – Reduces power by 8X
• Compared to TCAM, for similar throughput, EffiCuts reduces power by 6X and cost by 10X
• EffiCuts greatly lowers the barrier to adoption of decision-tree-based packet classification