EffiCuts: Optimizing Packet Classification for Memory and Throughput
Balajee Vamanan, Gwendolyn Voskuilen, and T. N. Vijaykumar
School of Electrical & Computer Engineering
SIGCOMM 2010

Packet Classification
• Packet classification: find the highest-priority rule that matches a network packet
• Classifier: a set of rules, for example:

  Source IP     | Destination IP | Source Port | Destination Port | Protocol  | Action
  120.0/24      | 198.12.130.0/2 | 0:65535     | 11:17            | 0xFF/0xFF | Accept
  138.42.83.1/0 | 174.3.18.0/8   | 50:10000    | 0:65535          | 0x06/0xFF | Deny

• Packet classification is key for security, traffic monitoring and analysis, and QoS
• Packet classification is prevalent in modern routers
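A minimal sketch of these matching semantics, assuming rules are lists of per-field (low, high) ranges kept in priority order; this is illustrative, not the paper's implementation:

```python
from typing import List, Optional, Tuple

# A rule is one (lo, hi) range per field plus an action; a packet supplies
# one value per field. Rules are stored highest priority first.
Rule = Tuple[List[Tuple[int, int]], str]

def classify(rules: List[Rule], packet: List[int]) -> Optional[str]:
    """Return the action of the highest-priority rule matching the packet."""
    for ranges, action in rules:
        if all(lo <= v <= hi for (lo, hi), v in zip(ranges, packet)):
            return action
    return None

# Example: two 3-field rules (source port, destination port, protocol).
rules = [
    ([(0, 65535), (11, 17), (6, 6)], "Accept"),    # higher priority
    ([(0, 65535), (0, 65535), (0, 255)], "Deny"),  # catch-all
]
print(classify(rules, [4321, 15, 6]))  # -> Accept
```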

Scalability of Packet Classification
• Line rates are increasing: 40 Gbps now, 160 Gbps in the future
• Classifier size (number of rules) is increasing:
  – VPNs
  – Finer-grained traffic differentiation
  – IPv6
• Power is increasing (more packets processed per second, more rules searched per packet): 30 W (30% of router power)
• Packet classification must scale well in throughput, cost, and power

Previous Approaches
• Well-studied problem
• TCAM: brute-force search of all rules
  – Provides deterministic search time
  – Scales poorly in cost and power with classifier size
    ▪ 10X more expensive than SRAM
    ▪ Tight power budget for router line cards
• Algorithmic approaches: prune the search of rules
  – E.g., bit vector, cross-producting, tuple search, decision trees
  – Decision-tree algorithms (RAM based) are one of the more effective approaches
  – All potentially scalable, but all have problems
• This work addresses the scalability of decision-tree algorithms

Existing Decision-Tree Algorithms
• HiCuts [HOTI '99]
• HyperCuts [SIGCOMM '03]
  – Improves upon HiCuts in both memory and throughput
  – The most effective decision-tree algorithm
• Despite its optimizations, HyperCuts needs large memory
  – Rules get replicated many times and consume memory
  – Each rule is replicated by factors of 2,000 to 10,000 on average
  – Rule replication → large memories → high cost and power

Our Contributions
• EffiCuts reduces memory over HyperCuts while achieving high packet throughput
  – Nearly eliminates rule replication
  – Employs four new techniques
• For similar throughput (OC-768), EffiCuts:
  – Reduces memory by 57X and power by 8X over HyperCuts
  – Consumes 6X less power than TCAM
• EffiCuts enables decision-tree approaches to be more scalable in throughput, cost, and power

Outline
• Introduction
• Background
• EffiCuts: insights and techniques
• Results
• Conclusion

Background: Decision Trees
• Rules are hypercubes in the rule space
• Build a tree by successively cutting the rule space to separate rules into smaller sub-spaces (child nodes)
• Stop when only a small number of rules remain at a node
• Many heuristics/optimizations exist
• Packets traverse the tree during classification (a lookup sketch follows below)
[Figure: rules R1–R6 in a two-dimensional (X, Y) rule space; the root node is cut into children holding {R1, R2, R5}, {R3, R5}, {R6}, and {R4}]
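A minimal sketch of a decision-tree lookup in this style, assuming equal-sized cuts along one dimension per node; real HiCuts/HyperCuts trees add many heuristics (e.g., cutting multiple dimensions at once):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    dim: int = 0        # dimension this node cuts
    lo: int = 0         # low end of this node's range in `dim`
    width: int = 1      # width of each equal-sized cut
    children: List["Node"] = field(default_factory=list)
    # Non-empty only at leaves; highest priority first.
    rules: List[Tuple[List[Tuple[int, int]], str]] = field(default_factory=list)

def lookup(node: Node, packet: List[int]) -> Optional[str]:
    """Descend to a leaf, then linearly search its small rule list."""
    while node.children:
        idx = (packet[node.dim] - node.lo) // node.width
        node = node.children[idx]
    for ranges, action in node.rules:
        if all(lo <= v <= hi for (lo, hi), v in zip(ranges, packet)):
            return action
    return None
```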

Key Observation #1: Variation in Size
• HyperCuts' memory overhead is due to (1) variation in rule size → replicated rules
  ▪ Many rules overlap, and overlapping rules vary vastly in size
  ▪ The fine cuts needed to separate small rules also cut, and thereby replicate, large rules
[Figure: overlapping small and large rules (labeled A–L); cuts fine enough to isolate the small rules slice the large rules into many pieces]

Key Observation #2: Variation in Density
• HyperCuts' memory overhead is due to (2) variation in rule-space density → ineffectual nodes
  ▪ Fine, equal-sized cuts that separate densely clustered rules create many ineffectual nodes in nearby, sparse areas
  ▪ Ineffectual nodes are nearly empty nodes or nodes with replicated rules
[Figure: rules A–G in an (X, Y) rule space with a dense cluster; cuts fine enough for the cluster produce many near-empty nodes in the sparse region]

Our Four Techniques
• To tackle variation in rule size:
  – Separable trees: significantly reduces memory (rule replication) but modestly degrades throughput
  ▪ Selective tree merging: recovers some of that throughput
• To tackle variation in rule-space density:
  – Equi-dense cuts: further reduces memory (ineffectual nodes)
  ▪ Node co-location: further improves throughput

EffiCuts Technique #1: Separable Trees
• Recall: fine cuts to separate small rules replicate large rules
[Figure: small rules A–C and large rules D–F in an (X, Y) rule space]
• Build distinct trees for small and large rules
  – Simply separating small from large is not enough: small vs. large matters per dimension
• Separable subsets: subsets of rules that are either small or large in each dimension, e.g., ({A, B, C}, {D}, {E, F})
  – E.g., large ↔ wildcard, small ↔ non-wildcard
(a sketch of the grouping appears below)
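A minimal sketch of grouping rules into separable subsets, assuming a field counts as "large" when it spans more than half of its dimension; the paper's exact largeness criterion may differ, and all names here are illustrative:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rule = Tuple[List[Tuple[int, int]], str]
LARGENESS = 0.5  # assumed threshold for calling a field "large"

def separable_subsets(rules: List[Rule],
                      dim_sizes: List[int]) -> Dict[Tuple[bool, ...], List[Rule]]:
    """Bucket rules by their per-dimension largeness pattern.

    Rules in one bucket are separable: building one tree per bucket avoids
    fine cuts (needed for small fields) replicating large fields.
    """
    buckets: Dict[Tuple[bool, ...], List[Rule]] = defaultdict(list)
    for ranges, action in rules:
        pattern = tuple(
            (hi - lo + 1) / size > LARGENESS
            for (lo, hi), size in zip(ranges, dim_sizes)
        )
        buckets[pattern].append((ranges, action))
    return buckets  # up to 2^d buckets; in practice most stay empty
```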

Separable Trees (contd.)
• A distinct tree for each set of rules separable in the 5 IP fields:
  – Rules with four large fields (at most C(5,4) = 5 trees)
  – Rules with three large fields (at most C(5,3) = 10 trees)
  – Rules with two large fields (at most C(5,2) = 10 trees)
  – and so on
• In theory, 2^5 − 1 = 31 trees
  ▪ In practice, ~12 trees (some sets are empty)

EffiCuts Technique #2: Selective Tree Merging
• Each packet must traverse all trees
• Multiple trees → many memory accesses per packet
  – Accesses eat up memory bandwidth → decreased packet throughput
• So, to reduce accesses, merge some trees
  – The merged tree's depth < the sum of the depths of the unmerged trees
• To control rule replication, merge only trees mixing rules that are small or large in at most one dimension (a sketch of this test follows below)
[Figure: two trees whose rules' wildcard (*) fields differ in only one dimension are merged]
• Merging reduces accesses (improving throughput) by 30% over no merging
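A minimal sketch of that merge test, assuming each tree carries the per-dimension largeness pattern of its rules (as in the grouping sketch above); illustrative only:

```python
from typing import Tuple

def can_merge(p1: Tuple[bool, ...], p2: Tuple[bool, ...]) -> bool:
    """True if two largeness patterns differ in at most one dimension, so a
    merged tree mixes small and large rules in at most one field."""
    return sum(a != b for a, b in zip(p1, p2)) <= 1

# Example with 5 fields: the first pair differs only in the third field,
# so those two trees are candidates for merging; the second pair is not.
print(can_merge((False, False, False, True, True),
                (False, False, True, True, True)))   # -> True
print(can_merge((False, False, False, True, True),
                (True, True, False, True, True)))    # -> False
```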

EffiCuts Technique #3: Equi-dense Cuts
• Recall: HyperCuts uses equal-sized cuts to separate dense areas, creating ineffectual nodes in nearby, sparse areas
  – Nearly empty nodes or nodes with replicated rules
[Figure: rules A–G in an (X, Y) rule space; cuts fine enough for the dense cluster leave many near-empty children elsewhere]
• Equi-dense cuts: unequal cuts that distribute rules evenly among fewer children, formed by fusing adjacent equal-sized cuts
  – Fine cuts in dense areas, coarse cuts in sparse areas (a fusion sketch follows below)
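A minimal sketch of fusing adjacent equal-sized cuts into equi-dense cuts, assuming a simple greedy pass with a per-child rule budget; the paper's fusion heuristics are more involved:

```python
from typing import List

def equi_dense(children: List[List[str]], max_rules: int) -> List[List[str]]:
    """Greedily fuse adjacent equal-sized children while the fused child's
    distinct rule set stays within max_rules, yielding fewer, unequal cuts."""
    fused: List[List[str]] = []
    for child in children:
        if fused:
            merged = sorted(set(fused[-1]) | set(child))
            if len(merged) <= max_rules:
                fused[-1] = merged  # coarse cut over a sparse stretch
                continue
        fused.append(sorted(set(child)))  # keep a fine cut at a dense spot
    return fused

# Eight equal cuts: a dense cluster on the left, near-empty space on the right.
cuts = [["A", "B"], ["B", "C"], ["C"], ["D"], [], [], [], ["E"]]
print(equi_dense(cuts, max_rules=3))
# -> [['A', 'B', 'C'], ['D', 'E']]  (two children instead of eight; the
#     replicated copies of B and C collapse into a single child)
```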

Equi-dense Cuts (contd.)
• Equi-dense cuts slightly increase lookup complexity over equal-sized cuts
  – This is manageable; details in the paper
• Fusion heuristics create the equi-dense cuts
  – Details in the paper
• Equi-dense cuts reduce memory by 40% over equal-sized cuts

EffiCuts Technique #4: Node Co-location
• Co-locate a node and its children in memory
  – Reduces two memory accesses per node to one
  – Details in the paper
• Reduces total per-packet memory accesses (improving throughput) by 50% over no co-location
(a sketch of the layout idea appears below)
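A rough sketch of one reading of the layout idea (my interpretation, not the paper's data structure): keep a node's header and its children's headers in one contiguous block, so a single wide memory access yields both the cut decision and the chosen child's header. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CoLocatedNode:
    dim: int = 0     # dimension this node cuts
    lo: int = 0      # low end of this node's range in `dim`
    width: int = 1   # width of each cut
    # Child headers stored adjacent to this header: fetching the block
    # that holds `self` also brings in the children.
    children: List["CoLocatedNode"] = field(default_factory=list)

def step(node: CoLocatedNode, packet: List[int]) -> CoLocatedNode:
    # One conceptual memory access instead of two (header, then child array).
    idx = (packet[node.dim] - node.lo) // node.width
    return node.children[idx]
```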

Outline
• Introduction
• Background
• EffiCuts: insights and techniques
• Results
• Conclusion

Experimental Methodology
• HiCuts and HyperCuts with all their heuristics, versus EffiCuts
  – All use 16 rules per leaf
  – EffiCuts' numbers include all of its trees
• Memory access width in bytes: HiCuts 13; HyperCuts and EffiCuts 22
• ClassBench classifiers
  – 3 types (ACL, FW, IPC) and 3 sizes (1K, 10K, 100K rules)
  – 36 classifiers overall; 9 typical cases presented here
• Power estimation: HP Labs CACTI 6.5 to model SRAM/TCAM power and cycle time

Important Metrics
• Memory size ≈ cost
• Memory accesses ≈ 1 / packet throughput
  – Recall: more accesses consume more memory bandwidth
• Memory size and accesses both impact power

Memory Size (bytes per rule)
[Chart: bytes per rule (log scale, 10 to 100,000) for HiCuts, HyperCuts, and EffiCuts on ACL, FW, and IPC classifiers of 1K, 10K, and 100K rules]
• HyperCuts' (and HiCuts') memory grows far more rapidly than EffiCuts'
• EffiCuts reduces replication from factors of 1000s to less than 9
• EffiCuts needs near-constant bytes per rule at all sizes → linear memory growth
• EffiCuts uses 57X less memory than HyperCuts

Memory Accesses
[Chart: per-packet memory accesses (0 to 100) for HiCuts, HyperCuts, and EffiCuts on ACL, FW, and IPC classifiers of 1K, 10K, and 100K rules]
• EffiCuts requires 50% more memory accesses on average than HyperCuts
• EffiCuts modestly increases memory accesses while drastically reducing memory

Dealing with Increased Memory Accesses
• Recall: more accesses mean lower packet throughput
• Absorb the extra accesses by adding memory copies
• Because EffiCuts' memory is so much smaller, extra copies are inexpensive

Overall: EffiCuts vs. HyperCuts (100K rules)

             HyperCuts                        EffiCuts
Classifier | Memory | Throughput | Power | Memory | Per-copy Tput | Add'l  | Power
Type       | (MB)   | (Mpps)     | (W)   | (MB)   | (Mpps)        | Copies | (W)
ACL        | 1084   | 149        | 31    | 5.33   | 73            | 1      | 6
FW         | 2433   | 101        | 40    | 3.70   | 95            | 1      | 4
IPC        |  575   | 248        | 26    | 5.49   | 318           | 0      | 3
(Mpps = 10^6 packets/s)

• HyperCuts: fewer accesses, large memory (high power)
• EffiCuts: more accesses, small memory (low power)
  – One additional copy suffices to match HyperCuts' packet throughput
• EffiCuts: 50% more accesses, 57X less memory, 8X less power

Overall: EffiCuts vs. TCAM (100K rules)

             TCAM                      EffiCuts
Classifier | Throughput | Power | Per-copy Tput | Add'l  | Power
Type       | (Mpps)     | (W)   | (Mpps)        | Copies | (W)
ACL        | 134        | 23    | 66            | 1      | 6
FW         | 134        | 23    | 95            | 1      | 4
IPC        | 134        | 23    | 318           | 0      | 3
(Mpps = 10^6 packets/s)

• TCAM: one access per packet, but high power and slow cycle time
• EffiCuts: low power and fast cycle time, but many accesses
  – One additional copy suffices to match TCAM packet throughput
• EffiCuts achieves a 6X power reduction over TCAM

Conclusion
• EffiCuts nearly eliminates rule replication, drastically reducing memory overhead
• Four techniques: separable trees, selective tree merging, equi-dense cuts, node co-location
• Compared to HyperCuts, for similar throughput, EffiCuts:
  – Reduces rule replication from factors of 1000s to less than 9
  – Reduces memory overhead by 57X
  – Reduces power by 8X
• Compared to TCAM, for similar throughput, EffiCuts reduces power by 6X and cost by 10X
• EffiCuts greatly lowers the barrier to adoption of decision-tree-based packet classification