High-performance packet classification algorithm for multithreaded IXP network processor

High-performance packet classification algorithm for multithreaded IXP network processor

Authors: Duo Liu, Zheng Chen, Bei Hua, Nenghai Yu, Xinan Tang
Presenter: Fang-Chen, Kuo
Date: 2008/07/09
Publisher/Conf.: ACM Transactions on Embedded Computing Systems (TECS), 2008
Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.

Rule #   F1    F2    F3    Action
R1       001   010   011   Permit
R2       001   100   011   Deny
R3       01*   100   ***   Permit
R4       ***   ***   ***   Permit
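To make the rule table concrete, the following is a minimal sketch of matching a packet against it by linear search. It assumes that the lowest-numbered matching rule wins and that R4 is an all-wildcard default rule; neither is stated on the slide. The names RULES, field_matches, and classify are hypothetical.

```python
# Rule set copied from the table; '*' marks a wildcard bit.
# Assumption: R4 is an all-wildcard default rule.
RULES = [
    ("R1", ("001", "010", "011"), "Permit"),
    ("R2", ("001", "100", "011"), "Deny"),
    ("R3", ("01*", "100", "***"), "Permit"),
    ("R4", ("***", "***", "***"), "Permit"),
]


def field_matches(pattern, bits):
    """True if every pattern bit is a wildcard or equals the packet bit."""
    return len(pattern) == len(bits) and all(
        p in ("*", b) for p, b in zip(pattern, bits)
    )


def classify(packet):
    """Return (rule name, action) of the first matching rule, or None.

    Assumption: rules are listed in priority order, so the first match wins.
    """
    for name, patterns, action in RULES:
        if all(field_matches(p, b) for p, b in zip(patterns, packet)):
            return name, action
    return None
```

For example, classify(("010", "100", "111")) skips R1 and R2 and stops at R3. A linear scan like this costs time proportional to the number of rules, which is the cost that table-driven schemes such as RFC aim to avoid.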

Classification Scheme

1. Apply a compression technique to RFC’s cross-producting tables to reduce data redundancy;
2. Exploit the NPU architectural features to achieve high classification speed, especially at 10 Gbps or higher on the Intel IXP2800.

Reduction Tree (1)

• Phase 0 contains 6 chunks:
  • chunk 0 uses the high 16 bits of the source IP address,
  • chunk 1 uses the low 16 bits of the source IP address,
  • chunk 2 uses the high 16 bits of the destination IP address,
  • chunk 3 uses the low 16 bits of the destination IP address,
  • chunk 4 uses the source port, and
  • chunk 5 uses the destination port;
• Phase 1 contains 2 chunks:
  • chunk 0 (CPT X) of phase 1 is formed by combining chunks 0, 1, and 4 of phase 0;
  • chunk 1 (CPT Y) of phase 1 is formed by combining chunks 2, 3, and 5 of phase 0;
• Phase 2 contains 1 chunk:
  • chunk 0 (CPT Z) of phase 2 is formed by combining the two chunks of phase 1.
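The three-phase lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the row-major indexing of the cross-producting tables, and the toy table contents are all hypothetical, and the final table here holds actions directly for brevity.

```python
def split_header(src_ip, dst_ip, src_port, dst_port):
    """Return the six 16-bit phase-0 chunk values in chunk order 0..5."""
    return [
        (src_ip >> 16) & 0xFFFF,  # chunk 0: high 16 bits of source IP
        src_ip & 0xFFFF,          # chunk 1: low 16 bits of source IP
        (dst_ip >> 16) & 0xFFFF,  # chunk 2: high 16 bits of destination IP
        dst_ip & 0xFFFF,          # chunk 3: low 16 bits of destination IP
        src_port & 0xFFFF,        # chunk 4: source port
        dst_port & 0xFFFF,        # chunk 5: destination port
    ]


def cpt_index(ids, sizes):
    """Flatten a tuple of equivalence-class IDs into one row-major index."""
    index = 0
    for eq_id, size in zip(ids, sizes):
        index = index * size + eq_id
    return index


def rfc_lookup(header, phase0, sizes0, cpt_x, cpt_y, sizes1, cpt_z):
    """Walk the reduction tree: six phase-0 lookups, then CPT X/Y, then CPT Z."""
    chunks = split_header(*header)
    e = [phase0[i][chunks[i]] for i in range(6)]
    x = cpt_x[cpt_index((e[0], e[1], e[4]), (sizes0[0], sizes0[1], sizes0[4]))]
    y = cpt_y[cpt_index((e[2], e[3], e[5]), (sizes0[2], sizes0[3], sizes0[5]))]
    return cpt_z[cpt_index((x, y), sizes1)]


# Toy tables (hypothetical): every chunk value maps to ID 0, except that the
# high half 0x0A00 of the source address maps to ID 1.
phase0 = [[0] * 65536 for _ in range(6)]
phase0[0][0x0A00] = 1
sizes0 = [2, 1, 1, 1, 1, 1]  # number of distinct IDs per phase-0 chunk
cpt_x = [0, 1]               # indexed by the IDs of chunks 0, 1, 4
cpt_y = [0]                  # indexed by the IDs of chunks 2, 3, 5
sizes1 = (2, 1)              # number of distinct IDs produced by CPT X and CPT Y
cpt_z = ["Deny", "Permit"]   # indexed by the (X, Y) ID pair

result = rfc_lookup((0x0A000001, 0xC0A80102, 1234, 80),
                    phase0, sizes0, cpt_x, cpt_y, sizes1, cpt_z)
```

Each packet costs a fixed number of table reads (six in phase 0, two in phase 1, one in phase 2) regardless of how many rules there are; the price is the size of the cross-producting tables.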

Reduction Tree (2)

Phase_0                     Phase_1     Phase_2
SA_H16, SA_L16, SP    ->    CPT_X   \
                                      ->   CPT_Z
DA_H16, DA_L16, DP    ->    CPT_Y   /

Bitmap-RFC

Data structure for Bitmap-RFC algorithm

Implementation Issues

1. Memory Space Reduction
2. Instruction Selection
   • POP_COUNT
3. Multiplication Elimination
4. Data Allocation
5. Task Partitioning
6. Latency Hiding
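The link between memory space reduction and a population-count instruction can be illustrated with a generic bitmap-compressed table. The sketch below is an assumption about the general technique, not the exact Bitmap-RFC layout, which the slides do not show: each block of entries keeps a bitmap marking where the value changes, plus only the distinct runs of values. BLOCK, compress, pop_count, and lookup are hypothetical names.

```python
BLOCK = 16  # hypothetical block width, in table entries


def compress(table):
    """Return (bitmaps, bases, elements) for a run-compressed copy of table.

    Bit i of a block's bitmap is set when entry i starts a new run of values;
    bases[b] is the position in elements where block b's runs begin.
    """
    bitmaps, bases, elements = [], [], []
    for start in range(0, len(table), BLOCK):
        block = table[start:start + BLOCK]
        bases.append(len(elements))
        bitmap = 0
        for i, value in enumerate(block):
            if i == 0 or value != block[i - 1]:
                bitmap |= 1 << i
                elements.append(value)
        bitmaps.append(bitmap)
    return bitmaps, bases, elements


def pop_count(x):
    """Count set bits; stands in for a hardware population-count instruction."""
    return bin(x).count("1")


def lookup(compressed, index):
    """Recover table[index] from the compressed form."""
    bitmaps, bases, elements = compressed
    block, offset = divmod(index, BLOCK)
    mask = (1 << (offset + 1)) - 1  # keep bits 0..offset
    return elements[bases[block] + pop_count(bitmaps[block] & mask) - 1]


# Toy table with long runs of repeated values, as cross-producting tables tend to have.
table = [5] * 10 + [7] * 10 + [9] * 12 + [3] * 3
compressed = compress(table)
```

Bit 0 of every bitmap is always set, so the population count in lookup is at least 1. With a power-of-two block width, the division and remainder in lookup reduce to a shift and a mask, which is one way to avoid multiply and divide operations on the lookup path.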

Experiment: Memory Requirement

          Memory requirement of CPT and CCPT (MB)                                     Total memory requirement (MB)
Rule #    X (RFC)   X' (B.-RFC)   Y (RFC)   Y' (B.-RFC)   Z (RFC)   Z' (B.-RFC)       RFC       Bitmap-RFC
5700      59.2      18.5          25.8      8.1           64.1      16.0              149.9     43.4
8050      80.7      25.2          64.3      20.1          127.7     39.9              273.5     86.0
12K       106.5     33.3          106.3     33.2          284.9     71.2              498.6     138.5
17K       191.0     59.7          185.0     57.8          570.7     178.4             947.6     296.7
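The overall saving is easier to read as a ratio. The short calculation below divides the total RFC memory by the total Bitmap-RFC memory for each rule set, using only the figures from the memory requirement table; TOTAL_MB, reduction_factor, and FACTORS are hypothetical names.

```python
# Total memory requirement in MB from the table: (RFC, Bitmap-RFC).
TOTAL_MB = {
    "5700": (149.9, 43.4),
    "8050": (273.5, 86.0),
    "12K": (498.6, 138.5),
    "17K": (947.6, 296.7),
}


def reduction_factor(rfc_mb, bitmap_mb):
    """How many times smaller the Bitmap-RFC tables are than the RFC tables."""
    return rfc_mb / bitmap_mb


FACTORS = {
    rules: round(reduction_factor(*sizes), 2) for rules, sizes in TOTAL_MB.items()
}
```

On these four rule sets the ratio stays between roughly 3.2 and 3.6.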

Experiment: Classifying Rates (1)

Classifiers: 1000, 2000, 3000
Algorithms: RFC, Bitmap-RFC
ME # / Rates: 3 / 27.17, 3 / 26.65, 3 / 26.11, 3 / 25.74, 3 / 26.58, 3 / 25.77

Experiment: Classifying Rates (2)

Classifiers: 1000, 2000, 3000

1 ME     6.54    6.38
2 ME    12.85   12.77
4 ME    25.65   25.09
8 ME    33.35   33.33