Performance Evaluation of Packet Classification on FPGA-based TCAM Emulation Architectures GLOBECOM (Global Communications Conference), 2012 Presenter: NTHU 101062607 李若萍
Outline
• Introduction
• Related Work
• TCAM Emulation
• RAM-based TCAM Architecture
• Performance Evaluation
• Conclusion
Introduction
• Packet fields are used as keys to determine the best matching rule and apply the corresponding action.
  ▫ Exact matching
  ▫ Prefix matching
  ▫ Range matching
• How to find the best matching rule?
  ▫ Each rule is assigned a cost.
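The three matching types and the cost-based selection can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `exact`, `prefix`, `in_range`, `Rule`, and `classify` are hypothetical, and the sketch assumes that the best matching rule is the matching rule with the lowest cost.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical field matchers, one per matching type named on the slide.
def exact(value: int) -> Callable[[int], bool]:
    return lambda field: field == value

def prefix(value: int, prefix_len: int, width: int) -> Callable[[int], bool]:
    # Compare only the top prefix_len bits of a width-bit field.
    shift = width - prefix_len
    return lambda field: (field >> shift) == (value >> shift)

def in_range(low: int, high: int) -> Callable[[int], bool]:
    return lambda field: low <= field <= high

@dataclass
class Rule:
    matchers: Dict[str, Callable[[int], bool]]  # field name -> matcher
    cost: int
    action: str

def classify(packet: Dict[str, int], rules: List[Rule]) -> Optional[str]:
    # A rule matches when every field it constrains matches the packet.
    matching = [r for r in rules
                if all(m(packet[f]) for f, m in r.matchers.items())]
    if not matching:
        return None
    # Assumption: the best matching rule is the one with the lowest cost.
    return min(matching, key=lambda r: r.cost).action

rules = [
    Rule({"dst": prefix(0x0A000000, 8, 32)}, cost=2, action="forward"),
    Rule({"dst": prefix(0x0A000000, 8, 32), "port": in_range(1024, 2048)},
         cost=1, action="drop"),
    Rule({"proto": exact(6)}, cost=3, action="log"),
]
```

A packet in 10.0.0.0/8 with a port inside 1024-2048 matches all three rules here, and the lowest-cost one decides the action.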
Introduction (cont.)
• CAM (Content Addressable Memories)
  ▫ Match = (Key ≠ Data)
  ▫ Match line = !Match
• TCAM (Ternary Content Addressable Memories)
  ▫ Match = (Key ≠ Data) & Mask
  ▫ Match line = !Match
  ▫ A masked bit is X (don't care) and always matches.
[Figure: CAM cell (SRAM data cell, Key and Data lines, Match line pulled to VCC) and TCAM cell (SRAM data cell plus SRAM mask cell), each with a truth table of Key, Data, Mask, and Match]
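The cell equations above can be checked bit-parallel on whole words. This is a small sketch with hypothetical function names; it follows the slide's convention that a mask bit of 1 means "compare this bit" and a mask bit of 0 means X (don't care).

```python
def cam_match(key: int, data: int) -> bool:
    # Binary CAM: any differing bit pulls the match line low.
    mismatch = key ^ data
    return mismatch == 0

def tcam_match(key: int, data: int, mask: int) -> bool:
    # Ternary CAM: differing bits only count where the mask bit is 1.
    mismatch = (key ^ data) & mask
    return mismatch == 0
```

For example, key 1011 against data 1001 with mask 1101 matches, because the only differing bit is masked out.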
Introduction (cont.)
• TCAMs (Ternary Content Addressable Memories)
  ▫ Capacity constraints
  ▫ Storage inefficiency
  ▫ High power consumption
  ▫ Limited scalability
[Figure: the compared key is matched against the rules stored at TCAM memory addresses 1 to N (each bit 0, 1, or X); a priority encoder selects one matching memory address, which is used as an index into RAM to find the corresponding action]
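The lookup path in the figure (TCAM, priority encoder, action RAM) can be modeled in software as below. This is only a behavioral sketch: a real TCAM compares all entries in parallel, the name `tcam_lookup` is hypothetical, and the sketch assumes that the lowest matching address has the highest priority.

```python
from typing import List, Optional, Tuple

def tcam_lookup(key: int,
                entries: List[Tuple[int, int]],
                actions: List[str]) -> Optional[str]:
    # entries[address] = (data, mask); mask bit 1 = compare, 0 = don't care.
    # The loop stands in for the priority encoder: the first (lowest)
    # matching address wins. This ordering is an assumption of the sketch.
    for address, (data, mask) in enumerate(entries):
        if ((key ^ data) & mask) == 0:
            # The matching address indexes the action RAM.
            return actions[address]
    return None

entries = [(0b1010, 0b1111), (0b1000, 0b1100), (0b0000, 0b0000)]
actions = ["a", "b", "default"]
```

The last entry has an all-zero mask, so it matches every key and acts as a default rule.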
Introduction (cont.)
• Purpose: we investigated the performance and tradeoffs of TCAM emulation in FPGAs (Field-Programmable Gate Arrays), as opposed to ASICs (Application-Specific Integrated Circuits).
• We considered the impact of encoding different key ranges in rules, for different configurations of search key length and number of rules.
Related Work
• Hardware-assisted packet classification
  ▫ Decision tree: hierarchically split rule patterns hinder incremental updates.
  ▫ Decomposition: the cross-producting stage is an issue.
  ▫ Exhaustive search: predictable memory requirements.
TCAM Emulation
[Figure: native TCAM compared with emulated TCAM]
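RAM-based TCAM Architecture
• m-bit key (m = 10)
  ▫ Full address expansion (w = m = 10): m/w = 1, RAM block size = 2^w = 2^10 (addresses 0~2^10-1)
  ▫ w = 9: m/w = 10/9, RAM block size = 2^9 (addresses 0~2^9-1)
  ▫ w = 8 (m-2): m/w = 10/8, RAM block size = 2^8 (addresses 0~2^8-1)
  ▫ w = 2: m/w = 10/2 = 5, RAM block size = 2^2 (addresses 0~3)
  ▫ w = 1: m/w = 10/1 = 10, RAM block size = 2 (addresses 0~1)
• BRAM demand: (m/w) * 2^w bits
• BRAM modes = depth*width
[Figure: native TCAM compared with the RAM blocks used for each value of w]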
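The BRAM demand formula (m/w) * 2^w can be evaluated directly to see how the choice of w trades the number of RAM blocks against block size. The function name below is hypothetical, and the sketch only accepts values of w that divide m, because the slide does not say how a non-integer m/w is rounded.

```python
def emulated_bits(m: int, w: int) -> int:
    """Bits demanded by the slide's formula (m/w) * 2^w."""
    if m % w != 0:
        # Assumption of this sketch: only widths that divide the key length.
        raise ValueError("w must divide m in this sketch")
    return (m // w) * (2 ** w)

# Evaluate the formula for a 10-bit key and every w that divides it.
demand = {w: emulated_bits(10, w) for w in (10, 5, 2, 1)}
```

For m = 10, full address expansion (w = 10) needs 1024 bits, while w = 2 and w = 1 each need 20 bits, so narrow sub-keys cost far less memory at the price of more RAM blocks.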
RAM-based TCAM Architecture (cont.)
• n = 64, m = 16 (16-bit key)
  ▫ 2^16*64
  ▫ w = 6: m/w = 16/6 = 2, RAM block size = 2^w = 2^6 = 64
  ▫ 2^8*32*4
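One common way to emulate a TCAM with RAM blocks is sketched below; it is a generic illustration and may differ from the architecture evaluated in the paper. The key is split into m/w sub-keys of w bits, each sub-key addresses its own RAM of 2^w words, each word holds one bit per rule, and the words read from all RAMs are ANDed. The names `build_rams` and `lookup` are hypothetical, w is assumed to divide m, and the lowest-numbered rule is assumed to have priority.

```python
from typing import List, Optional, Tuple

def build_rams(rules: List[Tuple[int, int]], m: int, w: int) -> List[List[int]]:
    """rules[r] = (data, mask), m bits wide; mask bit 1 = compare."""
    sub_mask = (1 << w) - 1
    rams = []
    for block in range(m // w):          # assumes w divides m
        shift = block * w
        ram = []
        for address in range(1 << w):    # 2^w words per RAM block
            vector = 0
            for r, (data, mask) in enumerate(rules):
                d = (data >> shift) & sub_mask
                k = (mask >> shift) & sub_mask
                # Bit r is set if this sub-key value is compatible with rule r.
                if ((address ^ d) & k) == 0:
                    vector |= 1 << r
            ram.append(vector)
        rams.append(ram)
    return rams

def lookup(rams: List[List[int]], key: int, w: int, n: int) -> Optional[int]:
    sub_mask = (1 << w) - 1
    vector = (1 << n) - 1                # start with all n rules as candidates
    for block, ram in enumerate(rams):
        vector &= ram[(key >> (block * w)) & sub_mask]
    if vector == 0:
        return None
    # Assumption: the lowest-numbered matching rule has priority.
    return (vector & -vector).bit_length() - 1

rules = [(0xAB00, 0xFF00), (0x0012, 0x00FF), (0x0000, 0x0000)]
rams = build_rams(rules, m=16, w=8)
```

With m = 16 and w = 8 this builds 2 RAMs of 256 words, so each rule occupies 2 * 256 = 512 bits, which agrees with (m/w) * 2^w.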
RAM-based TCAM Architecture (cont.)
Performance Evaluation
• Resource Utilization
  ▫ A TCAM bit typically demands 16 transistors, while a RAM bit demands only 6.
  ▫ TCAM => w*m*16
  ▫ TCAM emulation => (m/w)*(2^w)*6
[Figure: TCAM (w*m*16) compared with the emulated one ((m/w)*(2^w)*6)]
Performance Evaluation (cont.)
[Figure: (m/w)*(2^w) bits]
Performance Evaluation (cont.)
• Classification Throughput
  ▫ A crucial factor for evaluating emulated TCAM performance on an FPGA is the actual classification throughput, measured in packets per second (pps).
Performance Evaluation (cont.)
• Range Impact
  ▫ We assess the impact of supporting different ranges in terms of memory requirements and classification rate.
Conclusion
• Classification rates above 300 Mpps for both large keys and rule sets can be achieved with only a few megabits of RAM when considering up to medium-size range intervals (512-2048).
• Supporting both large ranges and large rule sets tends to demand substantial memory resources, which also penalizes the resulting classification rate.
Thank you! The End.