A Resource Efficient Content Inspection System for Next

- Slides: 25
A Resource Efficient Content Inspection System for Next Generation Smart NICs. Karthikeyan Sabhanatarajan, Ann Gordon-Ross*. The Energy Efficient Internet Project, High-performance Computing & Simulation Research Lab, ECE Department, University of Florida, Gainesville. This work was supported by the U.S. National Science Foundation. * Also affiliated with the NSF Center for High Performance Reconfigurable Computing.
Introduction • The Internet has grown at an alarming rate: 305% between 2000 and 2008. 2 of 25
Introduction • Edge devices are left idle 75% of the time, with power management features disabled to maintain network connectivity. 3 of 25
Introduction A solution to save power on idle devices is power proxying: the idle PC is allowed to sleep and delegates responsibility for handling network traffic to the NIC. Additionally, NICs can enhance network security through network intrusion detection. 4 of 25
Introduction Next-generation interfaces, also known as Smart NICs (SNICs), are expected to take on increased network responsibility. Key requirement: packet inspection. A packet consists of a header and a payload; header inspection examines the header and content inspection examines the payload. This presentation focuses on content inspection, the process of searching the packet payload for occurrences of a known set of patterns called signatures. 5 of 25
Motivation Existing methodologies: software (Boyer-Moore, Aho-Corasick, Wu-Manber) and hardware (FPGAs, TCAMs, Bloom filters). Software techniques cannot support high-speed links with large signature sets. FPGAs exploit parallelism, but their price, area, and power are prohibitive for wide-scale deployment. TCAMs are a popular option with O(1) performance; auxiliary data structures such as SRAM are used to store pattern combinations to help determine a pattern match. However, existing implementations have prohibitive energy, price, and auxiliary data structure requirements. Bloom filters are energy efficient and offer moderate throughput, but false positives require further inspection of the payload, which imposes parallelism limits (scalability). 6 of 25
Background – TCAM Methodology Sample signature: ABCDEFGHABCDJKLMEFG. With a TCAM width of w=4, the signature is split into the entries ABCD, EFGH, ABCD, JKLM, EFG*: a prefix pattern followed by suffix patterns. TCAMs are attractive candidates for pattern matching due to their inherent simplicity, short lookup time, high throughput, high density, and scalability. 7 of 25
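The splitting step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_signature` and `ternary_match` are hypothetical, and the character `*` stands in for a ternary don't-care position (a real TCAM masks bits, not characters).

```python
WILDCARD = "*"  # assumption: '*' models a don't-care position, as on the slide


def split_signature(signature: str, w: int) -> list:
    """Cut a signature into w-byte chunks, padding the last chunk with don't-cares."""
    if w <= 0:
        raise ValueError("w must be positive")
    chunks = [signature[i:i + w] for i in range(0, len(signature), w)]
    if chunks and len(chunks[-1]) < w:
        chunks[-1] = chunks[-1].ljust(w, WILDCARD)
    return chunks


def ternary_match(entry: str, window: str) -> bool:
    """True if a w-byte payload window matches a TCAM entry; '*' matches any byte."""
    return len(entry) == len(window) and all(
        e == WILDCARD or e == c for e, c in zip(entry, window)
    )


chunks = split_signature("ABCDEFGHABCDJKLMEFG", 4)
prefix, suffixes = chunks[0], chunks[1:]
print(prefix, suffixes)
```

The first chunk plays the role of the prefix pattern and the remaining chunks the suffix patterns; the padded last chunk matches any payload window that begins with EFG.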
Background – TCAM Methodology Proposed by Lakshman et al. [Figure: the payload is shifted through a w=4 TCAM holding ABCD, EFGH, JKLM, EFG*; the matched index feeds the auxiliary SRAM structures: the Combined Pattern Table, the Matching Table, and the Partial Hit List.] Between them, the auxiliary SRAM structures store the valid combinations of all possible prefix and suffix entries, store information on the type of the matched pattern (i.e., prefix or suffix), and record the index of the prefix pattern. Because they contain several pattern permutations to identify valid patterns, their space requirement is O(N^2). Gao et al. reduced this requirement to O(N log N) by storing address permutations. 8 of 25
Proposed Solution TCAM techniques: • Are the simplest and fastest, with O(1) lookup • Can match future link speeds of 10 Gbps • Are highly scalable, with no parallelism limits • Accommodate signatures of varying length and signature sets of different sizes with ease. However, they suffer from: • Increased energy consumption • Prohibitive price • Increased auxiliary data structure requirements. These drawbacks make them unsuitable for wide-scale deployment in SNICs. 9 of 25
Proposed Solution We propose a hybrid TCAM-based solution. Our technique addresses: • Energy efficiency, through a partitioned architecture • Further reduction in power consumption, through caching that exploits network locality • Auxiliary data structure requirements, reduced by using a Bloom filter or software techniques • Throughput requirements of high-speed links such as 1 Gbps and 10 Gbps • Suitability for wide-scale deployment, due to high energy efficiency and reduced memory requirements. 10 of 25
Hybrid TCAM Methodology [Figure: a single w=4 TCAM holding ABCD, EFGH, JKLM, EFG* is split into a PTCAM (w=4) holding ABCD and an STCAM (w=4) holding EFGH, ABCD, JKLM, EFG*.] Partition the single TCAM into a prefix TCAM (PTCAM) and a suffix TCAM (STCAM), and store the signature chunks in the PTCAM and STCAM accordingly. The signature is then expressed as a permutation of PTCAM and STCAM addresses, e.g. ABCDEFGHABCDJKLMEFG becomes P0 S1 S2 S3. This permutation is then stored in a Bloom filter or in software. 11 of 25
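A minimal sketch of the address-permutation idea follows, assuming Python dictionaries stand in for the PTCAM and STCAM and a small hand-rolled Bloom filter stands in for the verification unit. The names `BloomFilter` and `encode`, the filter size, the hash construction, and the address numbering (assigned in insertion order, so not necessarily the numbering drawn on the slide) are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def encode(chunks, ptcam, stcam):
    """Express a chunked signature as one PTCAM address plus STCAM addresses."""
    p_addr = ptcam.setdefault(chunks[0], len(ptcam))
    s_addrs = [stcam.setdefault(c, len(stcam)) for c in chunks[1:]]
    return (p_addr, *s_addrs)


ptcam, stcam = {}, {}
permutation = encode(["ABCD", "EFGH", "ABCD", "JKLM", "EFG*"], ptcam, stcam)

valid = BloomFilter()
valid.add(repr(permutation).encode())

# A sequence of matched addresses is accepted only if its permutation was stored.
print(permutation, repr(permutation).encode() in valid)
```

Only the short address tuple is stored, rather than every prefix/suffix pattern combination, which is the source of the reduced auxiliary memory requirement described on the slide.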
Exploiting Signature Locality Our experiments indicate that network traces exhibit sufficient locality. To reduce unwanted switching, we exploit this property by introducing a cache between the PTCAM and STCAM. 12 of 25
Hybrid TCAM Methodology [Figure: PTCAM (w=4) holding ABCD; suffix cache with cache controller ($ Ctrl); STCAM (w=4) holding EFGH, ABCD, JKLM, EFG*.] 13 of 25
Hybrid TCAM Methodology [Figure: payload ABCDEFGHJKLMEFGUI; PTCAM (w=4) holding ABCD; right-shift activator with positions 0th to (w-1)th; suffix cache with cache controller ($ Ctrl); enable buffer; STCAM (w=4) holding EFGH, ABCD, JKLM, EFG*; signals Hit, Miss, Enable, Pause, Left Shift.] The payload is fed to the inspection system and shifted at a rate of 1 byte/clock. The cache is activated (w-1) clock cycles after a TCAM hit. A cache miss pauses shifting to allow the suffix TCAM to be searched for the pattern. The cache controller ($ Ctrl) updates the suffix cache. 14 of 25
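The cache behaviour described on this slide can be sketched behaviourally as below. This is a simplified model, not the presented hardware: the class name `SuffixCache` is hypothetical, lookups are exact-match only (no don't-care handling), the STCAM is modelled as a dictionary, and a random replacement policy is assumed for evictions.

```python
import random
from typing import Optional


class SuffixCache:
    """Small cache in front of the STCAM with random replacement (assumed policy)."""

    def __init__(self, capacity: int, seed: int = 0):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = {}               # suffix chunk -> STCAM address
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.hits = 0
        self.misses = 0                 # each miss models a pause plus an STCAM search

    def lookup(self, window: str, stcam: dict) -> Optional[int]:
        """Return the STCAM address of window, or None if it is not a stored suffix."""
        if window in self.entries:
            self.hits += 1
            return self.entries[window]
        self.misses += 1
        addr = stcam.get(window)        # miss: fall back to the suffix TCAM
        if addr is not None:
            if len(self.entries) >= self.capacity:
                victim = self.rng.choice(list(self.entries))
                del self.entries[victim]
            self.entries[window] = addr  # cache controller updates the cache
        return addr


stcam = {"EFGH": 0, "JKLM": 1}
cache = SuffixCache(capacity=1)
for window in ["EFGH", "EFGH", "JKLM", "ZZZZ"]:
    cache.lookup(window, stcam)
print(cache.hits, cache.misses)
```

Every hit avoids one STCAM search, which is where the energy saving comes from; every miss corresponds to a pause in payload shifting, which is where the throughput cost comes from.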
Hybrid TCAM Methodology [Figure: same datapath as the previous slide; the matched addresses (P1, S1, ...) are collected and sent to the Bloom filter or software unit to verify the combination.] 15 of 25
Hybrid TCAM Methodology [Figure: same datapath as the previous slide, with a contention resolution unit that receives the match addresses from the PTCAM and the STCAM.] A contention resolution unit handles contention between identical PTCAM and STCAM patterns. Preference is given to a PTCAM match over an STCAM match. 16 of 25
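The priority rule can be illustrated with a short sketch, assuming both TCAMs are modelled as dictionaries from pattern to address and lookups are exact-match; the function name `resolve` and the returned tuple format are hypothetical.

```python
def resolve(window, ptcam, stcam):
    """Pick one match for a payload window, preferring the PTCAM over the STCAM."""
    if window in ptcam:
        return ("P", ptcam[window])
    if window in stcam:
        return ("S", stcam[window])
    return None


ptcam = {"ABCD": 0}
stcam = {"EFGH": 0, "ABCD": 1, "JKLM": 2}

print(resolve("ABCD", ptcam, stcam))  # ABCD is in both tables; the PTCAM match wins
print(resolve("JKLM", ptcam, stcam))
```

In the running example the chunk ABCD appears both as a prefix and as a suffix, which is exactly the situation the contention resolution unit has to handle.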
Experimental Setup Packet traces: malicious traces from MIT-LL and from the Capture the Flag contest at DEFCON. No power proxying traces are available; this remains ongoing research. A custom C-based simulator was written to behaviorally simulate the entire system. SNORT and ClamAV were used as signature sets. Packets are reassembled and fed to the simulator. STCAM accesses are saved to analyze the effect of caching. TCAM energy consumption was obtained from the TCAM modeling tool of Agarwal et al. 17 of 25
Results – Signature Distribution ClamAV and SNORT rule sets: SNORT has smaller patterns (70% <= 4 bytes); ClamAV has medium-sized patterns (72% <30 bytes & >100 bytes). 18 of 25
Results – Effect of Partitioning on Size Partitioning circumvents natural TCAM compression. However, the increase in TCAM size is negligible. 19 of 25
Results – EDP Reduction Partitioning reduces the energy-delay product (EDP). Two smaller TCAMs are faster than a single large TCAM. EDP savings are higher for widths of 8 and 16 bytes. 20 of 25
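For reference, EDP is the product of energy and delay, so a design that lowers both benefits twice. The sketch below shows how a relative EDP reduction could be computed; the function name `edp_reduction` and the numbers in the usage line are hypothetical placeholders, not measurements from this work.

```python
def edp_reduction(e_single, d_single, e_part, d_part):
    """Fractional EDP reduction of a partitioned design relative to a single TCAM."""
    edp_single = e_single * d_single  # energy-delay product of the baseline
    edp_part = e_part * d_part        # energy-delay product of the partitioned design
    return 1.0 - edp_part / edp_single


# Hypothetical inputs in arbitrary units: half the energy and half the delay.
print(edp_reduction(e_single=8.0, d_single=2.0, e_part=4.0, d_part=1.0))
```

With these placeholder inputs, halving both energy and delay cuts the EDP to a quarter of the baseline.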
Results – Energy Savings 1. Energy reduction for a partitioned system compared to a non-partitioned system versus TCAM width, for real-time traffic traces. 2. Energy savings range from 6% to 69% (SNORT) and from 6% to 87% (ClamAV). 3. Smaller TCAM widths give greater energy savings. 4. Larger TCAM accesses use more "don't care" bits. 21 of 25
Results – Effect of Caching: Hit Rate 1. Caching was analyzed for an STCAM width of 4 bytes. 2. Hit rates range from 28% to 88% for cache sizes of only 40 to 60 entries. 3. A cache containing 40 to 60 entries represents only 0.002% to 0.004%, respectively, of the STCAM entries. 22 of 25
Results – Effect of Caching: Energy Savings Energy savings for a partitioned TCAM system (w=4) with a suffix cache compared to a partitioned TCAM system with no suffix cache, for a varying number of cache entries: 13% to 64% additional savings. 23 of 25
Conclusion 1. Developed an energy-efficient, partitioned TCAM-based content inspection system for SNICs. 2. The system is energy and throughput aware. 3. Energy-delay product improvements of up to 62% compared to previous non-partitioned TCAM systems. 4. Up to 87% energy savings (average) compared to a non-partitioned TCAM system. 5. A simple cache with a random replacement policy further reduces energy consumption by up to 64% compared to a partitioned TCAM system without a cache. 6. Caching incurs a throughput reduction of 5.5%. 24 of 25
Future Work 1. Evaluating the proposed Bloom filter based architecture. 2. Improved caching techniques. 3. Attack robustness to counter maliciously engineered packets. 4. A pipelined architecture to hide cache misses and improve throughput. 25 of 25