An Overlay Architecture for Pattern Matching Rasha Karakchi














![Physical Mapping • Assume f = 9 – [n-4 to n+4] • SE Mapping:](https://slidetodoc.com/presentation_image_h2/67cc225bb8d975e4944b87b36366ea90/image-15.jpg)



- Slides: 18
An Overlay Architecture for Pattern Matching
Rasha Karakchi, Charles Danial, Jason Bakos
Computer Science and Engineering

Pattern Matching
• Patterns (e.g., threat signatures, network addresses, genomic sequences) are preprocessed (slow) into a TCAM, Bloom filter, or automaton
• The input sequence is then streamed through that structure to produce pattern matches (fast)

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• States: 0 -a-> 1 -b-> 2 -a-> 3 -b-> 4 -c-> 5
• Active states: start: 0

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “a”
• Active states: start: 0 | a: 0, 1
• Tracking pattern “a”

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “ab”
• Active states: start: 0 | a: 0, 1 | b: 0, 2
• Tracking pattern “ab”

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “aba”
• Active states: start: 0 | a: 0, 1 | b: 0, 2 | a: 0, 1, 3
• Tracking patterns “a” and “aba”

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “abab”
• Active states: start: 0 | a: 0, 1 | b: 0, 2 | a: 0, 1, 3 | b: 0, 2, 4
• Tracking patterns “ab” and “abab”

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “ababa”
• Active states: start: 0 | a: 0, 1 | b: 0, 2 | a: 0, 1, 3 | b: 0, 2, 4 | a: 0, 3
• Tracking pattern “aba”; lost track

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “ababab”
• Active states: start: 0 | a: 0, 1 | b: 0, 2 | a: 0, 1, 3 | b: 0, 2, 4 | a: 0, 3 | b: 0, 4
• Tracking pattern “abab”

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “abababc”
• Active states: start: 0 | a: 0, 1 | b: 0, 2 | a: 0, 1, 3 | b: 0, 2, 4 | a: 0, 3 | b: 0, 4 | c: 0, 5 (accept)

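
The walk-through above can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the usual NFA semantics in which state 0 stays active on every symbol, and the function name `run_nfa` is made up for illustration. Because state 0 re-arms state 1 on every “a”, the sketch also lists state 1 on the fifth and sixth symbols, where the slides' trace shows only the longer partial match.

```python
def run_nfa(pattern, text):
    """Simulate the linear NFA for `pattern` over `text`.

    State i means "the first i symbols of the pattern have just been seen".
    State 0 is always active, so a new match attempt starts at every position.
    Returns one (symbol, active_states, accepted) tuple per input symbol.
    """
    active = {0}
    trace = []
    for ch in text:
        nxt = {0}  # the start state never deactivates
        for s in active:
            if s < len(pattern) and pattern[s] == ch:
                nxt.add(s + 1)  # follow the single outgoing edge of state s
        active = nxt
        trace.append((ch, sorted(active), len(pattern) in active))
    return trace


if __name__ == "__main__":
    for symbol, states, accepted in run_nfa("ababc", "abababc"):
        print(symbol, states, "(accept)" if accepted else "")
```

The final row reports state 5, the accept state, after the “c” arrives.
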
Applications
• Hamming distance
• Brill tagging rules
• Protein motif signatures
• Sequential Pattern Mining

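State Element (SE)
• Example automaton fragment with transitions on a, b, and c
• Each SE holds a 256 x 1 RAM, a start flag, a report flag, and an Active register
• Input buffer: 64K x 8, addressed by a counter
• Activation arrives from predecessors; SE n drives its successors and other SEs (f = 6 here)
• Report goes to the output priority encoders
• Configuration is shifted in and shifted out
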
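
A software model can make the SE datapath easier to follow. The sketch below is an assumption-laden illustration, not the authors' RTL: it assumes an SE fires when it is enabled (by its start flag or by a predecessor that fired on the previous symbol) and its 256 x 1 RAM holds a 1 at the current input byte. The names `StateElement`, `make_se`, and `run_overlay` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StateElement:
    ram: list                      # 256 one-bit entries: which bytes this SE matches
    start: bool = False            # assumed: a start SE is enabled on every symbol
    report: bool = False           # assumed: a reporting SE emits a match when it fires
    successors: list = field(default_factory=list)  # indices of SEs it activates


def make_se(symbols, **flags):
    """Build an SE whose RAM is set for each character in `symbols`."""
    ram = [0] * 256
    for ch in symbols:
        ram[ord(ch)] = 1
    return StateElement(ram=ram, **flags)


def run_overlay(ses, data):
    """Stream `data` (bytes) through the SEs; return positions where a report fires."""
    enabled = [se.start for se in ses]
    reports = []
    for pos, byte in enumerate(data):
        fired = [en and se.ram[byte] == 1 for se, en in zip(ses, enabled)]
        enabled = [se.start for se in ses]      # start SEs are re-enabled every cycle
        for i, se in enumerate(ses):
            if fired[i]:
                if se.report:
                    reports.append(pos)
                for j in se.successors:         # activation to successors
                    enabled[j] = True
    return reports


# "ababc" as five SEs, one per pattern symbol.
ses = [
    make_se("a", start=True, successors=[1]),
    make_se("b", successors=[2]),
    make_se("a", successors=[3]),
    make_se("b", successors=[4]),
    make_se("c", report=True),
]
print(run_overlay(ses, b"abababc"))
```

In this model the fan-out f would bound how far, in SE index, each entry of `successors` may lie from the SE that owns it.
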
Runtime Behavior
• FIB = fill input buffer
• R = reconfigure
• IBF = input buffer flush
• OBF = output buffer flush
• Timeline: FIB, then (R, IBF, OBF) repeated x (#states/#SEs); the whole sequence repeated x (input_size/64K)

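
The timeline on this slide suggests a simple cost model: one FIB per 64K input chunk, followed by one (R, IBF, OBF) round per automaton partition. The sketch below encodes that reading. The function name, the per-phase times, and the choice of 1024 for "K" are assumptions for illustration; the slides do not give per-phase timings.

```python
import math


def estimate_runtime(n_states, n_ses, input_bytes,
                     t_fib, t_r, t_ibf, t_obf,
                     buffer_bytes=64 * 1024):
    """Total time under the assumed FIB + passes * (R + IBF + OBF) schedule.

    passes: automaton partitions that must be loaded in turn (#states / #SEs)
    chunks: input buffer refills needed (input_size / 64K)
    All t_* arguments share one time unit chosen by the caller.
    """
    passes = math.ceil(n_states / n_ses)
    chunks = math.ceil(input_bytes / buffer_bytes)
    return chunks * (t_fib + passes * (t_r + t_ibf + t_obf))


# Hypothetical phase times; 26668 states on 8K SEs needs 4 passes per chunk.
print(estimate_runtime(26668, 8 * 1024, 128 * 1024,
                       t_fib=1.0, t_r=2.0, t_ibf=3.0, t_obf=4.0))
```

Under this model the reconfiguration time R is paid once per partition per chunk, which is why automata that fit in a single overlay load run much faster.
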
Overlay Configurations

| SEs (K) | f | Fmax (MHz) | Max. B/W* for 25% a.s. (GB/s) | Encoders | Max. report cycles | Max. report rate (GHz) | R (ms) | Throughput, 24K states (MB/s) | Throughput, 128K states (MB/s) |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 103 | 152 | 1866 | 16 | 100% | 2.4 | 21 | 14 | 3 |
| 8 | 44 | 136 | 1427 | 32 | 50% | 2.2 | 31 | 27 | 5 |
| 12 | 25 | 122 | 1091 | 48 | 33% | 2.0 | 43 | 32 | 6 |
| 16 | 12 | 121 | 692 | 64 | 25% | 1.9 | 53 | 36 | 9 |
| 20 | 6 | 119 | 426 | 80 | 20% | 1.9 | 67 | 31 | 9 |
| 24 | 3 | 112 | 240 | 96 | 17% | 1.8 | 74 | 67 | 11 |

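
Two columns of this table can be cross-checked with a little arithmetic. The "Max. report cycles" percentages match 16 divided by the encoder count, and the "Max. report rate" values match encoders x that fraction x Fmax. This relationship is inferred from the numbers and is not stated on the slide; the constant `BASE_ENCODERS` and the function name are hypothetical.

```python
# (encoders, Fmax in MHz) for the six configurations in the table.
ROWS = [(16, 152), (32, 136), (48, 122), (64, 121), (80, 119), (96, 112)]

BASE_ENCODERS = 16  # assumed: the encoder count at which every cycle can report


def report_stats(encoders, fmax_mhz):
    """Return (report-cycle percentage, max report rate in GHz) for one row."""
    fraction = BASE_ENCODERS / encoders
    rate_ghz = encoders * fraction * fmax_mhz / 1000
    return round(fraction * 100), round(rate_ghz, 1)


for encoders, fmax in ROWS:
    pct, rate = report_stats(encoders, fmax)
    print(f"{encoders:3d} encoders: {pct:3d}% of cycles, {rate} GHz")
```
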
Physical Mapping
• Assume f = 9: [n-4 to n+4]
• SE Mapping:

| State | SE |
|---|---|
| A | 0 |
| B | 1 |
| C | 2 |
| D | 3 |
| E | 4 |
| F | 5 |
| G | 6 |

• 7! = 5040 possible mappings

| f | valid mappings |
|---|---|
| 5 | 0 |
| 6 | 24 (0.5%) |
| 7 | 48 (1%) |
| 8 | 372 (7%) |

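
Counting valid mappings for a 7-state automaton is small enough to brute-force over all 7! = 5040 permutations. The sketch below shows the idea under stated assumptions: an edge from the state on SE n is routable only if its destination sits on an SE with index in [n+lo, n+hi] (for f = 9 that is lo = -4, hi = 4). The slide's actual automaton is in a figure that is not reproduced here, so the example uses a hypothetical 7-state chain, and its counts are not expected to match the slide's table.

```python
from itertools import permutations


def count_valid_mappings(n_states, edges, lo, hi):
    """Count state-to-SE assignments in which every edge fits the fan-out window.

    edges: (src, dst) state pairs
    lo, hi: allowed SE-index offset range, e.g. (-4, 4) for f = 9
    """
    valid = 0
    for perm in permutations(range(n_states)):  # perm[state] = SE index
        if all(lo <= perm[dst] - perm[src] <= hi for src, dst in edges):
            valid += 1
    return valid


# Hypothetical automaton: a simple chain A -> B -> ... -> G (states 0..6).
chain = [(i, i + 1) for i in range(6)]

print(count_valid_mappings(7, chain, -4, 4))   # window for f = 9
print(count_valid_mappings(7, chain, -1, 1))   # only neighbouring SEs reachable
```

With a window of one SE on either side, the chain can only be laid out in order or in reverse, which is a useful sanity check on the counting loop.
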
Performance Results

| Benchmark | # states | Minimum hardware fan-out achieved | Overlay | #R/buffer | Throughput (MB/s) |
|---|---|---|---|---|---|
| Brill | 26668 | 40 | 8K | 4 | 20 |
| ClamAV | 49538 | 18 | 12K | 5 | 13 |
| Levenshtein | 2784 | 17 | 12K | 1 | 63 |
| Hamming | 11346 | 21 | 12K | 1 | 63 |
| SPM | 100500 | 8 | 16K | 5 | 10 |
| EntityResolution | 95136 | 62 | 4K | 19 | 3 |
| RandomForest | 75340 | 12 | 16K | 5 | 15 |
| PowerEN | 40513 | 29 | 8K | 5 | 16 |
| Snort | 69029 | 60 | 4K | 17 | 5 |
| Fermi | 40783 | 8 | 16K | 2 | 24 |
| Protomata | 42061 | 42 | 8K | 6 | 13 |
| DotStar | 96438 | 4 | 20K | 5 | 12 |

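
The Overlay column in this table is consistent with a simple rule: pick the largest configuration whose fan-out f is at least the benchmark's minimum hardware fan-out. The slides do not state this rule, so the sketch below is an inference from the two tables; `pick_overlay` is a hypothetical name.

```python
# (SEs in K, f) pairs copied from the "Overlay Configurations" slide.
CONFIGS = [(4, 103), (8, 44), (12, 25), (16, 12), (20, 6), (24, 3)]


def pick_overlay(min_fanout):
    """Return the SE count (in K) of the largest overlay with f >= min_fanout."""
    fitting = [ses for ses, f in CONFIGS if f >= min_fanout]
    if not fitting:
        raise ValueError(f"no configuration supports fan-out {min_fanout}")
    return max(fitting)


# Minimum hardware fan-out per benchmark, from the table above.
FANOUT = {
    "Brill": 40, "ClamAV": 18, "Levenshtein": 17, "Hamming": 21,
    "SPM": 8, "EntityResolution": 62, "RandomForest": 12, "PowerEN": 29,
    "Snort": 60, "Fermi": 8, "Protomata": 42, "DotStar": 4,
}

for name, fanout in FANOUT.items():
    print(f"{name}: {pick_overlay(fanout)}K")
```

Larger overlays hold more states per load but offer less fan-out, so the rule trades reconfiguration count against routability.
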
Performance Results

| Benchmark | # states | Minimum hardware fan-out achieved | Ave. act. states | Overlay | #R/buffer | iNFAnt (GPU)* | Hyperscan (CPU)** | Throughput (MB/s) | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| Brill | 26668 | 40 | 14 | 8K | 4 | 7 | 1 | 20 | 3 |
| ClamAV | 49538 | 18 | 4 | 12K | 5 | 4 | 14 | 13 | 0.9 |
| Levenshtein | 2784 | 17 | 88 | 12K | 1 | 38 | 1 | 63 | 1.7 |
| Hamming | 11346 | 21 | 240 | 12K | 1 | 18 | 10 | 63 | 3.5 |
| SPM | 100500 | 8 | 6331 | 20K | 5 | 0.5 | 0.1 | 10 | 20 |
| EntityResolution | 95136 | 62 | 10 | 4K | 19 | 4 | 1 | 3 | 0.8 |
| RandomForest | 75340 | 12 | 968 | 16K | 5 | 2 | 0.5 | 15 | 7.5 |
| PowerEN | 40513 | 29 | 31 | 8K | 5 | 53 | 10 | 16 | 0.3 |
| Snort | 69029 | 60 | 98 | 4K | 17 | 14 | 0.4 | 5 | 0.4 |
| Fermi | 40783 | 8 | 3854 | 20K | 2 | 2 | 1 | 24 | 12 |
| Protomata | 42061 | 42 | 19 | 8K | 6 | 5 | 1 | 13 | 2.6 |
| DotStar | 96438 | 4 | 3 | 20K | 5 | 40 | 10 | 12 | 0.3 |

* iNFAnt on Nvidia Titan Xp
** Nmcart, 4 threads on i5-4440 @ 3.1 GHz

Current/Future Work
• Hide input buffer flush latency with output buffer flush
• SAT-solver based mapping algorithm
• Scale up to larger FPGAs and faster memory (DDR4/HBM2)