An Overlay Architecture for Pattern Matching Rasha Karakchi

An Overlay Architecture for Pattern Matching
Rasha Karakchi, Charles Danial, Jason Bakos
Computer Science and Engineering

Pattern Matching
• Patterns (e.g., threat signatures, network addresses, genomic sequences) are preprocessed (slow) into a matching structure: TCAM, Bloom filter, or automata
• The input sequence is streamed through that structure to produce pattern matches (fast)

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• State chain: 0 -a-> 1 -b-> 2 -a-> 3 -b-> 4 -c-> 5
• Active states before any input: 0

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “a”
• Active states after each input symbol:
    a: 0, 1 (tracking pattern “a”)

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “ab”
• Active states after each input symbol:
    a: 0, 1
    b: 0, 2 (tracking pattern “ab”)

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “aba”
• Active states after each input symbol:
    a: 0, 1
    b: 0, 2
    a: 0, 1, 3 (tracking patterns “a” and “aba”)

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “abab”
• Active states after each input symbol:
    a: 0, 1
    b: 0, 2
    a: 0, 1, 3
    b: 0, 2, 4 (tracking patterns “ab” and “abab”)

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “ababa”
• Active states after each input symbol:
    a: 0, 1
    b: 0, 2
    a: 0, 1, 3
    b: 0, 2, 4
    a: 0, 3 (tracking pattern “aba”; lost track at state 4)

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “ababab”
• Active states after each input symbol:
    a: 0, 1
    b: 0, 2
    a: 0, 1, 3
    b: 0, 2, 4
    a: 0, 3
    b: 0, 4 (tracking pattern “abab”)

Nondeterministic Finite Automata (NFA)
• Recognize pattern: “ababc”
• Input: “abababc”
• Active states after each input symbol:
    a: 0, 1
    b: 0, 2
    a: 0, 1, 3
    b: 0, 2, 4
    a: 0, 3
    b: 0, 4
    c: 0, 5 (accept)
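
The walkthrough on the preceding slides can be reproduced with a small active-set simulation. This is a minimal sketch, not the authors' implementation: it assumes a linear chain NFA with states 0 to 5 and keeps state 0 permanently active so that a match may begin at any input position. The function names `build_chain_nfa` and `trace_active_states` are hypothetical.

```python
def build_chain_nfa(pattern):
    """Transitions {(state, symbol): next_state} for a linear chain NFA."""
    return {(i, ch): i + 1 for i, ch in enumerate(pattern)}


def trace_active_states(pattern, text):
    """Return one (symbol, active_states, accepted) tuple per input symbol."""
    transitions = build_chain_nfa(pattern)
    accept_state = len(pattern)
    active = {0}
    trace = []
    for ch in text:
        # Assumption: state 0 is always active, so matching can restart anywhere.
        next_active = {0}
        for state in active:
            if (state, ch) in transitions:
                next_active.add(transitions[(state, ch)])
        active = next_active
        trace.append((ch, sorted(active), accept_state in active))
    return trace


if __name__ == "__main__":
    for symbol, states, accepted in trace_active_states("ababc", "abababc"):
        print(symbol, states, "(accept)" if accepted else "")
```

Because state 0 restarts a match on every symbol, this sketch also keeps state 1 active after the fifth symbol and state 2 after the sixth, alongside the states listed in the slide trace. The final step reaches state 5 and accepts, as on the slide.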

Applications
• Hamming distance
• Brill tagging rules
• Protein motif signatures
• Sequential pattern mining

State Element (SE)
• Figure: block diagram of one SE and its connections to other SEs
• Input buffer (64K x 8) and counter
• 256 x 1 RAM with shift in and shift out
• Signals: start, active, report
• Activation from predecessors (f = 6 here); activation to successors
• Report output to the output priority encoders
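
One way to read the SE diagram is as a symbol-matching element: the 256 x 1 RAM holds one bit per possible input byte, and the SE becomes active when that bit is set for the current byte and the SE is enabled (either as a start element or by an active predecessor from the previous cycle). The sketch below is a software model under that assumption; the class, function, and field names are hypothetical and the slide does not confirm this exact behavior.

```python
class StateElement:
    """Hypothetical software model of one SE (not the hardware design)."""

    def __init__(self, symbols, predecessors=(), start=False, report=False):
        self.ram = [0] * 256               # 256 x 1 RAM: one match bit per byte value
        for byte in symbols:
            self.ram[byte] = 1
        self.predecessors = list(predecessors)  # indices of SEs that can activate this one
        self.start = start                 # assumed: start SEs are enabled every cycle
        self.report = report               # assumed: reporting SEs feed the encoders
        self.active = False


def step(ses, byte):
    """Advance all SEs by one input byte; return indices of reporting SEs."""
    # Enables are computed from the previous cycle's active bits.
    enabled = [se.start or any(ses[p].active for p in se.predecessors) for se in ses]
    for se, en in zip(ses, enabled):
        se.active = bool(en and se.ram[byte])
    return [i for i, se in enumerate(ses) if se.active and se.report]


def build_chain(pattern):
    """One SE per pattern byte; SE i is activated by SE i-1."""
    last = len(pattern) - 1
    return [
        StateElement([b], predecessors=[i - 1] if i else [],
                     start=(i == 0), report=(i == last))
        for i, b in enumerate(pattern)
    ]


def run(ses, data):
    return [step(ses, b) for b in data]


reports = run(build_chain(b"ababc"), b"abababc")
print(reports)  # only the last input byte produces a report, from SE 4
```

In this model the fan-out limit f would bound how many entries `predecessors` may hold and how far apart the connected SE indices may be.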

Runtime Behavior
• FIB = fill input buffer; R = reconfigure; IBF = input buffer flush; OBF = output buffer flush
• Figure: SE datapath (input buffer 64K x 8, counter, 256 x 1 RAM, start, active, report, activation from predecessors and to successors, encoders, output buffer)
• Timeline: FIB, R, IBF, OBF, …, R, IBF, OBF, FIB, R, IBF, …
• Repetition counts marked on the timeline: x (#states/#SEs) and x (input_size/64K)
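
The timeline can be read as two nested loops: for every 64K input buffer, the overlay is reconfigured once per group of states that fits in the available SEs, and each configuration flushes the input buffer and then the output buffer. The sketch below generates that phase sequence. It is one interpretation of the slide; the rounding up of both ratios and the function name `schedule` are assumptions.

```python
import math


def schedule(num_states, num_ses, input_size, buffer_size=64 * 1024):
    """List the phases (FIB, R, IBF, OBF) for one run, under the stated assumptions."""
    passes_per_buffer = math.ceil(num_states / num_ses)   # assumed: #states/#SEs, rounded up
    num_buffers = math.ceil(input_size / buffer_size)     # assumed: input_size/64K, rounded up
    phases = []
    for _ in range(num_buffers):
        phases.append("FIB")                    # fill input buffer
        for _ in range(passes_per_buffer):
            phases += ["R", "IBF", "OBF"]       # reconfigure, flush input, flush output
    return phases


if __name__ == "__main__":
    plan = schedule(num_states=24 * 1024, num_ses=8 * 1024, input_size=128 * 1024)
    print(" ".join(plan))
    print("reconfigurations:", plan.count("R"))
```

Under this reading the number of reconfigurations grows with both the automaton size and the input size, which is why the reconfiguration time R matters for throughput.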

Overlay Configurations
SEs   f    Fmax   Max. B/W* for    Encoders  Max. report  Max. report  R     Throughput   Throughput
(K)        (MHz)  25% a.s. (GB/s)            cycles       rate (GHz)   (ms)  24K states   128K states
                                                                             (MB/s)       (MB/s)
4     103  152    1866             16        100%         2.4          21    14           3
8     44   136    1427             32        50%          2.2          31    27           5
12    25   122    1091             48        33%          2.0          43    32           6
16    12   121    692              64        25%          1.9          53    36           9
20    6    119    426              80        20%          1.9          67    31           9
24    3    112    240              96        17%          1.8          74    67           11
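
The report columns in this table appear to be related arithmetically: the listed maximum report cycles match 16 divided by the encoder count, and the maximum report rate matches encoders x report cycles x Fmax. This relationship is inferred from the numbers, not stated on the slide, so the sketch below should be read as a consistency check rather than the authors' definition. The function names and the constant 16 are assumptions.

```python
# (encoders, Fmax in MHz) for the six configurations listed on the slide
CONFIGS = [(16, 152), (32, 136), (48, 122), (64, 121), (80, 119), (96, 112)]


def max_report_cycles(encoders, base=16):
    """Assumed relation: fraction of cycles in which every encoder can report."""
    return base / encoders


def max_report_rate_ghz(encoders, fmax_mhz):
    """Assumed relation: encoders x report-cycle fraction x clock rate."""
    return encoders * max_report_cycles(encoders) * fmax_mhz / 1000.0


if __name__ == "__main__":
    for encoders, fmax in CONFIGS:
        pct = round(100 * max_report_cycles(encoders))
        rate = round(max_report_rate_ghz(encoders, fmax), 1)
        print(f"{encoders:3d} encoders @ {fmax} MHz: {pct}% report cycles, {rate} GHz")
```

Under this assumption the report rate reduces to 16 x Fmax, so it falls only as the clock rate falls with larger overlays.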

Physical Mapping
• Assume f = 9
  – [n-4 to n+4]
• SE Mapping:
    State  SE
    A      0
    B      1
    C      2
    D      3
    E      4
    F      5
    G      6
• 7! = 5040 possible mappings
    f  valid mappings
    5  0
    6  24 (0.5%)
    7  48 (1%)
    8  372 (7%)
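
The mapping counts on this slide can be understood through a brute-force search: try every assignment of states to SE positions and keep those in which each transition lands inside the fan-out window of its source SE. The sketch below does this for a small hypothetical automaton. The example graph is not the one on the slide (its edges are not recoverable from the text), and the window used for even f is an assumption that only agrees with the slide for the stated case f = 9, which gives n-4 to n+4.

```python
from itertools import permutations


def fanout_offsets(f):
    """Assumed window of f reachable SE offsets around position n (f=9 -> -4..4)."""
    return range(-((f - 1) // 2), f // 2 + 1)


def count_valid_mappings(states, edges, f):
    """Count state-to-SE assignments where every edge fits the fan-out window."""
    offsets = fanout_offsets(f)
    valid = 0
    for perm in permutations(range(len(states))):
        se_of = dict(zip(states, perm))
        if all(se_of[dst] - se_of[src] in offsets for src, dst in edges):
            valid += 1
    return valid


if __name__ == "__main__":
    # Hypothetical 3-state chain A -> B -> C (3! = 6 candidate mappings).
    states = ["A", "B", "C"]
    edges = [("A", "B"), ("B", "C")]
    for f in (2, 3, 5):
        print(f, count_valid_mappings(states, edges, f))
    # prints: 2 1, then 3 2, then 5 6
```

Exhaustive enumeration grows factorially with the number of states, so it is only practical for toy automata like this one.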

Performance Results
Benchmark         # states  Minimum hardware  Overlay  #R/buffer  Throughput
                            fan-out achieved                      (MB/s)
Brill             26668     40                8K       4          20
ClamAV            49538     18                12K      5          13
Levenshtein       2784      17                12K      1          63
Hamming           11346     21                12K      1          63
SPM               100500    8                 16K      5          10
EntityResolution  95136     62                4K       19         3
RandomForest      75340     12                16K      5          15
PowerEN           40513     29                8K       5          16
Snort             69029     60                4K       17         5
Fermi             40783     8                 16K      2          24
Protomata         42061     42                8K       6          13
DotStar           96438     4                 20K      5          12
* iNFAnt on Nvidia Titan Xp
** Nmcart, 4 threads on i5-4440 @ 3.1 GHz

Performance Results
Benchmark         # states  Minimum hardware  Ave. act.  Overlay  #R/buffer  Throughput  iNFAnt   Hyperscan  Speedup
                            fan-out achieved  states                         (MB/s)      (GPU)*   (CPU)**
Brill             26668     40                14         8K       4          20          7        1          3
ClamAV            49538     18                4          12K      5          13          4        14         0.9
Levenshtein       2784      17                88         12K      1          63          38       1          1.7
Hamming           11346     21                240        12K      1          63          18       10         3.5
SPM               100500    8                 6331       20K      5          10          0.5      0.1        20
EntityResolution  95136     62                10         4K       19         3           4        1          0.8
RandomForest      75340     12                968        16K      5          15          2        0.5        7.5
PowerEN           40513     29                31         8K       5          16          53       10         0.3
Snort             69029     60                98         4K       17         5           14       0.4        0.4
Fermi             40783     8                 3854       20K      2          24          2        1          12
Protomata         42061     42                19         8K       6          13          5        1          2.6
DotStar           96438     4                 3          20K      5          12          40       10         0.3
* iNFAnt on Nvidia Titan Xp
** Nmcart, 4 threads on i5-4440 @ 3.1 GHz

Current/Future Work
• Hide input buffer flush latency with output buffer flush
• SAT-solver based mapping algorithm
• Scale up to larger FPGAs and faster memory (DDR4/HBM2)