Layered Trees Most Specific Prefix based Pipelined Design
Layered Trees: Most Specific Prefix-Based Pipelined Design for On-Chip IP Address Lookups. 張燕光, Dept. of Computer Science & Information Engineering, National Cheng Kung University
Outline: Introduction; IP lookup review (1-D packet classification); Data structures for IP lookups; Binary prefix search; Layered search trees; Parallel and pipelined search engine; Conclusion. Dept. of Computer Science and Information Engineering, National Cheng Kung University, CIAL Lab
Internet: Mesh of Routers. [Figure: the Internet core, edge routers, and campus area networks]
RFC 1812: Requirements for IPv4 Routers. A router must perform an IP datagram forwarding decision (variously called forwarding, routing lookup, IP lookup, or longest prefix match) and must send the datagram out through the appropriate interface (called switching).
Router Design Model. [Figure: slow path (control plane) with a RISC processor; fast path (data plane) with an ingress receive unit, a search engine with on-chip SRAM, and an egress transmit unit]
Lookup in an IP Router: unicast destination-address-based lookup. [Figure: the destination address in the header of an incoming packet is passed to the forwarding engine, which computes the next hop from a forwarding table of (destination prefix, next hop) entries]
AS6447 BGP Table. Data last updated Wed, 23 Nov 2011 15:12:48 GMT. IPv4 BGP reports: AS131072 (APNIC R&D) 385,044; AS6447 (Route-Views.Oregon-ix.net) 396,386. IPv6 BGP reports: AS131072 (APNIC R&D) 7,616; AS6447 (Route-Views.Oregon-ix.net) 7,581.
Routing table example: 1.5.0.0/16, 1.9.2.0/24, 1.9.4.0/22, 1.9.12.0/24, 1.11.0.0/21, 1.11.8.0/21, 1.11.16.0/21, 1.11.24.0/21, 1.11.32.0/21, 1.11.40.0/21, 1.11.48.0/21, 1.11.56.0/21, 1.11.64.0/21, 1.11.72.0/21, 1.11.80.0/21, 1.11.88.0/21, 1.11.128.0/17, 1.12.0.0/14, 1.12.0.0/24, 1.12.1.0/24, 1.21.0.0/16, 1.22.0.0/23, 1.22.4.0/23, 1.22.6.0/23, 1.22.8.0/23, 1.22.12.0/23, 1.22.14.0/23, 1.22.16.0/23, 1.22.18.0/23
AS6447: UOREGON-IX - University of Oregon
Router: a memory-hungry search application. Router speed depends on the number of memory accesses per lookup operation, i.e., on the speed of memory. IPv6 addresses are four times wider than IPv4 addresses. Negative impact: four times as many memory accesses if the CPU is 32 bits wide. Are IPv6 routers four times slower than IPv4 routers? Not really, but it is possible. A pipelined design may be a good solution.
Example Forwarding Table
Prefix    P1    P2   P3     P4
Bits      111*  10*  1010*  10101
Next hop  H1    H2   H3     H4
Lookup is longest prefix match (LPM), not exact match. Property: prefixes are either disjoint or enclosing (one completely covers another). Prefix enclosure makes (1) sorting prefixes and (2) binary searching prefixes difficult, so trie-based schemes emerge naturally.
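A minimal sketch of longest prefix match on the example table, assuming prefixes are written as bit strings without the trailing `*` and that P2 is 10* as on the binary trie slide. The names `TABLE` and `longest_prefix_match` are illustrative only; a linear scan is used just to make the LPM rule concrete, not as a practical lookup scheme.

```python
# Example forwarding table (bit strings without the trailing '*').
TABLE = {"111": "H1", "10": "H2", "1010": "H3", "10101": "H4"}


def longest_prefix_match(addr_bits, table):
    """Return (prefix, next_hop) for the longest matching prefix, or None."""
    best = None
    for prefix, hop in table.items():
        # A prefix matches when the address starts with its bits.
        if addr_bits.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, hop)
    return best


print(longest_prefix_match("10111", TABLE))
```

Address 10111 matches only 10*, while 10101 matches P2, P3, and P4 and the longest one (P4) wins.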
Basic Data Structures for IP lookups
Prefix properties. Disjoint prefixes: two prefixes are disjoint if they do not share any address. Prefix enclosure: let $A = b_{n-1} \ldots b_j \ldots b_i*$ and $B = b_{n-1} \ldots b_j*$ with $j > i$. Prefix A is enclosed by B ($B \supset A$), since the IP address space covered by A is a subset of that covered by B, where $\supset$ is the enclosure operator. Enclosure is a special case of overlapping. Prefix comparison: the inequality 0 < * < 1 is used to compare two prefixes in the ternary representation.
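The three relations can be sketched as follows, assuming prefixes are bit strings without the trailing `*` and that shorter prefixes are padded with `*` before comparison. The function names are hypothetical, and `encloses` also returns True for equal prefixes.

```python
ORDER = {"0": 0, "*": 1, "1": 2}   # the inequality 0 < * < 1


def compare_prefixes(a, b):
    """Return -1, 0, or 1 when a is smaller than, equal to, or larger than b."""
    width = max(len(a), len(b))
    for x, y in zip(a.ljust(width, "*"), b.ljust(width, "*")):
        if ORDER[x] != ORDER[y]:
            return -1 if ORDER[x] < ORDER[y] else 1
    return 0


def encloses(outer, inner):
    """True if every address covered by inner is also covered by outer."""
    return inner.startswith(outer)


def disjoint(a, b):
    """True if the two prefixes share no address."""
    return not (a.startswith(b) or b.startswith(a))


print(compare_prefixes("10", "1"), compare_prefixes("11", "1"))
```

Under this order 10* < 1* < 11*, i.e., a prefix sits between the prefixes of its left and right subtrees.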
Prefix properties. The most specific prefixes (MSPs) are the prefixes that do not cover any others. MSPs are disjoint, so they can be put in an array for binary search. Prefixes are grouped into layers based on MSPs; IPv4 tables need 6-7 layers. [Figure: binary trie in which each prefix is labeled with its layer number, 1 to 5]
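One way to picture the layer grouping is to peel off the most specific prefixes repeatedly. The sketch below assumes prefixes are bit strings without the trailing `*`; `assign_layers` is a hypothetical helper written for clarity (quadratic time), not the construction used in the talk.

```python
def assign_layers(prefixes):
    """Group prefixes into layers; the first layer holds the MSPs of the full set."""
    remaining = set(prefixes)
    layers = []
    while remaining:
        # An MSP is a prefix that covers no other remaining prefix.
        msp = {p for p in remaining
               if not any(q != p and q.startswith(p) for q in remaining)}
        layers.append(sorted(msp))
        remaining -= msp
    return layers


print(assign_layers(["111", "10", "1010", "10101"]))
```

For the four-prefix example table this yields three layers: {10101, 111}, then {1010}, then {10}. The prefixes inside one layer are always disjoint.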
Prefix Enclosure property
Database (year-month)  AS6447 (2000-4)  AS6447 (2002-4)  AS6447 (2005-4)
Number of prefixes     79,530           124,798          163,535
Level-1 prefixes       73,891 (92.9%)   114,745 (91.9%)  150,245 (91.9%)
Level-2 prefixes       4,874 (6.1%)     8,496 (6.8%)     11,135 (6.8%)
Level-3 prefixes       642 (0.8%)       1,290 (1%)       1,775 (1.1%)
Level-4 prefixes       104 (0.1%)       235 (0.2%)       329 (0.2%)
Level-5 prefixes       17               29               45
Level-6 prefixes       2                3                6
Prefix Enclosure property. [Figure: layer distribution]
Prefix properties. [Figure: number of prefixes versus prefix length]
Forwarding table example
Prefix    P1    P2   P3     P4
Bits      111*  10*  1010*  10101
Next hop  H1    H2   H3     H4
P1 is disjoint from the other three prefixes. $P2 \supset P3 \supset P4$. Lookup is longest prefix match (LPM), not exact match. Prefix enclosure makes (1) sorting prefixes and (2) binary searching prefixes difficult.
Prefix formats. Length format: $b_{n-1} \ldots b_0 / l$ ($l$ is the prefix length); in IPv4, d3.d2.d1.d0/l, e.g., 140.116.82.36/24. Mask format: $b_{n-1} \ldots b_0 / m_{n-1} \ldots m_0$ (prefix length $l$), where $m_j = 1$ for $n-1 \ge j \ge n-l$ and $m_j = 0$ otherwise; in IPv4, d3.d2.d1.d0/m3.m2.m1.m0, e.g., 140.116.82.36/255.255.255.0 (mask bits 1…10…0). Ternary format: $t_{n-1} \ldots t_0 = b_{n-1} \ldots b_{n-l} * \ldots *$ (prefix length $l$), where $b_j$ is 0 or 1 for $n-1 \ge j \ge n-l$. If $t_k$ is *, then $t_j$ must also be * for all $j < k$. A single don't-care bit can denote a series of don't-care bits, e.g., 1* denotes 1**** in the 5-bit address space, and 140.0.0.0/8 = 10001100*.
Prefix: (n+1)-bit format. First variant, $b_{n-1} \ldots b_{n-l} 1 0 \ldots 0$ ($l$ is the prefix length): the bits of the prefix $b_{n-1} \ldots b_{n-l}*$ in ternary format are followed by one trailing '1' and then $n - l$ 0's. Second variant, $b_{n-1} \ldots b_{n-l} 0 1 \ldots 1$: the bits of the prefix are followed by one trailing '0' and then $n - l$ 1's.
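A small sketch of the first variant (bits, then '1', then zeros), assuming prefixes are bit strings without the trailing `*`. The helper names `encode` and `decode` are illustrative. It also shows that plain integer comparison of the encoded values reproduces the 0 < * < 1 ordering.

```python
def encode(prefix, n):
    """Encode a prefix of length l <= n as an (n+1)-bit integer: bits, 1, then n-l zeros."""
    l = len(prefix)
    if l > n:
        raise ValueError("prefix longer than address width")
    bits = int(prefix, 2) if prefix else 0
    return (bits << (n - l + 1)) | (1 << (n - l))


def decode(value, n):
    """Recover the prefix bit string from its (n+1)-bit encoding."""
    low = value & -value              # the lowest set bit is the trailing '1' marker
    zeros = low.bit_length() - 1      # number of trailing zeros, i.e. n - l
    l = n - zeros
    return format(value >> (zeros + 1), "0{}b".format(l)) if l else ""


print(format(encode("110", 5), "06b"))
print(sorted(["11", "1", "10"], key=lambda p: encode(p, 5)))
```

Because every encoded value contains the marker bit, the all-zero word is never produced, which leaves it free for other uses such as an empty slot.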
5-bit prefixes in the format $b_{n-1} \ldots b_{n-l} 1 0 \ldots 0$. [Figure: the 5-bit prefixes (e.g., *****, 00***, 11***) mapped into the 6-bit binary address space; the value 000000 is not used]
5-bit prefixes in the format $b_{n-1} \ldots b_{n-l} 0 1 \ldots 1$. [Figure: the 5-bit prefixes (e.g., *****, 00***, 11***) mapped into the 6-bit binary address space; the value 111111 is not used]
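Binary Trie (Radix Trie). Example: look up 10111 in the table P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4). Each trie node holds a next-hop pointer (if the node is a prefix), a left pointer, and a right pointer. [Figure: binary trie with nodes A to I, edges labeled 0 and 1, prefix nodes P1 to P4, and the node created by adding P5 = 1110*]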
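The node layout and the lookup walk can be sketched as below. This is a minimal illustration under the assumption that prefixes and addresses are bit strings; the class and function names are not from the slides.

```python
class TrieNode:
    """Binary trie node: left/right child pointers and an optional next hop."""

    def __init__(self):
        self.left = None       # child for bit 0
        self.right = None      # child for bit 1
        self.next_hop = None   # set only when this node is a prefix


def insert(root, prefix_bits, next_hop):
    node = root
    for bit in prefix_bits:
        side = "right" if bit == "1" else "left"
        if getattr(node, side) is None:
            setattr(node, side, TrieNode())
        node = getattr(node, side)
    node.next_hop = next_hop


def lookup(root, addr_bits):
    """Walk down the trie, remembering the last prefix node seen (the LPM)."""
    node, best = root, root.next_hop
    for bit in addr_bits:
        node = node.right if bit == "1" else node.left
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


root = TrieNode()
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    insert(root, prefix, hop)
print(lookup(root, "10111"))
```

The walk for 10111 passes the node of P2 and then falls off the trie, so the remembered next hop H2 is returned. Adding P5 = 1110* only creates one new node below P1.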
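Binary Trie: Leaf Pushing. Table: P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4), P5 = 1110* (H5). After leaf pushing the prefixes are disjoint, but some of them are duplicated. [Figure: leaf-pushed trie with leaves labeled P2, P3, P5, P1, and P4]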
Binomial spanning tree. [Figure: a 4-cube and its corresponding binomial spanning tree; the nodes include 0000, 1000, 1100, 1110, and 1111, and edges are labeled with dimensions 0 to 3]
Perfect code: Hamming code (7, 4). 7-cube example: the word 0000000 together with its seven neighbors 1000000, 0100000, 0010000, 0001000, 0000100, 0000010, and 0000001 forms a one-level binomial spanning tree. The 7-cube consists of $2^4$ (16) such one-level binomial spanning trees.
Perfect code: Hamming code (7, 4)
(a) Parity-check and generator matrices of Hamming code (7, 4):
H7 = 1101100    G7 = 1000110
     1011010         0100101
     0111001         0010011
                     0001111
(c) Decoding table. Here r is the received code, the syndrome is s = (s2 s1 s0) = r · H7^T (inner product with the transpose of H7), and the corrected code is r + ErrorPattern[s].
Syndrome  ErrorPattern
000       0000-000
001       0000-001
010       0000-010
011       0010-000
100       0000-100
101       0100-000
110       1000-000
111       0001-000
Perfect code: Hamming code (7, 4). Generate the 16 codewords as u · G7 in the 7-bit address space (7-cube):
u     Codeword    u     Codeword
0000  0000-000    1000  1000-110
0001  0001-111    1001  1001-001
0010  0010-011    1010  1010-101
0011  0011-100    1011  1011-010
0100  0100-101    1100  1100-011
0101  0101-010    1101  1101-100
0110  0110-110    1110  1110-000
0111  0111-001    1111  1111-111
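The encoding and syndrome decoding steps can be checked with a short sketch. It assumes the generator rows implied by the codeword list (1000-110, 0100-101, 0010-011, 0001-111) and the matching parity-check rows; the function names are illustrative.

```python
G7 = ["1000110", "0100101", "0010011", "0001111"]   # generator matrix rows
H7 = ["1101100", "1011010", "0111001"]              # parity-check matrix rows


def encode(u):
    """Codeword u.G7 over GF(2) for a 4-bit string u."""
    word = 0
    for bit, row in zip(u, G7):
        if bit == "1":
            word ^= int(row, 2)
    return format(word, "07b")


def syndrome(r):
    """Syndrome (s2 s1 s0) = r.H7^T for a 7-bit string r."""
    return "".join(
        str(sum(int(a) & int(b) for a, b in zip(r, row)) % 2) for row in H7
    )


def correct(r):
    """Flip the single bit indicated by the syndrome, if any."""
    s = syndrome(r)
    if s == "000":
        return r
    columns = ["".join(row[i] for row in H7) for i in range(7)]
    i = columns.index(s)              # the syndrome equals the column of the flipped bit
    return r[:i] + ("1" if r[i] == "0" else "0") + r[i + 1:]


print(encode("1011"), syndrome("1111010"), correct("1111010"))
```

Since the code is perfect, every one of the 128 words of the 7-cube is corrected to exactly one of the 16 codewords, which is the one-level binomial spanning tree picture.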
Perfect code: Golay code (23, 12). The 23-cube consists of $2^{12}$ three-level binomial spanning trees, each containing $\binom{23}{0} + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} = 1 + 23 + \frac{23 \cdot 22}{2} + \frac{23 \cdot 22 \cdot 21}{3 \cdot 2} = 24 + 253 + 1771 = 24 + 2024 = 2048 = 2^{11}$ nodes.
Ranges. Why ranges? Prefixes can also be represented by ranges, and the source/destination port fields of rule tables for packet classification are ranges. Prefixes are special cases of ranges: the prefix $b_{n-1} \ldots b_{n-l}*$ of length $l$ is the range of addresses from $b_{n-1} \ldots b_{n-l} 0 \ldots 0$ to $b_{n-1} \ldots b_{n-l} 1 \ldots 1$, denoted as $[b_{n-1} \ldots b_{n-l} 0 \ldots 0,\ b_{n-1} \ldots b_{n-l} 1 \ldots 1]$. Overlapping: two ranges are overlapping if they are not disjoint. Partially overlapping: two ranges are partially overlapping if they are neither disjoint nor enclosing.
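A short sketch of the prefix-to-range conversion, assuming the prefix is a bit string without the trailing `*` in an n-bit address space: the low end pads the prefix with 0's and the high end pads it with 1's. `prefix_to_range` is an illustrative name.

```python
def prefix_to_range(prefix, n):
    """Return (start, finish) of the addresses covered by a prefix in an n-bit space."""
    l = len(prefix)
    start = (int(prefix, 2) if prefix else 0) << (n - l)   # pad with 0's
    finish = start + (1 << (n - l)) - 1                    # pad with 1's
    return start, finish


print(prefix_to_range("01011", 6))
```

In the 6-bit space, 01011* covers [22, 23] and the full-length prefix 110111 covers the single address [55, 55].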
Elementary Intervals for Ranges. Definition: let the set of k elementary intervals constructed from a set R of ranges in the address space 0 … N - 1 be $X = \{X_i \mid X_i = [e_i, f_i],\ i = 1, \ldots, k\}$. X must satisfy the following: 1) $e_1 = 0$ and $f_k = N - 1$; 2) $f_i = e_{i+1} - 1$ for i = 1 to k - 1; 3) all addresses in $X_i$ are covered by the same subset of R, called the range matching set of $X_i$ and denoted by $EI_i$; and 4) $EI_i \ne EI_{i+1}$ for i = 1 to k - 1.
Elementary Intervals for Ranges: graphical view
Ranges: P1 [0, 15], P2 [16, 31], P3 [4, 7], P4 [32, 63], P5 [22, 23], P6 [48, 63], P7 [48, 51], P8 [55, 55], P9 [32, 39]
Interval  Addresses  Range matching set
X1        [0, 3]     EI1 = {P1}
X2        [4, 7]     EI2 = {P1, P3}
X3        [8, 15]    EI3 = {P1}
X4        [16, 21]   EI4 = {P2}
X5        [22, 23]   EI5 = {P2, P5}
X6        [24, 31]   EI6 = {P2}
X7        [32, 39]   EI7 = {P4, P9}
X8        [40, 47]   EI8 = {P4}
X9        [48, 51]   EI9 = {P4, P6, P7}
X10       [52, 54]   EI10 = {P4, P6}
X11       [55, 55]   EI11 = {P4, P6, P8}
X12       [56, 63]   EI12 = {P4, P6}
Elementary Intervals for Ranges: endpoints
ID  Prefix    Range     Minus-1 (start, finish)  Traditional (start, finish)
P1  000000/2  [0, 15]   (-, 15)                  (0, 15)
P2  010000/2  [16, 31]  (15, 31)                 (16, 31)
P3  000100/4  [4, 7]    (3, 7)                   (4, 7)
P4  100000/1  [32, 63]  (31, -)                  (32, 63)
P5  010110/5  [22, 23]  (21, 23)                 (22, 23)
P6  110000/2  [48, 63]  (47, -)                  (48, 63)
P7  110000/4  [48, 51]  (47, 51)                 (48, 51)
P8  110111/6  [55, 55]  (54, 55)                 (55, 55)
P9  100000/3  [32, 39]  (31, 39)                 (32, 39)
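The elementary intervals can be derived from the minus-1 endpoints with a short sketch. It assumes each range [e, f] contributes the endpoints e - 1 and f, that the last address N - 1 always closes the final interval, and that a lookup is a binary search for the first endpoint not smaller than the address. `build` and `matching_set` are hypothetical helper names.

```python
import bisect

RANGES = {
    "P1": (0, 15), "P2": (16, 31), "P3": (4, 7), "P4": (32, 63), "P5": (22, 23),
    "P6": (48, 63), "P7": (48, 51), "P8": (55, 55), "P9": (32, 39),
}


def build(ranges, size):
    """Return the sorted interval finish points and the matching set of each interval."""
    ends = {size - 1}
    for e, f in ranges.values():
        if e > 0:
            ends.add(e - 1)           # minus-1 start endpoint
        ends.add(f)                   # finish endpoint
    ends = sorted(ends)
    sets, start = [], 0
    for end in ends:
        # A range matches the interval [start, end] when it covers it entirely.
        sets.append(sorted(name for name, (e, f) in ranges.items()
                           if e <= start and end <= f))
        start = end + 1
    return ends, sets


def matching_set(ends, sets, addr):
    """Binary search for the elementary interval containing addr."""
    return sets[bisect.bisect_left(ends, addr)]


ENDS, SETS = build(RANGES, 64)
print(ENDS)
print(matching_set(ENDS, SETS, 55))
```

With minus-1 endpoints every stored key is the last address of an elementary interval, so a single "first key greater than or equal to the address" search selects the interval.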
Segment Tree. [Figure: segment tree over the elementary intervals X1 [0, 3] to X12 [56, 63]; internal nodes g, h, q, r, s, t, u, v, w, y, z carry the keys 3, 7, 15, 21, 23, 31, 39, 47, 51, 54, and 55, and the ranges P1 to P9 are attached to the nodes and leaves they cover]
Fat Interval Tree. [Figure: nodes shown as key with stored ranges: 32 (P4, P9), 8 (P1), 5 (P3), 48 (P6, P7), 22 (P2, P5), 55 (P8)]
Interval Tree. Each node stores a key covered by at least one range. Fat interval tree: each node stores one or more ranges. The number of nodes in the interval tree is O(N). To insert R = [e, f]: if R covers the root's key, R is stored in the root. Otherwise, R is inserted into the left subtree of the root when f is smaller than the root's key, or into the right subtree when e is larger than the root's key. When R does not cover the key of any traversed node, a new node with a key selected from the addresses e to f is created and inserted as the left or right child of the last visited node. Search takes O(log N + k) time, where k is the number of prefixes that match the given address. Insertion and deletion are very expensive because ranges in some nodes may need to be relocated after rotations.
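The insertion rule can be sketched as follows. This is a simplified, unbalanced fat interval tree: it omits the rotations mentioned above, and it picks e as the key of a new node, which is only one of the choices the description allows. All names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.ranges = []      # all ranges stored here cover self.key
        self.left = None
        self.right = None


def insert(root, name, e, f):
    """Insert range [e, f]; returns the (possibly new) root."""
    if root is None:
        root = Node(e)        # assumption: use e as the key of a new node
        root.ranges.append((name, e, f))
        return root
    node = root
    while True:
        if e <= node.key <= f:            # R covers this node's key: store it here
            node.ranges.append((name, e, f))
            return root
        side = "left" if f < node.key else "right"
        child = getattr(node, side)
        if child is None:                 # no traversed node is covered: new leaf
            child = Node(e)
            child.ranges.append((name, e, f))
            setattr(node, side, child)
            return root
        node = child


def matches(root, addr):
    """Collect the names of all ranges containing addr along one root-to-leaf path."""
    found, node = [], root
    while node is not None:
        found.extend(name for name, e, f in node.ranges if e <= addr <= f)
        if addr == node.key:
            break
        node = node.left if addr < node.key else node.right
    return found


root = None
for name, (e, f) in {"P1": (0, 15), "P2": (16, 31), "P3": (4, 7), "P4": (32, 63),
                     "P5": (22, 23), "P6": (48, 63), "P7": (48, 51),
                     "P8": (55, 55), "P9": (32, 39)}.items():
    root = insert(root, name, e, f)
print(matches(root, 55))
```

Every range containing an address lies on the search path for that address, so the longest matching prefix is the narrowest range among the collected ones.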
Interval Tree. Thin interval tree: each node stores exactly one range. Since ranges may overlap, two comparison rules decide whether one range is smaller or larger than another. For R1 = [e1, f1] and R2 = [e2, f2]: (1) R1 < R2 if e1 < e2; (2) on a tie (e1 = e2), R1 < R2 if R2 is a subrange of R1 (i.e., f2 < f1). Each node also stores a max value, the maximum finish endpoint of all ranges stored in the subtree rooted at that node. In contrast with the fat interval tree, prefix insertion and deletion take O(log N) time. However, O(min{N, k log N}) time is needed to find the longest matching prefix or the highest-priority matching prefix, where k is the number of prefixes matching a given address.
Hash Table: narrowing down the search space. Index = Hash_function(key) % m, where the key may be the first k bits of the IP address and m is the size of the hash table. Perfect hash: no collisions. Minimal perfect hash: a perfect hash whose table size is k for k different hashing keys.
Hash Table. Difficulty: prefixes and ranges cannot be used directly as the keys of hash functions. [Figure: keys k1 and k2 are hashed into an array of m elements at H(k1) % m and H(k2) % m, with a collision]
Hash Table: 8-bit Segmentation Table. An 8-bit segmentation table is usually used for IPv4 forwarding tables because there is no prefix shorter than 8 bits. [Figure: array of 256 elements, indexed 0 to 255 by H(prefix) % 256 (the 8 most significant bits of the prefix); each element holds the set of prefixes with the same first 8 bits, e.g., 0.x.y.z in element 0, and may be an empty set]
Hash Table: 16-bit Segmentation Table. Prefixes of length 16 or shorter must be stored properly. One option is duplication, e.g., duplicate 0.0.b.c/15 into buckets 0 and 1, or store the port of 0.0.b.c/15 in elements 0 and 1. Another option is to put them into a separate set (good for updates, but two sets must be searched in the worst case). [Figure: array of $2^{16}$ elements, indexed 0 to $2^{16} - 1$ by H(prefix) % $2^{16}$ (the 16 most significant bits of the prefix); each element holds the set of prefixes with the same first 16 bits, e.g., 0.0.y.z in element 0, and may be an empty set; prefixes of length 16 are marked separately]
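The duplication option can be sketched with a small segmentation table. The sketch assumes bit-string prefixes and uses a 3-bit segment instead of 16 bits only to keep the example readable; `seg_bits`, the bucket layout with `default` and `long` fields, and the function names are all assumptions made for illustration.

```python
def build_segmentation_table(prefixes, seg_bits):
    """prefixes maps bit-string prefixes to next hops; returns a list of buckets."""
    table = [{"default": None, "long": {}} for _ in range(1 << seg_bits)]
    # Process shorter prefixes first so that longer ones overwrite their duplicates.
    for prefix, hop in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
        l = len(prefix)
        if l <= seg_bits:
            # Duplicate the short prefix into every bucket it covers.
            first = (int(prefix, 2) if prefix else 0) << (seg_bits - l)
            for idx in range(first, first + (1 << (seg_bits - l))):
                table[idx]["default"] = hop
        else:
            # Longer prefixes are stored in one bucket, keyed by their remaining bits.
            table[int(prefix[:seg_bits], 2)]["long"][prefix[seg_bits:]] = hop
    return table


def lookup(table, addr_bits, seg_bits):
    bucket = table[int(addr_bits[:seg_bits], 2)]
    rest = addr_bits[seg_bits:]
    best = None
    for suffix, hop in bucket["long"].items():
        if rest.startswith(suffix) and (best is None or len(suffix) > len(best[0])):
            best = (suffix, hop)
    return best[1] if best else bucket["default"]


TABLE = build_segmentation_table(
    {"111": "H1", "10": "H2", "1010": "H3", "10101": "H4"}, seg_bits=3)
print(lookup(TABLE, "10111", 3))
```

Here the 2-bit prefix 10* is duplicated into buckets 100 and 101, which mirrors storing 0.0.b.c/15 in elements 0 and 1 of the 16-bit table. The cost is that an update of a short prefix touches several buckets.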
Hash Table: Compression. Since the segmentation table has many empty elements, a bitmap can be used to compress it. [Figure: a $2^{16}$-bit bitmap containing M 1's (e.g., 0 1 1 0 0 ... 0 1 1 0 0 1 1) indexes an array of M elements, numbered 0 to M - 1, e.g., the sets for 0.0.y.z and 0.1.y.z; each element holds the prefixes with the same first 16 bits and must be non-empty]
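A sketch of the bitmap idea, assuming empty table elements are represented by None and that the position of an element in the packed array equals the number of 1's before its bit. `compress` and `get` are illustrative names; hardware designs typically precompute partial counts instead of counting bits on every access.

```python
def compress(table):
    """Return (bitmap, packed): bit i is set iff table[i] is non-empty."""
    bitmap, packed = 0, []
    for i, item in enumerate(table):
        if item is not None:
            bitmap |= 1 << i
            packed.append(item)
    return bitmap, packed


def get(bitmap, packed, i):
    """Fetch element i of the original table from the compressed form."""
    if not (bitmap >> i) & 1:
        return None
    rank = bin(bitmap & ((1 << i) - 1)).count("1")   # number of 1's below bit i
    return packed[rank]


BITMAP, PACKED = compress([None, "a", "b", None, None, "c"])
print(bin(BITMAP), PACKED, get(BITMAP, PACKED, 5))
```

Only the M non-empty sets are stored, at the price of one population count per access.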
Metrics for Lookup Algorithms: high speed (e.g., 40 Gbps with 40-byte packets = 125 M packets/sec); small storage (e.g., cache or on-chip memory); low update time; ability to handle large routing tables; flexibility in implementation; low preprocessing time; IPv6.
Survey: IP Lookups. M. A. Ruiz-Sanchez, E. W. Biersack, and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms," IEEE Network, vol. 15, pp. 8-23, March 2001. Schemes for optimizing search speed: multibit tries (two-level multibit trie, 16-16, 24-8); binary range search (endpoints); binary search on prefix length; binary prefix search; binomial spanning tries based on Hamming and Golay perfect codes; FPGA pipelined implementation (over 100 Gbps).
Existing IP lookup schemes. Schemes for optimizing the memory requirement: small forwarding table (SFT), a compressed 16-8-8 trie; level compressed (LC) trie; Huang's compressed 16-x (C-16-x); compressed 8-8-8-8 trie using a minimal perfect hashing function; hierarchical endpoint tree (01** gives 0100 and 1000); tree bitmap (compressed 4-4-4-4-4 trie); memory-optimized multibit tries with dynamic programming.
Existing IP lookup schemes. Schemes for optimizing update speed (O(log N)): binary tree on binary tree scheme (PBOB); priority search tree scheme (PST); collection of red-black trees scheme (CRBT); Most Specific Prefix Tree (MSPT); Multigroup Most Specific Prefix Tree (MG-MSPT); dynamic segment tree (DST), extending binary range search; multiway range tree (MRT); Prefix in B-Tree (PIBT); Dynamic Multiway Segment Tree (DMST).
Existing IP lookup schemes. Schemes for optimizing IPv6: not really any, apart from some dual-stack (IPv4/IPv6) papers. Issues: 128-bit IPv6 addresses; 32-bit vs. 64-bit CPUs or memory bandwidth. Initial results: binary-search-based schemes are better than trie-based schemes.
Existing Binary Range Search. Traditional endpoints: range [e, f] gives the endpoints e and f.
Proposed Binary Range Search. Proposed minus-1 endpoints: range [e, f] gives the endpoints e - 1 and f.
Binary prefix search. Definition 1 (Prefix comparison): the inequality 0 < * < 1 is used to compare two prefixes in the ternary format.
Binary prefix search. Directly performing a binary search on the list of sorted prefixes may fail. [Figure: binary search for Dst = 01011000 over the sorted prefixes, with probes numbered 1 to 4, showing the correct match and the failed match]
Binary prefix search. The enclosure relationship between prefixes causes the search failure. The remedy is to generate auxiliary prefixes that inherit the routing information of the original LPM (e.g., F) and to put them where the binary search can find them, e.g., the auxiliary prefix 01011000. It is therefore feasible to split prefix F into two parts such that both sides of prefix O are covered.
Binary prefix search. The full tree expansion splits the enclosure prefixes into many longer prefixes (leaf pushing). Auxiliary prefix merges: many auxiliary prefixes may inherit the same routing information from a common enclosure prefix, and these prefixes can be merged into one. Prefix merge: the prefix obtained by merging a set of consecutive prefixes is the longest common ancestor (LCA) of these consecutive prefixes in the binary trie.
Binary prefix search. [Figure: the full tree expansion, with auxiliary prefix F3 = 01011000]
Binary prefix search. [Figure: the full tree after the merge operations, with auxiliary prefix F3 = 01011000]
(n+1)-bit prefix representation. For efficient comparisons between prefixes, an (n+1)-bit representation is used for n-bit prefixes: $b_{n-1} \ldots b_{n-i} * \ldots *$ becomes $b_{n-1} \ldots b_{n-i} 1 0 \ldots 0$ with n - i trailing zeros. n = 32 for IPv4 or n = 64 for IPv6. Examples with 5-bit prefixes: 110** is converted to 110100, and 10000 is converted to 100001.
Performance: Search Speed. [Two figures: search speed comparison]
Table 3: Memory required for the routing table containing 120,635 prefixes, where multiway assumes a cache block of size 64 bytes. The 16-bit segmentation table has 65,536 entries (4 bytes each).
Scheme           Segmentation  Ref.  Memory     Statistics
Original table   N/A           -     662.7 KB   # of prefixes: 120,635 (45 bits each)
Binary trie      16-bit        8/32  2,447 KB   # of nodes: 320,478 (7 bytes each)
BSD trie         16-bit        8/26  1,993 KB   # of nodes: 222,334 (8 bytes each)
Compressed 16-x  16-bit        1/3   1,147 KB   Base array: 427 KB; compressed bit-map: 427 KB; CNHA: 37.9 KB
LC trie          N/A           1/5   2,859 KB   # of nodes: 259,371 (4 bytes each); branch factor: 16; fill factor: 0.5; base vector: 110,679 (16 bytes each); prefix vector: 9,927 (12 bytes each); next hop vector: 255 (4 bytes each)
SFT              -             1/12  649.9 KB   # of segments (avg # of prefixes per segment): sparse 2,765 (2.9), dense 4,300 (25.5), very dense 586 (91.7); level-1 pointers: 13,317; level-2 pointers: 461; maptable: 5.3 K; base array: 2 K (4 bytes each); code word array: 8 K
Binary range     none          -     1,365 KB   # of prefixes: 232,887 (6 bytes each)
Binary range     16-bit        -     1,104 KB   # of prefixes: 217,146 (4 bytes each)
Multiway range   16-bit        -     4,695 KB   # of blocks: 71,034 (64 bytes each)
Binary prefix    none          -     646 KB     # of prefixes: 145,737 (5 bytes each)
Binary prefix    16-bit        -     601 KB     # of prefixes: 117,968 (3 bytes each)
Multiway prefix  16-bit        -     954.4 KB   # of blocks: 11,175 (64 bytes each)
Ref. values extracted for the six range and prefix search rows, row alignment lost: 1/6, 1/4, 1/5, 1/4, 1/3.
Goal of hardware approaches: a pipeline architecture that achieves a throughput of over 100 Gbps. The most popular design is based on the multibit trie. Advantages: simple; updates are performed with bubble instructions. Disadvantages: unbalanced memory among stages (solvable); the memory requirement is too large to fit in the on-chip memory of a hardware device such as an FPGA.
Multibit trie pipeline. [Figure: multibit trie stages separated by pipeline registers]
Ring Pipeline. [Figure: stages S1 to S4 arranged in a ring, with packet in and packet out on odd and even cycles]
References for existing pipelines
Ring - ISCA-2005 - A Tree Based Router Search Engine Architecture with Single Port Memories
SDP - SIGCOMM-2005 - Dynamic Pipelining Making IPLookup
CAMP - ANCS-2006 - CAMP: Fast and Efficient IP Lookup Architecture
OLP - HOTI-2007 - A Memory-Balanced Linear Pipeline Architecture for Trie-based IP Lookup
BiOLP - FCCM-2008 - A SRAM-based Architecture for Trie-based IP Lookup Using FPGA
DuPI - FPL-2008 - Scalable high-throughput SRAM-based architecture for IP-lookup using FPGA
Hash - HOTI-2008 - An Efficient Hardware-based Multi-hash Scheme for High Speed IP Lookup
POLP - INFOCOM-2008 - Beyond TCAM: An SRAM based parallel multi-pipeline arch for terabit IP lookup
POLP - IPDPS-2008 - Parallel IP Lookup Using Multiple SRAM-based Pipelines
FlashLook - HPSR-2009 - FlashLook: 100 Gbps Hash-Tuned Route Lookup Architecture
FPL - HPSR-2009 - Frugal IP Lookup Based on a Parallel Search
FlashTrie - INFOCOM-2010 - Hash-based Prefix-Compressed Trie for IP Route Lookup Beyond 100 Gbps
pDST - FPGA-2010 - High Throughput and Large Capacity Pipelined Dynamic Search Tree on FPGA
DMST - TC-2010 - Dynamic Multiway Segment Tree for IP Lookups and the Fast Pipelined Search Engine
BPFL & POLP - HPSR-2011 - FPGA implementation of lookup algorithms
Shuffled Trie - ICC-2011 - Bit-Shuffled Trie IP Lookup with Multi-Level Index Tables
Prefix Partitioning - TC-2011 - Scalable Tree-based Architectures for IPv4/v6 Lookup Using Prefix Partitioning
Techniques used (design dimensions and their options). Data structure: multibit trie, segment tree, prefix tree. Pipeline: linear, circular, bi-directional. Memory used: on-chip, off-chip, both. Update: easy with bubble, or difficult. Balanced stage memory, dual-port memory, parallel design, hashing, cache: yes/no.
THE PROPOSED SCHEME. Example with prefixes P1 to P11 (prefix values as extracted: 0*, 0101*, 1001*, 10111, 11*, 0001*, 00111, 001*, 0011*). [Figure sequence: the layered structure shown level by level. Labels in the Level-1 frame: P1, P6, P8, P10, P7, P3, P2, P11, P9, P4, P5. Level-2 frame: P1, P8, P10, P3, P11. Level-3 frame: P10. Level-4 frame: P1.]
- SEARCHING: search for 01000. [Figure sequence, five frames: the search steps through the layered trees]
- INSERTION (1/2): insert 0010* (P12), followed by a rotation for balance after inserting a node. [Figure sequence, four frames]
- INSERTION (2/2): insert 0011* (P11). [Figure sequence, two frames]
- DELETION: delete 1001*. [Figure labels: the node to delete; the enclosure of 1001* in the next level; the minimum of the right subtree; the maximum of the left subtree]
B-Tree structure. [Figure sequence, four frames. Labels recoverable from one frame, in extraction order: layer 0: 12 P5, 2 P7, 5, 7 P10, P11, 22 P8, 25 P12, 88 P3, 30 P9; layer 1: 4 P4, 20 P6, 72 P1; layer 2: 24 P2]
Stage 4. The most complicated part of the proposed LSE is stage 4, which contains two components: the IP/Keys Matching unit and the Branch Detection unit. The two components are processed in parallel.
IP/Keys Matching unit. [Figure: circuit in which ip and key_x are matched]
Branch Detection unit. [Figure: circuit of the Branch Detection unit]
IP/Keys Matching unit. It computes a flag called NULL_x for key_x: if key_x is NULL, NULL_x is 1; otherwise, NULL_x is 0. It also computes the mask of key_x, where key_x is in (n+1)-bit prefix format; if key_x is NULL, its mask is all zeros. The mask is an n-bit vector for x = 0 to m - 2 (i.e., there are m - 1 keys, key_0 to key_{m-2}, stored in a node).
IP/Keys Matching unit. The IP address (an n-bit vector) is matched against the mask to get the matching result match_x. The conventional match operation between an IP address and a prefix is always true if the mask is all zeros. Thus, to force the matching result match_x between an IP address and the mask of a NULL key to be always false, the left-hand side of the 'and' operation in equation (3) is needed.
IP/Keys Matching unit. Compute the next-hop number corresponding to the matched key. Set the output signal node.match to one if at least one of the m - 1 possible keys matches the input IP.
Branch Detection unit. Equations (6)-(8) determine whether key_x is greater than the input IP address and store the result in G_x. All G_x's for x = 0 to m - 2 form an (m - 1)-bit vector.
Branch Detection unit. Convert the (m - 1)-bit vector of G_x's to an m-bit vector in which exactly one bit is set and all other bits are unset. Compute the branch, i.e., the bit position of the only set bit in that m-bit vector, and store it in a $\lceil \log_2 m \rceil$-bit vector.
Optimization techniques. To reduce the required memory and improve the search speed, four optimization techniques are used: the simple one-level prefix push scheme; the routing table split scheme; the B-tree order varying scheme; the stage 4 clock rate improving scheme.
The simple one-level prefix push reduces the number of layers, i.e., the hardware cost. Similar to leaf pushing in the binary trie, it pushes a prefix down by only one level, and only if at least one of its child prefixes is a valid prefix. The main advantage: the layer numbers assigned to some prefixes change from high numbers to lower ones.
The routing table split scheme reduces the number of bits needed for the prefixes of lengths 9 to 24 stored in B-tree nodes. The routing table is split into two groups: a large group (prefixes of lengths 9 to 24) and a small group (lengths 25 to 32). The large group accounts for at least 98% of the prefixes in the routing table. For the large group, only 17-bit keys are needed instead of 25 bits. The memory consumption for prefixes in the small group remains the same.
The B-tree order varying scheme uses as small a B-tree order as possible for the B-tree nodes in each layer. The B-tree order of layer 0 is 22, so that a 3-level B-tree can contain the large number of layer-0 prefixes of the AS6447 table. Since the other layers contain far fewer prefixes than layer 0, the required memory can be reduced by lowering the B-tree order of these layers, as long as the number of levels in the B-trees is not greater than 3.
The stage 4 clock rate improving scheme increases the clock rate of the bottleneck stage of the proposed pipelines. Stage 4 (i.e., equations 1 to 5) is broken into three sub-stages computing equations 1-2, 3, and 4-5, respectively. Since the Branch Detection unit runs in parallel with the IP/Keys Matching unit, it is also split into three sub-stages computing equations 6-8, 9, and 10, respectively.
Prefix Enclosure Analysis (1/2)
Routing table  AS6447 (2005-4)  AS6447 (2009-7)  AS6447 (2011-5)
Size           163,535          301,552          369,394
Layer 0        150,245 (91.9%)  273,944 (90.8%)  334,445 (90.5%)
Layer 1        11,135 (6.8%)    23,171 (7.7%)    28,930 (7.8%)
Layer 2        1,775 (1.1%)     3,690 (1.2%)     4,870 (1.3%)
Layer 3        329 (0.2%)       628 (0.2%)       926 (0.3%)
Layer 4        45               101              177
Layer 5        6                16               30
Layer 6        0                2                12
Layer 7        0                0                4
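Prefix Enclosure Analysis (2/2): after the routing table split and the simple one-level prefix push. Routing tables: AS6447 (2005-4), 163,535 prefixes; AS6447 (2009-7), 301,552 prefixes; AS6447 (2011-5), 369,394 prefixes. Prefixes per layer, given as (lengths 9-24, lengths 25-32) for the three tables in that order, as extracted: layer 0: (7,071, 3,488), (271,548, 5,736), (333,034, 5,579); layer 1: (496, 67), (17,081, 28), (20,884, 21). Values for layer 2 and above, column alignment lost: 54, 3, 6, 4, 0, 1,488, 0, 88, 2, 1,859, 0, 111, 0, 5.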
Performance (IPv4). [Table columns: scheme, FPGA device, configurable logic (slices), memory (Mbits), number of prefixes, ME ratio (bits/prefix), clock rate (ns), number of accessed nodes per search (avg, worst), throughput (Mpps, Gbps). Schemes and devices: LSE and Parallel LSE (Virtex-6 XC6VSX315T); DMST [7], original and extended (Virtex-5 VLX330T); Ring [1], CAMP [11], and OLP [10] (Virtex-2P XC2VP70); BiOLP [12] (Virtex-4 FX140); BPFL [23] and POLP [23] (Stratix II EP2S180F1020C5); pDST [44] (Virtex-5 LX330); BSTrie [22] (Virtex-5 XC5VSX240T). The highest throughput listed is 377.5 Mpps (120.8 Gbps). Rows whose cells could be realigned from the extraction:]
Scheme           Logic                Memory (Mbits)  # of prefixes  bits/prefix  Clock (ns)  Nodes/search  Mpps   Gbps
BPFL (a) [23]    27.9 K LE (b) (19%)  5.6             309,000        18.1         10.35       1.0           96.6   30.9
POLP (a) [23]    6.6 K LE (b) (5%)    9.08            309,000        29.4         9.29        1.0           107.6  34.4
pDST (d) [44]    12.8 k LUTs (6.3%)   7.112           96,000         75.86        7.4         1.0           242    77.4
BSTrie (a) [22]  749 (2%)             7.38            321,000        23.0         5.7         1.0           175    56
Notes: (a) the next hops are not stored in on-chip memory; (b) Stratix LEs are used instead of slices; (c) the non-cache-based BiOLP; (d) dual-ported block RAM is used.
Performance (IPv6). [Table columns: scheme, FPGA device, configurable logic (slices), memory (Mbits), number of prefixes, ME ratio (bits/prefix), clock rate (ns), number of accessed nodes per search (avg, worst), throughput (Mpps, Gbps). Schemes: LSE and Parallel LSE on Virtex-6 XC6VSX315T. Throughput pairs recoverable from the extraction: 36.3 Mpps (11.6 Gbps) and 413.9 Mpps (132.5 Gbps).] Three groups: prefixes of lengths 12-31, 32-64, and 65-128. A 12-bit segmentation table is used.
Conclusions: layered tree for dynamic routing tables; on-chip memory; parallel and pipelined architecture; achieves a throughput of 120 Gbps.