# MemoryEfficient and Scalable Virtual Routers Using FPGA Author

Memory-Efficient and Scalable Virtual Routers Using FPGA Author: Hoang Le, Thilan Ganegedara and Viktor K. Prasanna Publisher: ACM/SIGDA FPGA '11 Presenter: Zi-Yang Ou Date: 2011/10/12 1

About FPGA’ 11 2

Introduction n n Network virtualization is a technique to consolidate multiple networking devices onto a single hardware platform. To make efficient use of the networking resources. An abstraction of the network functionality away from the underlying physical network. Separated and Merged 3

Contributions n n A simple merging algorithm results in the total amount of required memory to be less sensitive to the number of routing tables, but to the total number of virtual prefixes. A tree-based architecture for IP lookup in virtual router that achieves high throughput and supports quick update. Use of external SRAMs to support large virtual routing table of up to 16 M virtual prefixes. A scalable design with linear storage complexity and resource requirements. 4

Set-Bounded Leaf-Pushing Algorithm n n n In order to use tree search algorithms, the given set of prefixes needs to be processed to eliminate the overlap between prefixes. This elimination process results in a set of disjoint prefixes. 1. Build a trie from the given routing table. 2. Grows the trie to a full tree. 3. Pushes all the prefixes to the leaf nodes. The prefix tables expand about 1. 6 times after leaf pushing. About 90% of the prefixes are leaf prefixes. 5

Set-Bounded Leaf-Pushing Algorithm There are 3 steps involved in the algorithm: 1. Move the leaves of the trie into Set 1 2. Trim the leaf-removed trie 3. Leaf-push the resulting trie and move the leaves into Set 2 n n n NS 1 = l. N NS 2 = k. N(1 − l) N‘ = NS 1 + NS 2 = k. N + l. N(1 − k) = N(k + l − kl) l = 0. 9 k = 1. 6 N‘ = 1. 06 N 6

2 -3 Tree 7

Merging Algorithm 8

Merging Algorithm 9

Merging Algorithm 10

Merging Algorithm 11

Merging Algorithm 12

Merging Algorithm 13

IP Lookup Algorithm for Virtual Router 14

IP Lookup Algorithm for Virtual Router 15

Memory Requirement 16

Memory Requirement n n M 1 = |S 1|(L + LP + LV ID + LNHI + 2 LPtr 1) M 2 = |S 2|(L + LP + LV ID + LNHI + 2 LPtr 2) M = M 1 +M 2 = (|S 1| + |S 2|)(L + LP + LV ID + LNHI) + 2|S 1|LPtr 1 + 2|S 2|LPtr 2 M = (|S 1| + |S 2|)(L + LP + logm + LNHI) + |S 1|LPtr 1 + |S 2|LPtr 2 |S 1| + |S 2| = 1. 06 N , O(N × logm). 17

Overall Architecture 18

Overall Architecture 19

Overall Architecture 20

Memory Management 21

Virtual Routing Table Update 22

Scalability n The per-level required memory size grows exponentially as we go from one level to the next of the tree. Therefore, we can move the last few stages onto external SRAMs. 23

IMPLEMENTATION n n n M = (|S 1| + |S 2|)(L + LP + LVID + LNHI) + |S 1|LPtr 1 + |S 2|LPtr 2 M = NS(L + LP + LVID + LNHI + LPtr) MIPv 4 = NS(32 + 5 + 6 + 20) = 68 NS MIPv 6 = NS(128 + 7 + 5 + 6 + 20) = 164 NS A state-of-the-art FPGA device with 36 Mb of on-chip memory (e. g. Xilinx Virtex 6) can support up to 530 K prefixes (for IPv 4), or up to 220 K prefixes (for IPv 6), without using external SRAM. 24

IMPLEMENTATION n n In our design, external SRAMs can be used to handle even larger routing tables, by moving the last stages of the pipelines onto external SRAMs. Thus, the architecture can support up to 16 M prefixes, or 880 K prefixes for IPv 4 and IPv 6, respectively. 25

Experimental Setup 26

Experimental Setup 27

Performance Comparison n n Candidates : trie-overlapping (A 1), trie braiding (A 2) Time complexity: Our algorithm: O(N) A 1 : O(Nlog. N) A 2 : O(N^2) Memory efficiency: Our algorithm: 15 MB for 1. 3 M prefixes A 1 : 9 MB for 290 K prefixes A 2 : 4. 5 MB for 290 K prefixes Quick-update capability: A 1 and A 2 : reconstruct the entire lookup data structure 28

- Slides: 28