CS 838: NetFPGA Tutorial
Theophilus Benson
Outline
• Background: What is the NetFPGA?
• Life cycle of a packet through a NetFPGA
• Demo
What is the NetFPGA?
• Networking software running on a standard PC
• A hardware accelerator built with a Field Programmable Gate Array (FPGA) driving Gigabit network links
[Figure: a PC (CPU, memory) connected over PCI to the NetFPGA board (FPGA, memory, 1 GE ports)]
NetFPGA
• Router function: 4 Gigabit Ethernet ports
• Fully programmable: FPGA hardware
• Open-source FPGA hardware: Verilog base design
• Open-source software: Linux user level, drivers in C and C++
NetFPGA Platform
• Major components
  – Interfaces
    • 4 Gigabit Ethernet ports
    • PCI host interface
  – Memories
    • 36 Mbits static RAM
    • 512 Mbits DDR2 dynamic RAM
  – FPGA resources
    • Block RAMs
    • Configurable Logic Blocks (CLBs)
    • Memory-mapped registers
NetFPGA: Router Design
• Pipeline of modules
  – FIFO queues between each module
• Inter-module communication
  – CTRL: sent on the ctrl bus (8 bits)
    • Metadata about the data being sent
  – DATA: sent on the data bus (64 bits)
  – RDY: signals that the module is ready to receive a packet (1 bit)
  – WR: signals that a packet is being sent (1 bit)
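The handshake above can be sketched in software. The following is a minimal C model, not the Verilog used on the board: it assumes RDY simply means "the FIFO between the two modules is not full", and the names `bus_word`, `module_fifo`, and `FIFO_DEPTH` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4  /* hypothetical depth, chosen for illustration */

/* One word on the inter-module bus: 8-bit CTRL plus 64-bit DATA. */
struct bus_word {
    uint8_t  ctrl;
    uint64_t data;
};

/* FIFO sitting between module i and module i+1. */
struct module_fifo {
    struct bus_word slots[FIFO_DEPTH];
    size_t head;   /* next slot to read */
    size_t count;  /* words currently stored */
};

/* RDY: the downstream side can accept a word (assumed: FIFO not full). */
static bool fifo_rdy(const struct module_fifo *f)
{
    return f->count < FIFO_DEPTH;
}

/* WR: upstream offers a word; it is stored only while RDY is high.
 * Returns true if the word was accepted. */
static bool fifo_wr(struct module_fifo *f, struct bus_word w)
{
    if (!fifo_rdy(f))
        return false;
    f->slots[(f->head + f->count) % FIFO_DEPTH] = w;
    f->count++;
    return true;
}

/* Downstream module drains one word; returns false if the FIFO is empty. */
static bool fifo_rd(struct module_fifo *f, struct bus_word *out)
{
    if (f->count == 0)
        return false;
    *out = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

In this model the upstream module checks `fifo_rdy` before calling `fifo_wr`, which mirrors a sender that only asserts WR while the receiver holds RDY high.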
NetFPGA Modules
[Figure: (1) software, running as Linux user-level processes; (2) hardware, FPGA modules written in Verilog on the NetFPGA PCI board]
Example: An IP Router on NetFPGA
[Figure: software (Linux user-level processes): management and CLI, routing protocols, routing table, exception processing. Hardware (Verilog on the NetFPGA PCI board): switching, forwarding table.]
Life of a Packet through the Hardware
[Figure: a packet enters the router on port 0 from network 192.168.1.x and leaves on port 2 toward 192.168.2.y]
Router Stages
[Figure: MAC RxQ and CPU RxQ → Input Arbiter → Output Port Lookup → Output Queues → MAC TxQ and CPU TxQ]
Inter-module Communication
Using "Module Headers":
    Ctrl Word (8 bits)   Data Word (64 bits)
    x                    Module Hdr
    …                    …
    y                    Last Module Hdr
    0                    Eth Hdr
    0                    IP Hdr
    0                    …
    0x10                 Last word of packet
Module headers contain information such as packet length, input port, output port, …
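The ctrl-word convention in the table can be read mechanically: leading words with a non-zero ctrl are module headers, packet words carry ctrl 0, and the last packet word carries a non-zero ctrl again. A minimal C sketch of that reading follows; the names `stream_word`, `packet_layout`, and `split_stream` are hypothetical, and the sketch does not interpret what the individual ctrl values mean.

```c
#include <stddef.h>
#include <stdint.h>

/* One (ctrl, data) pair as it travels between modules. */
struct stream_word {
    uint8_t  ctrl;
    uint64_t data;
};

struct packet_layout {
    size_t module_hdrs;   /* leading words with ctrl != 0 */
    size_t packet_words;  /* packet words, including the last one */
};

/* Splits one packet's word stream into module headers and packet words.
 * Returns 0 on success, -1 if the stream ends before a last-word marker. */
static int split_stream(const struct stream_word *w, size_t n,
                        struct packet_layout *out)
{
    size_t i = 0;

    /* Module headers: every leading word whose ctrl is non-zero. */
    while (i < n && w[i].ctrl != 0)
        i++;
    out->module_hdrs = i;

    /* Packet body: ctrl == 0 until the last word, which is non-zero. */
    while (i < n && w[i].ctrl == 0)
        i++;
    if (i == n)
        return -1;  /* no last-word marker seen */

    out->packet_words = i + 1 - out->module_hdrs;
    return 0;
}
```

For a stream whose ctrl words are 0x04, 0xff, 0, 0, 0, 0x10, this reports two module headers followed by four packet words.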
Inter-module Communication
[Figure: Module i drives data, ctrl, and wr to Module i+1; Module i+1 drives rdy back to Module i]
MAC Rx Queue
Rx Queue
    Ctrl    Data
    0xff    Pkt length, input port = 0
    0       Eth Hdr: Dst MAC = port 0, Ethertype = IP
    0       IP Hdr: IP Dst: 192.168.2.3, TTL: 64, Csum: 0x3ab4
    0       Data
Input Arbiter
[Figure: Rx Q 0 through Rx Q 7, each holding packets, feed the Input Arbiter]
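The slide does not state how the arbiter chooses among the eight Rx queues. A minimal C sketch of one common policy, round-robin, is shown here purely as an assumption; `arbiter_next` and `NUM_RX_QUEUES` are hypothetical names.

```c
#include <stdint.h>

#define NUM_RX_QUEUES 8  /* Rx Q 0 .. Rx Q 7 */

/* Round-robin pick: returns the first non-empty queue after `last`
 * (wrapping around), or -1 if every queue is empty.
 * Bit i of nonempty_mask is set when Rx queue i holds a packet. */
static int arbiter_next(uint8_t nonempty_mask, int last)
{
    for (int step = 1; step <= NUM_RX_QUEUES; step++) {
        int q = (last + step) % NUM_RX_QUEUES;
        if (nonempty_mask & (1u << q))
            return q;
    }
    return -1;
}
```

Starting the search one past the queue served last keeps a busy queue from starving the others.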
Output Port Lookup
Output Port Lookup
1 - Check input port matches Dst MAC
2 - Check TTL, checksum
3 - Look up next hop IP & output port (LPM)
4 - Look up next hop MAC address (ARP)
5 - Add output port module header
6 - Modify MAC Dst and Src addresses
7 - Decrement TTL and update checksum
Packet after Output Port Lookup:
    Ctrl    Data
    0x04    output port = 4
    0xff    Pkt length, input port = 0
    0       Eth Hdr: Dst MAC = nextHop (was 0), Src MAC = port 4 (was x), Ethertype = IP
    0       IP Hdr: IP Dst: 192.168.2.3, TTL: 63 (was 64), Csum: 0x3ac2 (was 0x3ab4)
    0       Data
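Steps 3 and 7 can be sketched in C. This is a software illustration under stated assumptions, not the hardware implementation: the table is a plain array searched linearly, netmasks are assumed contiguous, and the checksum update uses the incremental one's-complement technique from RFC 1141 (RFC 1624 discusses a corner case in which the result is 0xffff rather than 0x0000). All struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One forwarding-table entry (field names are illustrative). */
struct route {
    uint32_t prefix;    /* network address, host byte order */
    uint32_t mask;      /* contiguous netmask, e.g. 0xffffff00 */
    uint32_t next_hop;
    int      out_port;
};

/* Step 3: longest-prefix match. Returns the matching entry with the
 * longest mask, or NULL when nothing matches. */
static const struct route *lpm_lookup(const struct route *tbl, size_t n,
                                      uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dst & tbl[i].mask) != tbl[i].prefix)
            continue;
        /* For contiguous masks, a larger value means a longer prefix. */
        if (best == NULL || tbl[i].mask > best->mask)
            best = &tbl[i];
    }
    return best;
}

/* Full IPv4 header checksum over 16-bit words; the checksum field must
 * be zero on input. Used here only to cross-check the update below. */
static uint16_t ip_checksum(const uint16_t *hdr, size_t words)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum += hdr[i];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Step 7: TTL is the high byte of header word 4 (TTL, protocol).
 * Decrementing it lowers that word by 0x0100, so the one's-complement
 * checksum rises by 0x0100 with end-around carry. */
static uint16_t checksum_after_ttl_dec(uint16_t old_csum)
{
    uint32_t sum = (uint32_t)old_csum + 0x0100;
    return (uint16_t)(sum + (sum >> 16));
}
```

The incremental update avoids re-summing the whole header for a one-byte change, which is why it suits a per-packet pipeline stage.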
Output Queues
[Figure: the packet is placed in OQ4, one of the output queues OQ0 through OQ7]
MAC Tx Queue
MAC Tx Queue
    Ctrl    Data
    0x04    output port = 4
    0xff    Pkt length, input port = 0
    0       Eth Hdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP
    0       IP Hdr: IP Dst: 192.168.2.3, TTL: 63 (was 64), Csum: 0x3ac2 (was 0x3ab4)
    0       Data
NetFPGA-Host Interaction
• Linux driver interfaces with hardware
  – Packet interface via standard Linux network stack
  – Register reads/writes via ioctl system call (with convenience wrapper functions)
    • readReg(nf2device *dev, int address, unsigned *rd_data)
    • writeReg(nf2device *dev, int address, unsigned *wr_data)
    e.g.: readReg(&nf2, OQ_NUM_PKTS_STORED_0, &val);
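The call pattern for the read wrapper can be tried without the board. The sketch below is a stand-in, not the NetFPGA library: `nf2device` is replaced by a fake register file, the `readReg` body is invented to match the signature listed above, and the value of `OQ_NUM_PKTS_STORED_0` is a placeholder rather than the real register address.

```c
/* Stand-ins so the sketch compiles without the NetFPGA headers. */
#define OQ_NUM_PKTS_STORED_0 0x0  /* placeholder address, not the real one */
#define FAKE_REG_WORDS 16         /* hypothetical size of the fake register file */

typedef struct {
    unsigned regs[FAKE_REG_WORDS];  /* fake registers, one per 4-byte address */
} nf2device;

/* Same shape as the wrapper named on the slide. This stand-in returns 0
 * on success and -1 for an address outside the fake register file. */
static int readReg(nf2device *dev, int address, unsigned *rd_data)
{
    if (address < 0 || address / 4 >= FAKE_REG_WORDS)
        return -1;
    *rd_data = dev->regs[address / 4];
    return 0;
}
```

With the real driver, the same call would travel through ioctl to a PCI memory read instead of indexing an array.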
NetFPGA-Host Interaction
Register access (over the PCI bus)
1. Software makes an ioctl call on a network socket. The ioctl is passed to the driver.
2. Driver performs a PCI memory read/write
NetFPGA-Host Interaction
• Packet transfers shown using DMA interface
• Alternative: use programmed IO to transfer packets via register reads/writes
  – Slower, but eliminates the need to deal with network sockets
DEMO: Life of a Packet through the Hardware
[Figure: a packet enters the router on port 0 from network 192.168.1.x and leaves on port 2 toward 192.168.2.y]
• Programming the FPGA with your code
  – nf2_download NF2/bitfiles/reference_router.bit
• Mirror Linux ARP
  – ./NF2/projects/router_kit/sw/rkd
• Helpful tool
  – ./NFlib/C/router/cli
  – Shows forwarding tables {arp table, ip table}
  – Allows modifying the tables
Useful Links
• NetFPGA Website
• NetFPGA Wiki
• NetFPGA Guide
• Walkthrough the Reference Designs
• The Verilog Golden Reference Guide
Questions
Verilog
Hardware Description Languages
• Concurrent
  – By default, Verilog statements are evaluated concurrently
• Express fine-grain parallelism
  – Allows gate-level parallelism
• Provides precise description
  – Eliminates ambiguity about operation
• Synthesizable
  – Generates hardware from description
Verilog Data Types
    reg [7:0] A;                // 8-bit register, MSB to LSB
                                // (preferred bit order for NetFPGA)
    reg [0:15] B;               // 16-bit register, LSB to MSB
    B = {A[7:0], A[7:0]};       // Assignment of bits; a part-select must
                                // follow the declared order, so A[0:7] is not legal
    reg [31:0] Mem [0:1023];    // 1K word memory
    integer Count;              // simple signed 32-bit integer
    integer K[1:64];            // an array of 64 integers
    time Start, Stop;           // two 64-bit time variables
From: CSCI 320 Computer Architecture Handbook on Verilog HDL, by Dr. Daniel C. Hyde: http://eesun.free.fr/DOC/VERILOG/verilog-manual.html
Signal Multiplexers
Two-input multiplexer (using if / else)
    reg y;
    always @*
        if (select)
            y = a;
        else
            y = b;
Two-input multiplexer (using the ternary operator ?:)
    wire t = (select ? a : b);
From: http://eesun.free.fr/DOC/VERILOG/synvlg.html
Larger Multiplexers
Three-input multiplexer
    reg s;
    always @*
    begin
        case (select2)
            2'b00:   s = a;
            2'b01:   s = b;
            default: s = c;
        endcase
    end
Synchronous Storage Elements
• Values change at times governed by clock
  – Clock
    • Input to circuit
  – Clock event
    • Example: rising edge
  – Flip/Flop
    • Transfers value from Din to Dout on clock event
[Figure: a D flip-flop (Din to D, Q to Dout, Clock input) and a timing diagram in which the clock transitions between 0 and 1 at t=0, t=1, t=2; Din takes values A, B, C and Dout follows with S0, A, B at successive clock transitions]
Finite State Machines
Synthesizable Verilog: Delay Flip/Flops
D-type flip-flop
    reg q;
    always @ (posedge clk)
        q <= d;
D-type flip-flop with data enable
    reg q;
    always @ (posedge clk)
        if (enable)
            q <= d;
From: http://eesun.free.fr/DOC/VERILOG/synvlg.html
More on NetFPGA System
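NetFPGA System
[Figure: a host PC with CAD tools, monitor software, a web and video server, and a browser and video client in user space above the Linux kernel. Below the kernel, two VI blocks lead to a PCI-e NIC (eth1..2) and to the NetFPGA router hardware with its packet forwarding table (nf2c0..3), each with Gigabit Ethernet (GE) ports.]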
NetFPGA System Implementation
• NetFPGA blocks
  – Virtex-2 Pro FPGA
  – 4.5 MB ZBT SRAM
  – 64 MB DDR2 DRAM
  – PCI host interface
  – 4 Gigabit Ethernet ports
• Intranet test ports
  – Dual or quad Gigabit Ethernet on PCI-e
• Internet
  – Gigabit Ethernet on motherboard
• Processor
  – Dual-core CPU
• Operating system
  – Linux CentOS 4.4
NetFPGA Lab Setup
[Figure: a host with two CPUs running the client, server, CAD tools, and NetFPGA control software. A PCI-e dual NIC provides eth1 (local host) and eth2 (server). The NetFPGA router hardware on PCI provides nf2c0 (adjacent), nf2c1 (adjacent), nf2c2 (local host), and nf2c3 (adj. server). The host also connects to the Internet.]
Exception Path
Exception Packet
• Example: TTL = 0 or TTL = 1
• Packet has to be sent to the CPU, which will generate an ICMP packet as a response
• Difference starts at the Output Port Lookup stage
Exception Packet Path
[Figure: user-space software (PW-OSPF, Java GUI) sits above the driver, which exposes nf2c0, nf2c1, nf2c2, nf2c3. Across the PCI bus, the NetFPGA offers DMA to the CPU RxQ/TxQ pairs, ioctl access to the registers (nf2_reg_grp), and the user data path, which connects to the MAC RxQ/TxQ pairs and Ethernet.]
Output Port Lookup
1 - Check input port matches Dst MAC
2 - Check TTL, checksum: EXCEPTION!
3 - Add output port module header
    Ctrl    Data
    0x04    output port = 1
    0xff    Pkt length, input port = 0
    0       Eth Hdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
    0       IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum: 0x3ab4
    0       Data
Output Queues
[Figure: the packet is placed in OQ1, one of the output queues OQ0 through OQ7]
CPU Tx Queue
CPU Tx Queue
    Ctrl    Data
    0x04    output port = 1
    0xff    Pkt length, input port = 0
    0       Eth Hdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
    0       IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum: 0x3ab4
    0       Data
ICMP Packet
• The ICMP packet arrives at the CPU Rx Queue from the PCI bus
• It follows the same path as a packet from the MAC until the Output Port Lookup
• The OPL module, seeing that the packet is from CPU Rx Queue 1, sets the output port directly to 0
• The packet then continues on the same path as the non-exception packet to the Output Queues and then MAC Tx Queue 0
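The port numbers used in the walkthrough (output port 4 for physical port 2, output port 1 for the exception from input port 0, and CPU Rx Queue 1 going straight to output port 0) are consistent with even queue numbers belonging to MACs and odd ones to the CPU. The C sketch below encodes that reading as an assumption; the function names are hypothetical, and the checksum and Dst MAC checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Queue numbering implied by the examples: even = MAC, odd = CPU. */
static int  mac_queue(int port)  { return 2 * port; }
static int  cpu_queue(int port)  { return 2 * port + 1; }
static bool from_cpu(int queue)  { return (queue & 1) != 0; }

/* Picks the output queue for a packet.
 *   in_queue : queue the packet arrived on (0..7)
 *   ttl      : TTL as received
 *   lpm_port : physical port (0..3) chosen by the forwarding lookup */
static int select_output_queue(int in_queue, uint8_t ttl, int lpm_port)
{
    int port = in_queue / 2;  /* physical port behind either queue */

    if (from_cpu(in_queue))
        return mac_queue(port);      /* CPU packet: straight to the paired MAC */
    if (ttl <= 1)
        return cpu_queue(port);      /* exception: hand the packet to the CPU */
    return mac_queue(lpm_port);      /* normal forwarding */
}
```

Under this reading, a normal packet routed to physical port 2 lands in queue 4, an expiring packet from MAC queue 0 lands in CPU queue 1, and the ICMP reply injected on CPU queue 1 leaves through MAC queue 0.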
ICMP Packet Path
[Figure: user-space software (PW-OSPF, Java GUI) sits above the driver, which exposes nf2c0, nf2c1, nf2c2, nf2c3. Across the PCI bus, the NetFPGA offers DMA to the CPU RxQ/TxQ pairs, ioctl access to the registers (nf2_reg_grp), and the user data path, which connects to the MAC RxQ/TxQ pairs and Ethernet.]
NetFPGA-Host Interaction
NetFPGA to host packet transfer (over the PCI bus)
1. Packet arrives; forwarding table sends it to the CPU queue
2. Interrupt notifies driver of packet arrival
3. Driver sets up and initiates DMA transfer
NetFPGA-Host Interaction
NetFPGA to host packet transfer (cont.)
4. NetFPGA transfers packet via DMA
5. Interrupt signals completion of DMA
6. Driver passes packet to network stack
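The NetFPGA-to-host receive sequence can be viewed as a small finite state machine. The following C sketch only models the ordering of the steps; the state and event names are hypothetical and do not come from the driver source.

```c
/* States of one NetFPGA-to-host packet transfer. */
enum rx_state {
    RX_IDLE,         /* nothing pending */
    RX_PKT_WAITING,  /* arrival interrupt received, DMA not yet started */
    RX_DMA_ACTIVE,   /* driver has initiated the DMA transfer */
    RX_DMA_DONE      /* completion interrupt received, packet in host memory */
};

/* Events that move the transfer forward. */
enum rx_event {
    EV_PKT_INTERRUPT,       /* interrupt notifies driver of packet arrival */
    EV_DRIVER_START_DMA,    /* driver sets up and initiates DMA */
    EV_DMA_DONE_INTERRUPT,  /* interrupt signals DMA completion */
    EV_PASSED_TO_STACK      /* driver hands packet to the network stack */
};

/* Returns the next state; events that arrive out of order are ignored. */
static enum rx_state rx_step(enum rx_state s, enum rx_event e)
{
    switch (s) {
    case RX_IDLE:
        return e == EV_PKT_INTERRUPT ? RX_PKT_WAITING : s;
    case RX_PKT_WAITING:
        return e == EV_DRIVER_START_DMA ? RX_DMA_ACTIVE : s;
    case RX_DMA_ACTIVE:
        return e == EV_DMA_DONE_INTERRUPT ? RX_DMA_DONE : s;
    case RX_DMA_DONE:
        return e == EV_PASSED_TO_STACK ? RX_IDLE : s;
    }
    return s;
}
```

Two interrupts appear in the sequence: one announcing the packet and one announcing that the DMA has finished, with the driver acting in between.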
NetFPGA-Host Interaction
Host to NetFPGA packet transfers (over the PCI bus)
1. Software sends packet via network sockets. Packet is delivered to the driver.
2. Driver sets up and initiates DMA transfer
3. Interrupt signals completion of DMA