Extreme Networking
Achieving Nonstop Network Operation Under Extreme Operating Conditions
DARPA PI Meeting, January 27-29, 2003
Jon Turner
jst@cse.wustl.edu
http://www.arl.wustl.edu/arl

Project Overview
• Motivation
  » data networks have become mission-critical resource
  » networks often subject to extreme traffic conditions
  » need to design networks for worst-case conditions
  » technology advances making extreme defenses practical
• Extreme network services
  » Lightweight Flow Setup (LFS)
  » Network Access Service (NAS)
  » Reserved Tree Service (RTS)
• Key router technology components
  » Super-Scalable Packet Scheduling (SPS)
  » Dynamic Queues with Auto-aggregation (DQA)
  » Scalable Distributed Queueing (SDQ)
2 - Jonathan Turner – January 27-29, 2003

Prototype Extreme Router
[Figure: router block diagram. Labels: Control Processor, Switch Fabric, IPP, OPP, FPX, SPC, Line Card.]

Prototype Extreme Router: Field Programmable Port Extenders
[Figure: router block diagram with FPX callout. Labels: Control Processor, ATM Switch Core, Switch Fabric, IPP, OPP, FPX, SPC, Line Card. FPX callout: Reprogrammable Application Device, Network Interface Device, SDRAM 128 MB, SRAM 4 MB.]

Prototype Extreme Router: Smart Port Card 2
[Figure: router block diagram with SPC callout. Labels: Control Processor, Switch Fabric, IPP, OPP, FPX, SPC, Line Card. SPC callout: Pentium, Cache, North Bridge, APIC, FPGA, Flash Disk, 128 MB, Embedded Processors.]

Prototype Extreme Router: Gigabit Ethernet
[Figure: router block diagram with line card callout. Labels: Control Processor, Switch Fabric, IPP, OPP, FPX, SPC, Line Card. Line card callout: Framer, FPGA, GBIC.]

Performance of SPC-2
• Largest gain at small packet sizes.
• PCI bus limits performance for large packets.

More SPC-2 Performance
• Throughput loss at high loads due to PCI bus contention and input priority.

Field Programmable Port Extender (FPX)
• Functions for extreme router.
  » high speed packet storage manager
  » packet classification & route lookup
    - fast route lookup
    - exact match filters
    - 32 general filters
  » flexible queue manager
    - per-flow queues for reserved flows
    - route packets to/from SPC
• Network Interface Device (NID) routes cells to/from RAD.
• Reprogrammable Application Device (RAD) functions:
  » will implement core router functions in extensible router
  » may also implement arbitrary packet processing functions
[Figure: FPX block diagram. Labels: Reprogrammable App. Device (400 Kg + 80 KB); Network Interface Device; two SRAM banks (1 MB each, bus label 36, 100 MHz); two SDRAM banks (64 MB each, bus label 64, 100 MHz); two 2 Gb/s interfaces; 6.4 Gb/s.]

Logical Port Architecture
[Figure: logical port architecture. Input Side Processing: reassembly contexts (RC), Packet Classification & Route Lookup, virtual output queues, DQ, special flow queues, SPC plugins. Output Side Processing: reassembly contexts, Packet Classification, output queues, special flow queues, SPC plugins. Other labels: FPX, SPC, PCU.]

FPX Packet Processor Block Diagram
[Figure: FPX packet processor block diagram. Labels: from LC, from SW, to LC, to SW; ISAR, OSAR; Data Path; Header Proc.; Packet Storage Manager (includes free space list) with two SDRAMs; Discard; Header Pointer; Classification and Route Lookup with SRAM; Queue Manager with SRAM; Control Cell Processor; Route & Filter Updates; Register Set (two instances); DQ Status & Rate Control; Updates & Status.]

Classification and Route Lookup (CARL)
• Three lookup engines.
  » route lookup for routing datagrams - best prefix
  » flow filters for multicast & reserved flows - exact
  » general filters (32) for management - exhaustive
• Input processing.
  » parallel check of all three
  » return highest priority exclusive and highest priority non-exclusive
  » general filters have unique priority
  » all flow filters share single priority
  » ditto for routes
• Output processing.
  » no route lookup on output
• Route lookup & flow filters share off-chip SRAM
• General filters processed on-chip
[Figure: CARL block diagram. Labels: headers, Input Demux, Route Lookup, Flow Filters, General Filters, Result Proc. & Priority Resolution, bypass.]
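The priority resolution step on this slide can be sketched in a few lines of Python. This is only an illustration, not the FPX logic: the `Match` record, the `resolve` function, and the convention that a smaller number means higher priority are all assumptions made for the example. It takes the matches reported by the three lookup engines and returns the highest priority exclusive match and the highest priority non-exclusive match.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Match:
    """Hypothetical result record produced by one lookup engine."""
    source: str      # "route", "flow", or "general"
    priority: int    # assumption: smaller value = higher priority
    exclusive: bool
    action: str


def resolve(matches: Iterable[Match]) -> Tuple[Optional[Match], Optional[Match]]:
    """Return (best exclusive match, best non-exclusive match)."""
    best_exclusive = None
    best_nonexclusive = None
    for m in matches:
        if m.exclusive:
            if best_exclusive is None or m.priority < best_exclusive.priority:
                best_exclusive = m
        else:
            if best_nonexclusive is None or m.priority < best_nonexclusive.priority:
                best_nonexclusive = m
    return best_exclusive, best_nonexclusive


# Example: one route match, one flow filter match, two general filter matches.
results = [
    Match("route", priority=60, exclusive=True, action="forward to output 3"),
    Match("flow", priority=40, exclusive=True, action="reserved flow queue 17"),
    Match("general", priority=10, exclusive=False, action="count packet"),
    Match("general", priority=50, exclusive=False, action="copy to monitor"),
]
exclusive, nonexclusive = resolve(results)
```

In this example the flow filter wins among the exclusive matches and the counting filter wins among the non-exclusive ones. The priority values are made up; the slide only says that general filters have unique priorities while all flow filters, and all routes, share one.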

Exact Match Lookup
• Exact match lookup table used for reserved flows.
  » includes LFS, signaled QOS flows and multicast
  » and flows requiring processing by SPCs
  » each of these flows has separate queue in QM
  » multicast flows have two queues (recycling multicast)
  » implemented using hashing
• tag = [src, dst, sport, dport, proto]
• data includes
  - 2 outputs + 2 QIDs
  - LFS rates
  - packet, byte counters
  - flags
• separate memory areas for ingress and egress packets
[Figure: hash lookup diagram. Labels: packet, src, dst, simple hash, on-chip SRAM (ingress valid and egress valid bits), off-chip SRAM (tag+data entries).]
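A minimal software sketch of a hashed exact match table follows. It assumes a 5-tuple tag of integers, a hypothetical "simple hash" that XORs the source and destination addresses, and chained buckets; the real table layout across on-chip and off-chip SRAM is not modeled. The names `FlowEntry` and `ExactMatchTable` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    """Hypothetical per-flow record: tag plus a subset of the data fields."""
    tag: tuple                 # (src, dst, sport, dport, proto)
    outputs: tuple = (0, 0)    # 2 outputs
    qids: tuple = (0, 0)       # 2 queue identifiers
    packet_count: int = 0
    byte_count: int = 0


class ExactMatchTable:
    def __init__(self, num_buckets=64):
        self.num_buckets = num_buckets
        # Separate memory areas for ingress and egress packets.
        self.ingress = [[] for _ in range(num_buckets)]
        self.egress = [[] for _ in range(num_buckets)]

    def _bucket(self, tag, ingress):
        src, dst = tag[0], tag[1]
        index = (src ^ dst) % self.num_buckets  # assumed hash function
        return (self.ingress if ingress else self.egress)[index]

    def insert(self, entry, ingress=True):
        self._bucket(entry.tag, ingress).append(entry)

    def lookup(self, tag, length, ingress=True):
        """Return the entry whose tag matches exactly, updating its counters."""
        for entry in self._bucket(tag, ingress):
            if entry.tag == tag:
                entry.packet_count += 1
                entry.byte_count += length
                return entry
        return None


table = ExactMatchTable()
flow = (0x0A000001, 0x0A000002, 1234, 80, 6)
table.insert(FlowEntry(tag=flow, outputs=(3, 0), qids=(17, 0)))
hit = table.lookup(flow, length=1500)
miss = table.lookup((0x0A000001, 0x0A000002, 1234, 81, 6), length=40)
```

The second lookup lands in the same bucket because the assumed hash ignores ports, but it misses because the full tag is compared.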

General Filter Match
• General filter match considers full 5-tuple
  » prefix match on source and destination addresses
  » range match on source and destination ports
  » exact or wildcard match on protocol
  » each filter has a priority and may be exclusive or nonexclusive
• Intended primarily for management filters.
  » firewall filters
  » class-based monitoring
  » class-based special processing
• Implemented using parallel exhaustive search.
  » limit of 32 filters
[Figure: filter memory feeding a matcher.]
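The match rule for a single general filter can be sketched as below, using Python's standard `ipaddress` module for the prefix tests. The `GeneralFilter` class, its field names, and the `match_all` helper are assumptions for illustration; the hardware checks all filters in parallel, whereas this sketch simply loops over them.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_FILTERS = 32  # limit stated on the slide


@dataclass(frozen=True)
class GeneralFilter:
    src: str                    # source prefix in CIDR form
    dst: str                    # destination prefix in CIDR form
    sport: Tuple[int, int]      # inclusive source port range
    dport: Tuple[int, int]      # inclusive destination port range
    proto: Optional[int]        # None acts as a wildcard
    priority: int
    exclusive: bool

    def matches(self, src, dst, sport, dport, proto):
        return (
            ipaddress.ip_address(src) in ipaddress.ip_network(self.src)
            and ipaddress.ip_address(dst) in ipaddress.ip_network(self.dst)
            and self.sport[0] <= sport <= self.sport[1]
            and self.dport[0] <= dport <= self.dport[1]
            and (self.proto is None or self.proto == proto)
        )


def match_all(filters, packet):
    """Exhaustive search: test every filter against the packet's 5-tuple."""
    if len(filters) > MAX_FILTERS:
        raise ValueError("at most 32 general filters are supported")
    return [f for f in filters if f.matches(*packet)]


filters = [
    GeneralFilter("10.0.0.0/8", "0.0.0.0/0", (0, 65535), (80, 80), 6, 1, True),
    GeneralFilter("0.0.0.0/0", "192.168.1.0/24", (0, 65535), (0, 1023), None, 2, False),
]
packet = ("10.1.2.3", "192.168.1.9", 40000, 80, 6)
matched = match_all(filters, packet)
```

Both filters match this packet; choosing among them is the job of the priority resolution described on the CARL slide.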

Fast IP Lookup (Eatherton & Dittia)
[Figure: multibit trie example for the address 101 100 101 000, showing the internal bit vector and external bit vector stored at each node.]
• Multibit trie with clever data encoding.
  » small memory requirements (<7 bytes per prefix)
  » small memory bandwidth, simple lookup yields fast lookup rates
  » updates have negligible impact on lookup performance
• Avoid impact of external memory latency on throughput by interleaving several concurrent lookups.
  » 8 lookup engine config. uses about 6% of Virtex 2000E logic cells
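The following is a plain multibit trie sketch with a stride of 3 bits, meant only to show the lookup walk. It is not the Eatherton and Dittia encoding: where their scheme packs each node's prefixes and child pointers into internal and external bit vectors, this sketch uses ordinary Python dictionaries. Addresses and prefixes are bit strings, and all names are hypothetical.

```python
STRIDE = 3  # bits consumed per trie level (assumed for the example)


class Node:
    def __init__(self):
        self.internal = {}  # prefix bits shorter than STRIDE -> next hop
        self.children = {}  # STRIDE-bit chunk -> child Node


def insert(root, prefix, next_hop):
    """Store a prefix (a string of '0'/'1') with its next hop."""
    node = root
    while len(prefix) >= STRIDE:
        chunk, prefix = prefix[:STRIDE], prefix[STRIDE:]
        node = node.children.setdefault(chunk, Node())
    node.internal[prefix] = next_hop


def lookup(root, addr):
    """Return the next hop of the longest prefix matching addr, or None."""
    node, best = root, None
    while node is not None:
        chunk = addr[:STRIDE]
        # Longest prefix stored inside this node wins over shorter ones.
        for n in range(min(len(chunk), STRIDE - 1), -1, -1):
            if chunk[:n] in node.internal:
                best = node.internal[chunk[:n]]
                break
        if len(chunk) < STRIDE:
            break
        node = node.children.get(chunk)
        addr = addr[STRIDE:]
    return best


root = Node()
insert(root, "", "default")
insert(root, "1", "A")
insert(root, "101", "B")
insert(root, "1011001", "C")
hop = lookup(root, "101100101000")
```

Each level costs one node visit, which is why a wider stride reduces the number of memory accesses per lookup; the bit vector encoding on the slide is what keeps such wide nodes small.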

Lookup Throughput
• SRAM bandwidth: 450 MB/s
• split tree cuts storage by 30%
• linear throughput gain

Update Performance
• reasonable update rates have little impact
• 1 update per ms

Queue Manager Logical View (QM)
• separate queues for each reserved flow
• 64 hashed datagram queues for traffic isolation
• separate queue for each SPC flow
• separate queue set for each output
[Figure: queue manager logical view. Labels: arriving packets; res. flow queues and datagram queue per output (to output 0, to output 1, ..., to output 8); VOQ pkt. sched.; DQ; to switch; res. flow queues and datagram queues; link pkt. sched.; to link; SPC pkt. sched.; to SPC; from SPC.]

Backlogged TCP Flows with Tail Discard
• with large buffers get large delay variance
• with small buffers get underflow and low throughput

DRR with Discard from Longest Queue
• Smaller fluctuations, but still significant.

Queue State DRR
• Add hysteresis to packet discard policy
  » discard from same queue until it becomes the shortest non-empty queue
• low variation, even with small queues, low delay, no tuning
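The hysteresis rule can be sketched as a small victim selection function. This is one reading of the slide, stated as an assumption: when a discard is needed, keep discarding from the queue chosen last time as long as it is still longer than the shortest non-empty queue, and otherwise switch to the longest queue. The function name and the dictionary of queue lengths are hypothetical.

```python
def pick_discard_queue(lengths, current):
    """Choose the queue to discard from.

    lengths: dict mapping queue id -> number of queued packets
    current: queue id used for the previous discard, or None
    """
    nonempty = [q for q, n in lengths.items() if n > 0]
    if not nonempty:
        return None
    shortest = min(lengths[q] for q in nonempty)
    # Hysteresis: stay with the same victim until it is the shortest.
    if current in nonempty and lengths[current] > shortest:
        return current
    # Otherwise start over with the longest queue.
    return max(nonempty, key=lambda q: lengths[q])


# Five consecutive discards, with no arrivals or departures in between.
lengths = {"a": 5, "b": 3, "c": 1}
victims = []
current = None
for _ in range(5):
    current = pick_discard_queue(lengths, current)
    lengths[current] -= 1
    victims.append(current)
```

Plain discard from the longest queue would start alternating between `a` and `b` once their lengths met. With the hysteresis, queue `a` absorbs four discards in a row before the policy moves on to `b`.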

Packet Scheduling with Approx. Radix Sorting
[Figure: timing wheels 1, 2 and 3 with fast forward bits and an output list.]
• To implement virtual time schedulers, need to quickly find the queue whose "lead packet" has the smallest virtual finish time.
  » using priority queue, this requires O(log n) time for n queues
• Use approximate radix sorting, with compensation - O(1).
  » timing wheels with increasing granularity and range
  » approximate sorting produces inter-packet timing errors
  » observe errors & compensate when next packet scheduled
• Fast-forward bits used to skip over empty slots.
• Scheduler puts no limit on number of queues.
• Two copies of data structure needed for approx. version of WF2Q+.
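A single timing wheel is enough to illustrate the idea of approximate sorting with fast-forward bits. The sketch below is a simplification under stated assumptions: it uses one wheel rather than several wheels of increasing granularity, it does no error compensation, and the class name, slot count, and granularity are invented. Queues are bucketed by finish time, and a bitmask marks which slots are occupied so the scheduler can jump straight to the next occupied one.

```python
from collections import deque


class TimingWheel:
    def __init__(self, num_slots=8, granularity=4):
        self.n = num_slots
        self.g = granularity           # finish-time units covered by one slot
        self.slots = [deque() for _ in range(num_slots)]
        self.occupied = 0              # fast-forward bits: bit i set if slot i is non-empty
        self.cursor = 0                # absolute index of the current slot

    def insert(self, queue_id, finish_time):
        slot_abs = max(finish_time // self.g, self.cursor)
        if slot_abs - self.cursor >= self.n:
            raise ValueError("finish time beyond the range of this wheel")
        i = slot_abs % self.n
        self.slots[i].append((queue_id, finish_time))
        self.occupied |= 1 << i

    def pop_next(self):
        """Remove and return the next (queue_id, finish_time), or None."""
        if self.occupied == 0:
            return None
        start = self.cursor % self.n
        mask = (1 << self.n) - 1
        # Rotate the bits so the cursor slot is bit 0, then find the lowest set bit.
        rotated = ((self.occupied >> start) | (self.occupied << (self.n - start))) & mask
        skip = (rotated & -rotated).bit_length() - 1
        self.cursor += skip
        i = self.cursor % self.n
        item = self.slots[i].popleft()
        if not self.slots[i]:
            self.occupied &= ~(1 << i)
        return item


wheel = TimingWheel()
for queue_id, finish in [("a", 14), ("b", 2), ("c", 13), ("d", 9)]:
    wheel.insert(queue_id, finish)
order = []
while (item := wheel.pop_next()) is not None:
    order.append(item[0])
```

Queues `a` and `c` share a slot, so `a` (finish time 14) is served before `c` (finish time 13). That ordering error is bounded by the slot granularity, and it is the kind of inter-packet timing error the slide says is observed and compensated for.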

Resource Usage Estimates
• Key resources in Xilinx FPGAs
  » flip flops - 38,400
  » lookup tables (LUTs) - 38,400
    - each can implement any 4 input Boolean function
  » block RAMs (4 Kbits each) - 160

FPGA Performance Characteristics

Summary
• Version 1 hardware status.
  » hardware operating in lab, passing packets
  » but still have some bugs to correct
  » one day for typical test-diagnose-correction cycle
  » version 1 has simplified queue manager
• Planning several system demos in next month.
  » system level throughput testing - focus on lookup proc.
  » verifying basic fair queueing behavior
  » TCP SYN attack suppressor
    - SPC-resident plugin monitors new TCP connections going to server
    - when too many "half-open" connections, oldest are reset
    - flow filters inserted for stable connections, enabling hw forwarding
• Expect to complete version 2 hardware in next six months.
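The bookkeeping behind the TCP SYN attack suppressor demo might look roughly like the sketch below. It is not the SPC plugin: the class name, the threshold, and the representation of flows as opaque identifiers are assumptions, and sending resets or installing flow filters is reduced to recording the flow in a list or set.

```python
from collections import OrderedDict


class SynSuppressor:
    def __init__(self, max_half_open=3):
        self.max_half_open = max_half_open   # assumed threshold
        self.half_open = OrderedDict()       # flow -> None, oldest first
        self.filters = set()                 # flows with a flow filter installed
        self.resets = []                     # flows that were reset

    def on_syn(self, flow):
        """Record a new connection attempt; reset the oldest if over the limit."""
        if flow in self.filters or flow in self.half_open:
            return
        self.half_open[flow] = None
        while len(self.half_open) > self.max_half_open:
            oldest, _ = self.half_open.popitem(last=False)
            self.resets.append(oldest)

    def on_handshake_complete(self, flow):
        """Connection became stable: stand-in for inserting a flow filter."""
        if flow in self.half_open:
            del self.half_open[flow]
            self.filters.add(flow)


s = SynSuppressor(max_half_open=2)
s.on_syn("f1")
s.on_syn("f2")
s.on_handshake_complete("f1")   # f1 is stable and now forwarded in hardware
s.on_syn("f3")
s.on_syn("f4")                  # third half-open connection: oldest (f2) is reset
```

Once a flow is in `filters`, later packets for it would bypass the plugin, which matches the slide's point that stable connections are forwarded in hardware.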