Broadband Network Architectures: Router Design (z. TEMangir, Sp 02)
Outline
§ Introduction
§ Router Fundamentals
§ Routing Algorithms and Protocols
§ Fast Forwarding
§ Layer-3 Switching
§ IP over WDM
TEM 497
Introduction
A Fine Distinction
§ Imprecision surrounds the terms “routing” and “forwarding”
§ Forwarding is the act of transferring a packet from one interface of a router to another, after consulting a forwarding table
§ Routing is the act of building routing tables by means of a routing algorithm
§ We frequently abuse this convention
What is a Router?
§ A packet forwarder
  § Multiprotocol: IP, IPX, AppleTalk
§ A routing-protocol execution machine
  § Multiprotocol: IGRP, RIP, OSPF, IS-IS
§ A packet monitor
§ A general-purpose computer
§ A firewall
§ A switch
Internet Forwarder Functions
§ Parse the datagram header
§ Checksum actions
§ Select the network protocol
§ Decrement the TTL field
§ Use the TOS field to prioritize the datagram
§ Process the options fields
§ Forward (route) the datagram to the next hop
§ Fragment the datagram
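Two of the forwarder functions listed above, checksum actions and TTL decrement, interact: every TTL change invalidates the IPv4 header checksum. The following is a minimal Python sketch. It assumes a plain 20-byte IPv4 header without options. The names `ipv4_checksum` and `forward_ttl` are illustrative and do not come from any router implementation.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_ttl(header: bytearray) -> bool:
    """Check the header, decrement TTL, and rewrite the checksum in place.

    Returns False when the datagram should be dropped instead of forwarded.
    """
    if ipv4_checksum(bytes(header)) != 0:
        return False  # a valid header sums to 0xFFFF, so its checksum result is 0
    if header[8] <= 1:
        return False  # TTL would reach 0; a real router also sends ICMP Time Exceeded
    header[8] -= 1  # TTL is byte 8 of the IPv4 header
    header[10:12] = b"\x00\x00"  # checksum field (bytes 10-11) is zero while computing
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return True


# Example: 20-byte header, TTL 64, protocol 17 (UDP), 10.0.0.1 -> 10.0.0.2
hdr = bytearray(bytes.fromhex("4500001400000000401166d70a0000010a000002"))
print(forward_ttl(hdr), hdr[8])  # True 63
```

The sketch recomputes the whole checksum for clarity. Routers can instead patch it incrementally, because only one 16-bit word changed.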
Internet Router Functions
§ Execute one or more routing protocols
  § Exchange state information with other routers
  § Use a transport protocol
  § Authentication
§ Collect network-management statistics
  § Packet counts, lengths, and types
  § Source-destination matrix
§ Configuration support
  § User interface
  § Tunnel management
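Internet Firewall Functions
§ Filtering of destinations
  § Source
  § Destination
§ Filtering of services
  § Block protocols
  § Block transport (port) numbers
§ Virtual private networks
  § Encrypted tunnels
[Figure: filtering points in the protocol stack: services such as FTP, HTTP, and X identified by port numbers over UDP and TCP, which are identified by the protocol ID over IP]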
Control and Data Planes
[Figure: a router divided into a control plane, whose route determination function exchanges control packets with other control-plane entities, and a data plane, whose data forwarding function exchanges data packets with other data-plane entities]
Router Fundamentals
ARP
§ Address Resolution Protocol translates an IP address to a media (link) address
§ Simple request-response protocol
  § First host broadcasts a request packet containing the desired IP address
  § Second host recognizes its IP address
  § Second host sends a response packet to the first host containing its media (link) address
  § First host caches the address mapping for later use
ARP Header (32 bits wide)
Hardware Type (bits 0–15) | Protocol Type (bits 16–31)
HLen (8 bits) | PLen (8 bits) | Operation (16 bits)
Source Hardware Address
Source Protocol Address
Target Hardware Address
Target Protocol Address
ARP Header Fields
§ Hardware type: e.g., Ethernet = 1
§ Protocol type: e.g., IPv4 = 0x0800
§ HLen: hardware address length in bytes (e.g., Ethernet = 6 bytes, i.e., 48 bits)
§ PLen: protocol address length in bytes (e.g., IPv4 = 4 bytes, i.e., 32 bits)
§ Operation: a request (1) or a reply (2)
§ Source: the system the packet came from
§ Target: the system being queried about
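The field list above maps directly onto a fixed binary layout. Below is a minimal Python sketch that packs and unpacks an ARP packet. It covers only the Ethernet/IPv4 case (HLen = 6, PLen = 4), which gives a 28-byte packet. The helper names are illustrative, and link-layer framing is left out.

```python
import ipaddress
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"  # htype, ptype, hlen, plen, oper, sha, spa, tha, tpa
ARP_REQUEST, ARP_REPLY = 1, 2  # operation codes (RFC 826)
HTYPE_ETHERNET, PTYPE_IPV4 = 1, 0x0800


def build_arp(op: int, src_mac: str, src_ip: str, dst_mac: str, dst_ip: str) -> bytes:
    return struct.pack(
        ARP_FORMAT, HTYPE_ETHERNET, PTYPE_IPV4, 6, 4, op,
        bytes.fromhex(src_mac.replace(":", "")), ipaddress.IPv4Address(src_ip).packed,
        bytes.fromhex(dst_mac.replace(":", "")), ipaddress.IPv4Address(dst_ip).packed,
    )


def parse_arp(pkt: bytes) -> dict:
    _, _, _, _, op, sha, spa, tha, tpa = struct.unpack(ARP_FORMAT, pkt[:28])
    return {
        "op": op,
        "sender_mac": sha.hex(":"), "sender_ip": str(ipaddress.IPv4Address(spa)),
        "target_mac": tha.hex(":"), "target_ip": str(ipaddress.IPv4Address(tpa)),
    }


# A request: the sender knows the target's IP address but not its MAC, so THA is all zeros.
req = build_arp(ARP_REQUEST, "02:00:00:00:00:01", "10.0.0.1", "00:00:00:00:00:00", "10.0.0.2")
print(len(req), req[:8].hex())  # 28 0001080006040001
```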
ARP Operation (1)
[Figure: steps (1)–(8) traced through the protocol stacks (DNS, FTP, TCP, IP, ARP, Ethernet driver) of the sending host and of the other hosts on the Ethernet]
ARP Operation (2)
1. IP datagram with destination address
2. Next-hop address is passed to ARP
3. ARP request is passed to the Ethernet driver
4. ARP request is broadcast in an Ethernet frame
5. ARP request is recognized by the next-hop node
6. ARP reply is sent by the next-hop node
7. ARP reply updates the ARP cache
8. IP datagram is sent to the next-hop node
Proxy ARP
§ Allows a router to answer ARP requests from one of its networks for a host on another of its networks
§ Router substitutes its own link address for the responding host’s
§ Proxy ARP gives the requester the illusion that the remote host is connected to its own network
RARP
§ Reverse ARP translates a media (link) address to an IP address
§ Used by systems without nonvolatile storage
§ Requires a network-wide RARP server
§ Similar to BOOTP (Bootstrap Protocol)
Router Advertisement (1)
§ Routers announce their presence by broadcasting ICMP router advertisements
  § All-hosts multicast address: 224.0.0.1
  § Limited broadcast address: 255.255.255.255
§ Advertisements are periodic
  § 7-minute period
  § An advertisement becomes stale after 30 minutes
Router Advertisement (2)
§ Advertisements contain a list of addresses
  § Router IP addresses
  § Preference level of each address
§ Higher values are preferred
  § The highest value identifies the normal router
  § A lower value identifies a backup router
  § The lowest value marks an address that does not wish to receive default traffic
Router Solicitation (1)
§ A host should not have to wait 7 minutes for the next ICMP router advertisement
§ ICMP router solicitation messages allow the host to request the identity of a router
§ The host broadcasts the solicitation
  § All-routers multicast address: 224.0.0.2
  § Limited broadcast address: 255.255.255.255
§ The host receives many advertisements
§ The host chooses a router on its subnet
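The host's choice among the advertisements it receives, following the preference and subnet rules on the router advertisement and solicitation slides, can be sketched in a few lines of Python. The sketch assumes each advertisement has already been reduced to (router address, preference) pairs. It treats preferences as signed 32-bit integers and uses 0x80000000, the most negative value, as the "do not use as default" marker, which is how RFC 1256 encodes it. The function name and the tie-breaking rule are illustrative assumptions.

```python
import ipaddress

NO_DEFAULT = -(2**31)  # preference 0x80000000 read as a signed 32-bit integer


def choose_default_router(host_iface: str, adverts):
    """Pick the highest-preference advertised address on the host's own subnet.

    host_iface: the host's address with prefix length, e.g. "10.0.0.5/24"
    adverts: iterable of (router_ip, preference) pairs from received advertisements
    """
    subnet = ipaddress.ip_interface(host_iface).network
    candidates = [
        (pref, ipaddress.ip_address(ip))
        for ip, pref in adverts
        if ipaddress.ip_address(ip) in subnet and pref != NO_DEFAULT
    ]
    if not candidates:
        return None  # keep waiting for (or soliciting) advertisements
    _, best_ip = max(candidates)  # ties are broken by the higher address (an arbitrary choice)
    return str(best_ip)


adverts = [("10.0.0.1", 0), ("10.0.0.2", 10), ("192.168.1.1", 100), ("10.0.0.3", NO_DEFAULT)]
print(choose_default_router("10.0.0.5/24", adverts))  # 10.0.0.2
```

The 192.168.1.1 advertisement has the highest preference, but it is ignored because it is not on the host's subnet.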
Router Solicitation (2)
§ Host bootstrap operation
  § Broadcasts 3 solicitations
  § Broadcasts 1 message every 3 seconds
  § Stops broadcasting as soon as a valid router advertisement is received
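The bootstrap rule above is a bounded retry loop. A minimal Python sketch follows. The two callbacks stand in for the host's network stack and are hypothetical, and no real sockets are opened.

```python
MAX_SOLICITATIONS = 3        # the host broadcasts at most 3 solicitations
SOLICITATION_INTERVAL = 3.0  # seconds between solicitations


def solicit_router(send_solicitation, wait_for_advert):
    """Bootstrap loop for ICMP router solicitation.

    send_solicitation(): broadcast one solicitation (hypothetical callback)
    wait_for_advert(timeout): return a valid advertisement, or None after `timeout` seconds
    """
    for _ in range(MAX_SOLICITATIONS):
        send_solicitation()
        advert = wait_for_advert(SOLICITATION_INTERVAL)
        if advert is not None:
            return advert  # stop as soon as a valid advertisement arrives
    return None  # give up and rely on the routers' periodic advertisements


# Simulated run: no answer to the first solicitation, an advertisement after the second.
sent = []
replies = iter([None, "advert from 10.0.0.2"])
print(solicit_router(lambda: sent.append("solicit"), lambda timeout: next(replies)), len(sent))
# advert from 10.0.0.2 2
```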
Broadcast Storms
§ Mechanisms that rely on broadcasting messages within a LAN are vulnerable to broadcast storms, i.e., long, uncontrolled exchanges of broadcast packets.
§ Because every node must process a broadcast, storms put a heavy load on uninvolved nodes.
§ Therefore, protocol exchanges such as ARP, RARP, DHCP, Router Solicitation, and Router Advertisement must control broadcasts with timers and by limiting message counts.
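Redirect
§ An ICMP redirect error message is sent by a router to a host to indicate that the host should send its datagrams through another router
[Figure: 1. the host sends the first datagram to its router; 2. the router returns a redirect to the host; 3. the router forwards the first datagram to the better router; 4. successive datagrams go directly to the better router]
§ Security concern! A forged redirect can divert a host’s traffic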
A Simple Router
[Figure: CPU, main memory, and DMA controller on the system bus; network interface controllers (Fast Ethernet, FDDI, ATM) on the I/O bus. 1. Packet input; 2. Header processing and routing-table lookup; 3. Packet output. Packets move between the NICs and main memory by DMA transfers.]
NIC = Network Interface Controller; DMA = Direct Memory Access
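IP-Layer Processing
[Figure: packets from the network input(s) enter the IP input queue, and IP options are processed (including source-routed packets). If a packet is addressed here, it is delivered to UDP, TCP, or ICMP. Otherwise it is forwarded: the next hop is calculated from the routing table, and the packet passes to IP output and the network output(s). The routing algorithm and routing-table management (control) maintain the routing table used by the data path.]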
Routing Table Structure
§ Destination IPv4 address
  § Host address (32 bits)
  § Network address (<32 bits)
§ Next-hop router IP address
  § Router on a directly connected network
§ Flags
  § Network or host
  § Router or interface
§ Network interface
Routing Table
[Figure: output of "netstat -rn" on host zap ("Routing tables") with columns Destination, Gateway, Flags, Refcnt, Use, and Interface. Annotations mark host addresses, a multicast address (224.0.0.9), the loopback address (127.0.0.1), the default route, network addresses (128.9.192.0, 128.9.112.0), the next-hop router, and the interfaces le0 (Ethernet), lo0 (loopback), and myri0 (Myrinet).]
Flags: U = route is up; G = route is via gateway; H = route is to a host; D = route was redirected
IP Output Processing
§ Search the table for a match of the host address
  § If found, send the datagram to the next-hop router or directly connected interface
§ Search the table for a match of the network address
  § If found, send the datagram to the next-hop router or directly connected interface
  § Use the subnet mask, if necessary
§ Search the table for a default entry
  § If found, send the datagram to the next-hop router
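The three-step search above can be sketched in Python with the standard ipaddress module. The table format is an assumption made for illustration: (destination, gateway) pairs, where the destination is a host address, a network written with its prefix length, or the string "default". The sample entries are hypothetical and loosely patterned on the netstat listing earlier; their /24 mask is assumed. Among matching network routes, the sketch prefers the longest mask, which the slide leaves implicit.

```python
import ipaddress


def next_hop(table, dst: str):
    """Return the gateway for dst: host route first, then network route, then default."""
    addr = ipaddress.ip_address(dst)
    # 1. Host-specific entries (a full 32-bit address)
    for dest, gw in table:
        if dest != "default" and "/" not in dest and ipaddress.ip_address(dest) == addr:
            return gw
    # 2. Network entries: the mask decides membership; prefer the longest mask
    best = None
    for dest, gw in table:
        if "/" in dest:
            net = ipaddress.ip_network(dest)
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, gw)
    if best is not None:
        return best[1]
    # 3. Default entry
    for dest, gw in table:
        if dest == "default":
            return gw
    return None  # no route: the datagram is undeliverable


table = [
    ("128.9.192.24", "128.9.112.24"),     # host route via a gateway
    ("128.9.192.0/24", "128.9.192.151"),  # directly connected network (own interface address)
    ("default", "128.9.112.72"),
]
print(next_hop(table, "128.9.192.24"), next_hop(table, "128.9.192.50"), next_hop(table, "8.8.8.8"))
# 128.9.112.24 128.9.192.151 128.9.112.72
```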
Routing
§ Assumptions
  § Router knows the addresses of all other routers
  § Router knows the “costs” to reach its neighbors
§ Network viewed as a collection of nodes and (bidirectional) links
§ From any given router, find the next hop on the shortest path to any other router
§ Tolerance of failures
Distance-Vector Routing
§ Based on the sharing of distance vectors
§ A router’s distance vector is a list of its “distances” to every other router in the routing domain
§ Router tells its neighbors its distance (cost) to every other router in the network
§ Cost = distance
  § Usually we assume that cost = distance = hops
  § Other metrics: bandwidth, delay, charging
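Distance-Vector Algorithm
§ Router maintains a distance vector
  § List of <dest, cost> entries
§ Router periodically sends a copy of its distance vector to all neighboring routers
§ Upon receipt of a distance vector from neighbor w, the router determines its new distance vector
  § $\mathrm{cost}(v) \leftarrow \min\{\mathrm{cost}(v),\ \mathrm{cost}_w(v) + \mathrm{cost}(w)\}$, where $\mathrm{cost}_w(v)$ is w’s advertised cost to destination v and $\mathrm{cost}(w)$ is the cost of reaching neighbor w
§ Converges to shortest-path routes
  § $O(MN)$, M = number of links, N = number of nodes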
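The update rule above is one relaxation step of the Bellman-Ford algorithm. In the Python sketch below, a router's table maps each destination to a (cost, next hop) pair. Recording the next hop is an addition to the slide's <dest, cost> entries, and it is needed to actually forward. The three-router topology in the example is hypothetical.

```python
import math


def dv_update(table, link_cost, w, w_vector):
    """Apply cost(v) <- min(cost(v), cost_w(v) + cost(w)) for every v that neighbor w advertises.

    table: dest -> (cost, next_hop) for this router
    link_cost: neighbor -> cost of the direct link, i.e. cost(w)
    w_vector: dest -> cost advertised by w, i.e. cost_w(v)
    Returns True if the table changed (so the router should re-advertise).
    """
    changed = False
    for v, cost_w_v in w_vector.items():
        candidate = link_cost[w] + cost_w_v
        if candidate < table.get(v, (math.inf, None))[0]:
            table[v] = (candidate, w)
            changed = True
    return changed


# Router A with links A-B (cost 1) and A-C (cost 4); B and C are 1 apart.
table = {"A": (0, "A")}
links = {"B": 1, "C": 4}
dv_update(table, links, "C", {"C": 0, "B": 1})  # C's vector arrives first
dv_update(table, links, "B", {"B": 0, "C": 1})  # then B's vector improves both routes
print(table)  # {'A': (0, 'A'), 'C': (2, 'B'), 'B': (1, 'B')}
```

Real protocols such as RIP also accept a worse cost when it comes from the current next hop, so that failures propagate. The pure min rule shown here omits that detail.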
Distance-Vector Problems
§ Slow convergence
  § Packet bouncing after link failure
§ Counting to infinity
  § Race condition after network partition
  § Algorithm keeps adding to the current cost, never reaching infinity
  § Solution: represent infinity by a large number
  § The large number is 16 in RIP
§ Caused by routers repeating information that was valid before failures
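Counting to infinity is easy to reproduce in a few lines. The Python sketch below assumes a hypothetical chain A - B - C with unit link costs and a simplified, lock-step exchange of updates. When the B-C link fails, B believes A's stale claim that it reaches C at cost 2. A, whose next hop toward C is B, must then accept each new (worse) cost from B. RIP's choice of 16 as infinity is what ends the exchange.

```python
INFINITY = 16  # RIP's representation of "unreachable"


def count_to_infinity():
    """Return the successive (B, A) costs to C after the B-C link fails."""
    cost_a, cost_b = 2, INFINITY  # A reaches C via B; B has just lost its direct link
    history = []
    while cost_a < INFINITY or cost_b < INFINITY:
        cost_b = min(INFINITY, cost_a + 1)  # B hears A's vector and routes via A
        cost_a = min(INFINITY, cost_b + 1)  # A's next hop is B, so A takes B's new cost
        history.append((cost_b, cost_a))
    return history


steps = count_to_infinity()
print(len(steps), steps[0], steps[-1])  # 8 (3, 4) (16, 16)
```

With a larger value for infinity, the same loop runs proportionally longer. That delay is why RIP keeps infinity small.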
Link-State Routing
§ Based on sharing of link state
  § Link-state packets: <ID, Nbr_ID, cost>
§ Link-state information is flooded throughout the network
§ Each router computes shortest paths independently
§ Router tells every other router its distance (cost) to its neighbors
  § Cost = distance = hops
Link-State Algorithm
§ Router maintains a database of link-state packets that describe its links
§ Router floods a copy of every link-state packet throughout the network
  § Uses sequence numbers and duplicate elimination to control the flood
§ Router applies Dijkstra’s algorithm to find shortest paths
§ Converges to shortest-path routes
  § $O(M \log M)$, M = number of links
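Each router's shortest-path computation can be sketched with a binary heap, which gives the O(M log M) bound quoted above. The Python sketch below is an illustration, not the procedure from any protocol specification. It also records the first hop on each path, since that is what the forwarding table needs. The example graph is an assumption based on the IGP example later in this section: routers A, B, and C with link costs 2000 (A-B), 60 (A-C), and 60 (B-C), plus stub networks N5 behind B and N6 behind C at cost 10 each.

```python
import heapq


def shortest_paths(links, source):
    """Dijkstra's algorithm over a link-state database.

    links: node -> list of (neighbor, cost)
    Returns (dist, first_hop): distance to each reachable node, and the neighbor of
    `source` on which each shortest path starts.
    """
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    done = set()
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in links.get(node, []):
            nd = d + cost
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # A path that leaves the source starts at the neighbor itself.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return dist, first_hop


links = {
    "A": [("B", 2000), ("C", 60)],
    "B": [("A", 2000), ("C", 60), ("N5", 10)],
    "C": [("A", 60), ("B", 60), ("N6", 10)],
    "N5": [("B", 10)],
    "N6": [("C", 10)],
}
dist, first_hop = shortest_paths(links, "A")
print(dist["N5"], first_hop["N5"], dist["N6"])  # 130 C 70
```

Cost-based routing sends traffic for N5 through C at cost 130 instead of over the expensive direct A-B link. A hop-count metric would pick that direct link.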
Two Routing Schemes
[Figure: in distance-vector routing, a router sends DV messages to its neighbors; in link-state routing, a router sends LS messages to all other routers]
§ Distance-vector routing: a router sends a large amount of information to a few recipients
§ Link-state routing: a router sends a small amount of information to many recipients
Link-State & Distance-Vector Routing
§ Link-state
  § Loop-free routing
  § Fast convergence
  § Precise, multiple metrics (costs)
§ Distance-vector
  § Simplicity
  § Less memory required
§ Both are in use in today’s Internet
Internet Routing Hierarchy
§ Interior routing
  § Within an AS
  § Intradomain routing
§ Exterior routing
  § Between ASs
  § Interdomain routing
Internet Routing Protocols
§ Interior Gateway Protocols (IGPs)
  § RIP (RIPv2 is the current standard)
  § IGRP
  § EIGRP
  § OSPF
  § IS-IS
§ Exterior Gateway Protocol (EGP)
  § Border Gateway Protocol (BGP)
  § BGP-4 is the current standard
Routing Protocol Comparison
Routing protocol | Supported protocols | Strengths | Limitations
Enhanced IGRP | IP, IPX, AppleTalk | load balancing, metrics |
IGRP | IP, OSI-IP | |
RIPv2 | IP | simplicity, improved convergence |
OSPF | IP | rapid convergence | complexity
IS-IS | IP, OSI-IP | |
RIP | IP | simplicity | count to infinity
IGP Example
[Figure: Rtr A attaches to 128.9.1.0/24 (cost 10) on e0, to Rtr B over 128.9.2.0/24 (cost 2000) on s1, and to Rtr C over 128.9.3.0/24 (cost 60) on s2. Rtr B and Rtr C are joined by 128.9.4.0/24 (cost 60), and networks 128.9.5.0/24 and 128.9.6.0/24 (cost 10 each) lie beyond them. The figure compares the RIP and OSPF routing tables at Rtr A. RIP selects next hops by hop count, so it reaches 128.9.5.0 over the slow s1 link (next hop 128.9.2.2). OSPF adds link costs, reaching 128.9.4.0 at cost 120, 128.9.5.0 at cost 130, and 128.9.6.0 at cost 70 (next hop 128.9.3.2 on s2).]
Lollipop Sequence Space
§ Problem: sequence numbers of link-state packets wrap around or are restarted
[Figure: a lollipop-shaped sequence space of N values: a straight stick from -N/2 up to 0, followed by a circle of numbers 0 through N/2 - 1. Sequence numbers start at -N/2 at bootup and then circle around repeatedly. Two numbers a and b on the circle are separated by distance d.]
§ If d < N/4 (half the circumference), then b is the newer sequence number; otherwise a is newer
§ Sequence numbers in the stick subspace are generated only after bootup, and recipients notify the booting router of the last sequence number received
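The comparison rule on the lollipop slide can be written as a short function. This Python sketch assumes N is divisible by 4, that numbers in [-N/2, -1] form the linear stick, and that numbers in [0, N/2 - 1] form the circle. On the stick, it treats a larger number as newer and any circle number as newer than any stick number (the usual lollipop convention). On the circle, it applies the quarter-space rule. It does not model the protocol's recovery exchange after a reboot.

```python
def lollipop_newer(b: int, a: int, n: int) -> bool:
    """Return True if sequence number b is newer than a in a lollipop space of n values.

    Stick: -n/2 .. -1 (used only after bootup). Circle: 0 .. n/2 - 1 (reused forever).
    """
    if a == b:
        return False
    if a < 0 or b < 0:
        # Along the stick numbers only grow, and the circle comes after the stick.
        return b > a
    # Both on the circle (circumference n/2): b is newer if it lies less than
    # half the circumference (n/4) ahead of a.
    d = (b - a) % (n // 2)
    return d < n // 4


N = 16  # circle holds 0..7; stick holds -8..-1
print(lollipop_newer(3, 1, N), lollipop_newer(1, 7, N), lollipop_newer(7, 1, N))
# True True False
```

The second call shows the wraparound case: 1 is only two steps ahead of 7 on an eight-value circle, so it counts as newer.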
Routing in the Internet
§ Autonomous System (AS)
  § Set of routers and hosts administered by a single entity
  § Customer network (e.g., 128.9.0.0)
  § ISP
  § Backbone provider
§ Assigned a unique 16-bit number
§ AS represents a routing domain
Classification of ASs (1)
§ Stub AS
  § Single connection to another AS
  § All traffic is local (i.e., originates or terminates at the AS)
  § E.g., a typical corporation
§ Multihomed AS
  § Multiple connections to other ASs
  § Refuses to carry nonlocal (transit) traffic
  § E.g., a well-connected corporation
Classification of ASs (2)
§ Transit AS
  § Multiple connections to other ASs
  § Accepts local and nonlocal (transit) traffic
  § E.g., ISP or backbone operator
Types of ASs
[Figure: six interconnected autonomous systems: AS 1, AS 2, and AS 3 (transit), AS 4 and AS 5 (stub), and AS 6 (multihomed)]
First 20 AS Numbers
AS number: name
1: GNTY-1; 2: DCN-AS; 3: MIT-GATEWAYS; 4: ISI-AS; 5: SYMBOLICS; 6: BULL-HN; 7: UK-MOD; 8: RICE-AS; 9: CMU-ROUTER; 10: CSNET-EXT-AS; 11: HARVARD; 12: NYU-DOMAIN; 13: BRL-AS; 14: COLUMBIA-GW; 15: NET-DYNAMICS-EXP; 16: LBL; 17: PURDUE; 18: UTEXAS; 19: CSS-DOMAIN; 20: UR
Registry handles listed: [SG52-ARIN], [CS15-ARIN], [DW19-ARIN], [RH164-ARIN], [JKR1-ARIN], [JLM23-ARIN], [RNM1-ARIN], [RUH-ORG-ARIN], [HC-ORG-ARIN], [CS15-ARIN], [WJO3-ARIN], [ZN68-ARIN], [RR33-ARIN], [ZC26-ARIN], [ZSU-ARIN], [CAL3-ARIN], [JRS8-ARIN], [DLN12-ARIN], [CR11-ARIN], [LB16-ARIN]
Source: http://www.arin.net/library/internet_info/asn.txt
CIDR: Problems
§ Classless Interdomain Routing (CIDR)
§ Class A IP addresses are too large (16M hosts)
§ Class C IP addresses are too small (256 hosts)
§ Class B addresses are just right (64K hosts), but we are running out of class B addresses
§ Routing table explosion
  § Core routers act upon network numbers
  § Routing tables grow as the number of networks increases
CIDR: Solutions
§ Allocate the class C address space among geographical regions
  § Europe, the Americas, Asia, Africa
  § Eases routing problems
§ Assign blocks of class C addresses to users
  § Can attach more than 256 hosts
  § Allows for the aggregation of routes
CIDR: Rules
§ A user may ask for $2^n$ contiguous class C address blocks ($0 \le n \le 5$)
  § Yields $2^{n+8}$ host addresses
§ A block of class C addresses is listed in a core routing table by address prefix
  § Like a subnet mask
  § E.g., the prefix 192.4.16.0/20 specifies addresses 192.4.16.0 through 192.4.31.255
CIDR Aggregation
[Figure: a backbone provider’s routing table holds the single prefix 192.4.16.0/20 for an ISP whose customers use the 4096 addresses 192.4.16.0–192.4.31.255]
§ One routing prefix replaces 16 class C network entries (4096 addresses)
§ “192.4.16.0/20” is shorthand notation for “192.4.16.0–192.4.31.255”
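Python's standard ipaddress module can check the arithmetic of the aggregation example above. The /20 prefix spans 4096 addresses and splits into 16 class C sized /24 networks, and those 16 networks collapse back into the single prefix.

```python
import ipaddress

block = ipaddress.ip_network("192.4.16.0/20")
print(block.network_address, block.broadcast_address)  # 192.4.16.0 192.4.31.255
print(block.num_addresses)  # 4096

# The 16 class C sized (/24) networks that the single prefix replaces
class_c = list(block.subnets(new_prefix=24))
print(len(class_c), class_c[0], class_c[-1])  # 16 192.4.16.0/24 192.4.31.0/24

# Aggregation in the other direction: collapse the 16 routes into one
print(list(ipaddress.collapse_addresses(class_c)))  # [IPv4Network('192.4.16.0/20')]
```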
CIDR Block Allocations
Address range | Region
194.0.0.0–195.255.255.255 | Europe
198.0.0.0–199.255.255.255 | North America
200.0.0.0–201.255.255.255 | Central and South America
202.0.0.0–203.255.255.255 | Asia and the Pacific

Addresses requested | Allocation
fewer than 256 | 1 class C network
fewer than 512 | 2 class C networks
fewer than 1024 | 4 class C networks
fewer than 2048 | 8 class C networks
fewer than 4096 | 16 class C networks
fewer than 8192 | 32 class C networks
fewer than 16384 | 64 class C networks
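The size table above follows a simple rule. A request for h addresses receives the smallest power-of-two number k of class C networks with h < 256k, which corresponds to a prefix of length 24 - log2(k). The Python sketch below encodes that rule. The function name is illustrative, and the sketch stops at 64 networks because the table does.

```python
def class_c_allocation(hosts_needed: int):
    """Return (number of class C networks, prefix length) for a request, per the table."""
    if hosts_needed < 0:
        raise ValueError("hosts_needed must be non-negative")
    k = 1
    while hosts_needed >= 256 * k:
        k *= 2  # block sizes are powers of two so that they aggregate into one prefix
    if k > 64:
        raise ValueError("larger than the largest block in the table (64 class C networks)")
    prefix_len = 24 - (k.bit_length() - 1)  # k = 2**n  ->  /(24 - n)
    return k, prefix_len


print(class_c_allocation(200), class_c_allocation(3000))  # (1, 24) (16, 20)
```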
Network Address Translation
§ A form of IP masquerading
§ Used when a large customer network can obtain only a small IP address allocation
  § For example, a corporation with thousands of hosts receives only a class C address space
§ Private network address space is used internally
  § 10.0.0.0/8
  § 172.16.0.0/12
  § 192.168.0.0/16
User Tools for Routing
§ netstat
  § Unix and MS-DOS
  § Display the routing table with -rn
§ arp
  § Unix and MS-DOS
  § Examine or modify the ARP cache
§ ifconfig
  § Unix
  § Report details of network interfaces with -a
Evolution of Router Design
§ Generation 1: shared backplane and shared buffer memory
§ Generation 2: shared backplane and local buffer memory
§ Generation 3: switched backplane and local buffer memory
§ Generation 4: clusters of routers
Generation 1
[Figure: CPU and shared packet buffers on a backplane bus; link interface cards with DMA and MAC hardware]
Generation 2
[Figure: CPU on a backplane bus; each link interface card has DMA, its own packet buffers, and MAC hardware]
Generation 3
[Figure: CPU and link interface cards (DMA, packet buffers, MAC) connected by a switch instead of a bus]
Generation 4
[Figure: link interfaces connected to a cluster of routers through a fast interconnect]
Fast Forwarding
Cisco Forwarding Performance (packets per second)
Switching path | Cisco 2500 | Cisco 4500 | Cisco 7000 | Cisco 7500
Process | 1,000 | 10,000 | 2,500 | 10,000
Fast | 6,000 | 45,000 | 30,000 | 150,000
Hardware | N/A | N/A | 271,000 | 275,000
Cisco Performance Notes
§ Process
§ Fast
§ Hardware
Importance of Lookups
§ The routing table must have an entry for every possible Internet address
  § Routing-table size has grown steadily
§ The problem is to match the destination address of an incoming packet to a routing-table entry in a small amount of time
  § The entry is usually an aggregated prefix
  § Best (longest) prefix match
Routing Table Growth
[Figure: growth of the routing table over time; source: www.telstra.net/ops/bgptable.html]
Address Lookup
§ A router must be able to look up all assigned IPv4 addresses
  § Millions of addresses are assigned
§ There is not enough high-speed memory to store all assigned IPv4 (and IPv6) addresses
§ We must aggregate addresses to compress the routing table as much as possible
Address Aggregation
Original table (address | interface)
128.9.160.38 | 8
128.9.191.7 | 8
154.23.16.134 | 4
194.47.10.72 | 4
128.125.50.89 | 1
130.39.213.66 | 1
171.9.160.38 | 5
193.77.50.7 | 3
193.9.14.38 | 5
202.197.160.67 | 3
Compressed table (prefix | interface)
128.9.0.0/16 | 8
128.0.0.0/1 | 4
128.0.0.0/6 | 1
171.9.0.0/16 | 5
193.0.0.0/4 | 3
193.9.14.0/24 | 5
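Forwarding with the compressed table means choosing the longest matching prefix. The Python sketch below does a plain linear scan, which is fine for illustration but far too slow for a real router. Its routes are a subset of the compressed-table prefixes, used here as hypothetical routes.

```python
import ipaddress


def longest_prefix_match(routes, dst: str):
    """Return the interface of the longest prefix in routes that contains dst."""
    addr = ipaddress.ip_address(dst)
    best_len, best_iface = -1, None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_iface = net.prefixlen, iface
    return best_iface


routes = [("128.9.0.0/16", 8), ("128.0.0.0/1", 4), ("128.0.0.0/6", 1),
          ("171.9.0.0/16", 5), ("193.9.14.0/24", 5)]
for dst in ["128.9.160.38", "154.23.16.134", "130.39.213.66", "193.9.14.38"]:
    print(dst, longest_prefix_match(routes, dst))
# 128.9.160.38 8
# 154.23.16.134 4
# 130.39.213.66 1
# 193.9.14.38 5
```

Each address matches several prefixes (every one of these falls inside 128.0.0.0/1). Only the longest match decides the interface.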
A Simple Scheme
§ In IPv4, core routers use at most the first 24 bits
  § Those bits specify the network number toward which the packet is headed
§ Given a fast random-access memory of $2^{24}$ locations (16 Mwords), we can store the next hop of network address x.y.z.* in memory location x.y.z
§ Only one memory access per lookup is needed
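The direct-indexed scheme above is straightforward to sketch: one table slot per /24, indexed by the top 24 bits of the address. This Python version assumes that next hops fit in one byte, with 0 meaning "no route". It handles only prefixes of length 24 or shorter, and it expects shorter prefixes to be installed before longer ones so that the more specific entries overwrite them. The function names are illustrative.

```python
import ipaddress

table = bytearray(1 << 24)  # 2**24 one-byte slots (16 MiB); 0 = no route


def install(prefix: str, next_hop: int) -> None:
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > 24:
        raise ValueError("this scheme only holds prefixes of length 24 or less")
    first = int(net.network_address) >> 8  # slot for x.y.z of the first address
    count = 1 << (24 - net.prefixlen)      # a /16 fills 256 slots, a /24 fills 1
    table[first:first + count] = bytes([next_hop]) * count


def lookup(dst: str) -> int:
    # Exactly one memory access: the top 24 bits of the address are the index.
    return table[int(ipaddress.ip_address(dst)) >> 8]


install("128.9.0.0/16", 8)    # install shorter prefixes first...
install("128.9.192.0/24", 3)  # ...so longer ones overwrite their slots
print(lookup("128.9.160.38"), lookup("128.9.192.5"), lookup("10.0.0.1"))  # 8 3 0
```

The cost of the single memory access is visible in the first line: the table needs 16 MiB even when only a handful of routes are installed.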
Updating Routing Tables
§ Compressed routing tables must be updated periodically
  § New information about routes can affect address aggregation
§ The compression effort can be significant
  § Compression must be computationally efficient
Hash Tables for Fast Address Lookup
Prefix length | Hash list of prefixes
8 | 10
12 | 10.128, 10.64
16 | 10.1, 10.2
24 | 10.1.1, 10.1.2, 10.2.1
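The structure above keeps one hash table per prefix length. A minimal Python sketch follows. It uses dicts as the hash tables and probes lengths from longest to shortest, so a lookup costs at most one probe per distinct prefix length. The prefixes are the ones in the hash lists above, written in CIDR form. The next-hop labels are hypothetical.

```python
import ipaddress


class PrefixHashTables:
    """One hash table per prefix length, probed from the longest length down."""

    def __init__(self):
        self.by_len = {}  # prefix length -> {network address as int: next hop}

    def add(self, prefix: str, next_hop) -> None:
        net = ipaddress.ip_network(prefix)
        self.by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop

    def lookup(self, dst: str):
        addr = int(ipaddress.ip_address(dst))
        for length in sorted(self.by_len, reverse=True):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = self.by_len[length].get(addr & mask)
            if hop is not None:
                return hop  # the first hit is the longest matching prefix
        return None


t = PrefixHashTables()
for prefix, hop in [("10.0.0.0/8", "h8"), ("10.128.0.0/12", "h12a"), ("10.64.0.0/12", "h12b"),
                    ("10.1.0.0/16", "h16a"), ("10.2.0.0/16", "h16b"),
                    ("10.1.1.0/24", "h24a"), ("10.1.2.0/24", "h24b"), ("10.2.1.0/24", "h24c")]:
    t.add(prefix, hop)
print(t.lookup("10.1.1.7"), t.lookup("10.1.9.9"), t.lookup("10.130.0.1"), t.lookup("10.200.0.1"))
# h24a h16a h12a h8
```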
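Level-1 Lookup Scheme
[Figure: the upper 16 bits of the IP address drive the level-1 lookup. The top 12 bits (ix) select a code word in code[4K]; each code word holds a 10-bit field ten and a 6-bit field six. The top 10 bits (bix) select an entry of base[1K], and the next 4 bits (bit) select a column of maptable[676]. The pointer index is base[bix] + six + maptable[ten][bit].]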
Level-2/3 Lookup
§ The level-1 pointer points to either:
  § A next hop, or
  § An indicator to continue the search at levels 2/3
§ Levels 2/3 use the lower 16 bits of the address to look up the next hop
Performance of Scheme
§ Data structures fit in data cache memory
§ Fewer than 100 instructions per address are required for lookup
§ Therefore, a conventional CPU-based router can forward several million packets per second
Layer-3 Switching
Tag Switching
§ Sometimes called layer-3 or IP switching
§ Combines a switch with a router
  § Fast switch
  § Slower router
§ Attempts to detour around the slow routing path by taking a fast switching path
Observations (ca. 1997)
§ Routers are expensive and slow
  § $187,000 for a 1-Gb/s router
§ Switches are cheap and fast
  § $41,000 for a 5-Gb/s switch
§ It costs about 20 times as much to route a bit as to switch it
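The ratio on the slide follows from the cost per unit of capacity of each device:

```latex
\frac{\$187{,}000 \,/\, (1\ \text{Gb/s})}{\$41{,}000 \,/\, (5\ \text{Gb/s})}
  = \frac{\$187{,}000\ \text{per Gb/s}}{\$8{,}200\ \text{per Gb/s}}
  \approx 22.8
```

The exact ratio is about 23, which the slide rounds to about 20.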
IP Flows
§ A flow is a stream of IP packets that follow the same route for several hops
§ Common flow types
  § Streams from a specific source address to a specific destination address
  § Streams from a specific source address/port to a specific destination address/port
§ Flows have limited lifetimes
§ Analogous to a VC
Flow Classification
§ Flows should be long-lived
  § Disregard DNS packets
  § Disregard ICMP packets (e.g., ping)
  § Disregard most HTTP packets
§ Flows should be high-throughput
  § Disregard Telnet sessions
§ Detect a flow if the number of packets received in a specified time interval exceeds a threshold
Flow Statistics
§ Count packets and flows over a period of time
  § A flow is defined by IP source and destination addresses
§ Measure the duration of each flow
§ Count the number of packets in each flow
Flow Statistics Illustrated
[Figure: percentage (0–100%) of flows and of packets versus flow duration (0–300 seconds)]
Flows and Packet Traces
Flow Classifier
§ X/Y flow classifier
§ Flow recognition by stream characteristics
  § X packets
  § Y seconds
  § Flow is declared switchable
§ Flow deletion by stream characteristics
  § W packets
  § Z seconds
  § Flow is declared unswitchable
§ Analogous to calculating the first derivative df/dt
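The X/Y rule can be sketched as a per-flow sliding window of arrival times. In the Python sketch below, a flow is keyed by its (source, destination) address pair, as in the flow statistics above. The parameter values in the example are hypothetical: X = 10 packets in Y = 1 second for recognition, and deletion when fewer than W = 1 packet arrives in Z = 5 seconds. The class name is illustrative.

```python
from collections import defaultdict, deque


class XYFlowClassifier:
    """Declare a flow switchable after X packets within Y seconds;
    declare it unswitchable again when fewer than W packets arrive within Z seconds."""

    def __init__(self, x: int, y: float, w: int, z: float):
        self.x, self.y, self.w, self.z = x, y, w, z
        self.horizon = max(y, z)            # how much arrival history to keep
        self.arrivals = defaultdict(deque)  # flow key -> recent arrival times
        self.switchable = set()

    def _count_since(self, key, now: float, span: float) -> int:
        return sum(1 for t in self.arrivals[key] if now - t <= span)

    def on_packet(self, key, now: float) -> bool:
        q = self.arrivals[key]
        q.append(now)
        while now - q[0] > self.horizon:
            q.popleft()  # forget arrivals older than either window
        if key not in self.switchable and self._count_since(key, now, self.y) >= self.x:
            self.switchable.add(key)  # move the flow to the switched path
        return key in self.switchable

    def sweep(self, now: float) -> None:
        """Periodic check that returns quiet flows to hop-by-hop forwarding."""
        for key in list(self.switchable):
            if self._count_since(key, now, self.z) < self.w:
                self.switchable.discard(key)


c = XYFlowClassifier(x=10, y=1.0, w=1, z=5.0)
flow = ("10.0.0.1", "10.0.0.2")
results = [c.on_packet(flow, i * 0.1) for i in range(10)]
print(results[8], results[9])  # False True
c.sweep(now=10.0)
print(flow in c.switchable)  # False
```

The first nine packets stay on the hop-by-hop path. The tenth, still within one second, makes the flow switchable. After five quiet seconds the sweep reverts it.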
Basic Tag-Switch Strategy
§ Determine whether a flow exists
§ Use normal hop-by-hop IP forwarding for short-lived flows
§ Use “short-cut” ATM switching for long-lived, high-throughput flows
Tag-Switch Architecture
[Figure: an IP switch controller attached to an ATM switch]
§ Remove from the ATM switch: signaling, LANE, MPOA, IS-IS routing
§ Add to the ATM switch: flow management protocol, switch management protocol, flow classifier
§ Claim: the added software is 10% the size of the removed software!
Default Mode
§ An IP flow is initially forwarded through the controller
§ Two default VCs are used
[Figure: upstream node, switch with its controller, and downstream node, connected by the two default VCs]
Flows Detected
§ Controller detects a flow
  § Instructs the upstream switch to use a new VC
  § Upstream flow is now labeled by a new VC
§ Downstream controller detects a flow
  § Instructs this switch to use a new VC
  § Downstream flow is now labeled by a new VC
[Figure: upstream node, switch with its controller, and downstream node]
Cut-Through Action
§ Controller directs the switch to reconfigure
  § Two VCs are joined into one VC
  § Flow is now carried at switching speeds
§ Periodic messages maintain the new VC
§ Inactive flows time out
[Figure: upstream node, switch with its controller, and downstream node]
Features of Tag Switching
§ The IP header and LLC/SNAP encapsulation header can be removed
  § Compression benefits throughput
  § Added back later at the exit tag switch
§ TTL is adjusted at the exit tag switch
  § Preserves the value that it would have had in default mode
  § The IP checksum is updated too
  § Avoids mismatches in TTL for a flow
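Adjusting the TTL at the exit switch does not require recomputing the whole header checksum. RFC 1624 gives an incremental update, HC' = ~(~HC + ~m + m'), that patches the checksum directly. Here m and m' are the old and new values of the 16-bit word holding the TTL, and ~ is one's complement. The Python sketch below assumes the exit switch knows how many routed hops the flow skipped. The function names are illustrative.

```python
def ones_add(a: int, b: int) -> int:
    """16-bit one's-complement addition (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def incremental_checksum(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 equation 3: HC' = ~(~HC + ~m + m')."""
    s = ones_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_add(s, new_word)
    return ~s & 0xFFFF


def restore_ttl(header: bytearray, hops_skipped: int) -> None:
    """Give the TTL the value it would have had after hop-by-hop forwarding."""
    old_word = (header[8] << 8) | header[9]  # TTL (byte 8) shares a word with protocol (byte 9)
    header[8] = max(header[8] - hops_skipped, 0)  # a real switch would drop at 0
    new_word = (header[8] << 8) | header[9]
    old_csum = (header[10] << 8) | header[11]
    new_csum = incremental_checksum(old_csum, old_word, new_word)
    header[10], header[11] = new_csum >> 8, new_csum & 0xFF


# 20-byte header of a TTL-64 UDP datagram from 10.0.0.1 to 10.0.0.2 (valid checksum 0x66d7)
hdr = bytearray.fromhex("4500001400000000401166d70a0000010a000002")
restore_ttl(hdr, 3)
print(hdr[8], hdr[10:12].hex())  # 61 69d7
```

Only two 16-bit words are touched, so the cost is constant regardless of header length or options.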
Tag-Switching Performance
§ Analysis based on San Francisco NAP packet traces
§ Evaluate switching gain, i.e., the fraction of all packets that are directly switched
§ Simulations of the Ipsilon IP switch
  § 86% of packets are switched
  § 92% of bytes are switched
  § Switching gain is maximized at a detection threshold of about 10 packets
Layer-3 Switching
§ Data-driven approaches: use only packet statistics
  § Ipsilon IP Switching
  § Cisco Tag Switching
§ Topology-driven approaches: use routing-table or other topological information
  § IBM ARIS
  § Multiprotocol Label Switching (MPLS)
MPLS
§ Multiprotocol Label Switching (MPLS)
§ Generalized MPLS (GMPLS)
IP over WDM
§ Place IP flows on their own lightpaths
  § A lightpath is formed by the concatenation of wavelengths
  § A lightpath is all-optical
§ The idea is similar to IP switching
  § Wavelength-selective cross-connect (vs. ATM cell switch)
  § There are only a few wavelengths to carry flows (vs. many ATM virtual channels)
§ A signaling protocol is required to set up lightpaths