The Basics of BGP Border Gateway Protocol Routing
The Basics of BGP (Border Gateway Protocol) Routing and its Performance in Today's Internet
  Presenter: Sophia Poku
  Slides taken from a presentation by Nina Taft
Outline
  1. Highlights
  2. Addressing and CIDR
  3. BGP Messages and Prefix Attributes
  4. BGP Decision and Filtering Processes
  5. I-BGP
  6. Route Reflectors
  7. Multihoming
  8. Aggregation
  9. Routing Instability
  10. BGP Table Growth
Routing Protocols
  • IGP (Interior Gateway Protocol): used inside an AS. Examples: IS-IS, OSPF
  • AS (Autonomous System): a collection of routers under the same technical and administrative domain
  • EGP (Exterior Gateway Protocol): used between two AS's to allow them to exchange routing information so that traffic can be forwarded across AS borders. Example: BGP
  • [Figure: AS 1, AS 2, and AS 3 with border routers and internal routers R1 through R5; I-BGP runs inside an AS, and E-BGP announcements cross AS borders.]
Routers used
  • Internal Router: directly connects networks belonging to the same area
    • It runs a single copy of the basic routing protocol
  • Border/Boundary Router: exchanges routing information with routers belonging to other AS's
Purpose: to share connectivity information
  • AS 2 tells AS 1 over BGP: "you can reach net A via me"
  • Traffic to A then flows from R1 toward R2
  • Table at R1: dest = A, next hop = R2
  • [Figure: AS 1 and AS 2 with border routers and internal routers R1, R2, R3.]
BGP Sessions
  • Primary function is to exchange network-reachability information (includes AS numbers)
  • Uses TCP to establish the connection
  • Initially: a node advertises ALL routes it wants its neighbor to know (could be >50K routes)
  • Ongoing: only inform the neighbor of changes
  • One router can participate in many BGP sessions
  • [Figure: one router in AS 1 holding sessions with AS 2 and AS 3.]
Configuration and Policy
  • A BGP node has a notion of which routes to share with its neighbor. It may advertise only a portion of its routing table to a neighbor.
  • A BGP node does not have to accept every route that it learns from its neighbor. It can selectively accept and reject announcements.
  • What to share with neighbors and what to accept from neighbors is determined by the routing policy, which is specified in the router's configuration file.
Addressing Schemes
  • Original addressing scheme (class-based): the 32-bit address is divided into 2 parts, network and host
  • Class A: leading bit 0 (first octet 1-126 in decimal); subnet mask 255.0.0.0; 8-bit network, 24-bit host
  • Class B: leading bits 10 (first octet 128-191 in decimal); subnet mask 255.255.0.0; 16-bit network, 16-bit host
  • Class C: leading bits 110 (first octet 192-223 in decimal); subnet mask 255.255.255.0; 24-bit network, 8-bit host; ~2 million networks of 256 addresses each
CIDR (Classless Inter-Domain Routing)
  CIDR was introduced to solve 2 problems:
  • exhaustion of IP address space
  • size and growth rate of the routing table
Problem #1: Lifetime of Address Space
  • Example: an organization needs 500 addresses. A single class C address is not enough (256 hosts), so a class B address is allocated instead (~64K hosts). That is overkill: a huge waste.
  • CIDR allows networks to be assigned on arbitrary bit boundaries
    • permits arbitrary-sized masks: 178.24.14.0/23 is valid
    • requires explicit masks to be passed in routing protocols
  • CIDR solution for the example above: the organization is allocated a single /23 address block (512 addresses, the equivalent of 2 class C's)
Problem #2: Routing Table Size
  • Without CIDR: the service provider carries 232.71.0.0, 232.71.1.0, 232.71.2.0, ..., 232.71.255.0 and announces each of these networks individually to the global Internet
  • With CIDR: the service provider carries the same networks but announces a single aggregate, 232.71.0.0/16, to the global Internet
CIDR: Classless Inter-Domain Routing
  • Address format: <IP address/prefix P>. The prefix denotes the upper P bits of the IP address.
  • E.g., in the CIDR address 206.13.1.48/25, the "/25" indicates that the first 25 bits identify a unique network; the remaining 7 bits identify hosts
  • Idea: use aggregation. Provide routing for a large number of customers by advertising one common prefix.
  • This is possible because addressing is hierarchical
  • Summarizing routing information reduces the size of routing tables while still maintaining connectivity
  • Aggregation is critical to the scalability and survivability of the Internet
Address Arithmetic: Address Blocks
  • The <address/prefix> pair defines an address block. Examples:
    • 128.15.0.0/16 => [ 128.15.0.0 - 128.15.255.255 ]
    • 188.24.0.0/13 => [ 188.24.0.0 - 188.31.255.255 ]
    • consider the 2nd octet in binary: 188.00011000.0.0. The first 13 bits (188 plus the leading 00011 of the 2nd octet) are fixed; the remaining bits are settable
  • Address block sizes
    • a /13 address block has 2^(32-13) addresses (= 524288); a /16 has 2^(32-16) (= 65536)
    • a /13 address block is 8 times as big as a /16 address block because 2^(32-13) = 2^(32-16) * 2^3
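The block arithmetic above can be checked with Python's standard ipaddress module. This is a minimal sketch; the helper name block_range is hypothetical and chosen only for illustration.

```python
import ipaddress

def block_range(cidr):
    """Return (first address, last address, number of addresses) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    # broadcast_address is simply the highest address in the block
    return str(net.network_address), str(net.broadcast_address), net.num_addresses

for cidr in ("128.15.0.0/16", "188.24.0.0/13"):
    print(cidr, block_range(cidr))
```

The ratio of the two block sizes is 2^3 = 8, matching the slide.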
CIDR: longest prefix match
  • Because prefixes of arbitrary length are allowed, overlapping prefixes can exist
  • Example: a router hears 124.39.0.0/16 from one neighbor and 124.39.11.0/24 from another neighbor
  • The router forwards a packet according to the most specific forwarding information, called the longest prefix match
    • A packet with destination 124.39.11.32 will be forwarded using the /24 entry
    • A packet with destination 124.39.22.45 will be forwarded using the /16 entry
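A minimal sketch of longest prefix match over the two prefixes from the example. The table layout and the neighbor names are assumptions made for illustration; real routers use tries or hardware lookups rather than a linear scan.

```python
import ipaddress

def longest_prefix_match(table, destination):
    """Return the (prefix, next_hop) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(prefix), next_hop)
               for prefix, next_hop in table.items()
               if addr in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    # The most specific entry is the one with the greatest prefix length
    net, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), next_hop

# Hypothetical forwarding table: prefix -> neighbor that announced it
table = {"124.39.0.0/16": "neighbor-1", "124.39.11.0/24": "neighbor-2"}
print(longest_prefix_match(table, "124.39.11.32"))
print(longest_prefix_match(table, "124.39.22.45"))
```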
Will CIDR work?
  • For CIDR to be successful:
    • address registries must assign addresses using a CIDR strategy
    • providers and subscribers should configure their networks and allocate addresses to allow for a maximum amount of aggregation
    • BGP must be configured to do aggregation as much as possible
  • Factors that complicate achieving aggregation:
    • multihoming, proxy aggregation, changing providers
Four Basic Messages
  • Open: establishes a BGP session (uses TCP port 179)
  • Notification: reports unusual conditions
  • Update: informs the neighbor of new routes that become active and of old routes that become inactive
  • Keepalive: informs the neighbor that the connection is still viable
BGP Database
  1. Neighbor table: list of BGP neighbors
  2. BGP forwarding table: list of all networks learned from each neighbor
  3. IP routing table: list of best paths to destination networks
OPEN Message
  • During session establishment, two BGP speakers exchange their:
    • AS numbers
    • BGP identifiers (the router ID, usually one of the router's IP addresses)
    • Hold time
    • Authentication information (optional)
  • Open messages are confirmed using a keepalive message sent by the peer, and must be confirmed before updates are exchanged
  • A BGP speaker has the option to refuse a session
  • The hold timer value selected is the maximum time to wait to hear something from the other end before assuming the session is down
NOTIFICATION and KEEPALIVE Messages
  • NOTIFICATION
    • indicates an error
    • terminates the TCP session
    • gives the receiver an indication of why the BGP session terminated
    • Examples: header errors, hold timer expiry, bad peer AS, bad BGP identifier, malformed attribute list, missing required attribute, AS routing loop, etc.
  • KEEPALIVE
    • the protocol requires some data to be sent periodically. If there is no UPDATE to send within the specified time period, send a KEEPALIVE message
UPDATE Message
  • Updates are sent using TCP to ensure delivery
  • Used to advertise feasible prefixes and/or withdraw unfeasible prefixes from the routing table
  • Path attributes: list of attributes that pertain to ALL the prefixes in the Reachability Information field
  • FORMAT:
    • Withdrawn routes length (2 octets)
    • Withdrawn routes (variable length)
    • Total path attributes length (2 octets)
    • Path attributes (variable length)
    • Reachability information (variable length)
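The field layout above can be walked with Python's struct module. This is a hedged sketch, not a full BGP parser: it assumes the common BGP message header has already been stripped, it assumes the RFC 4271 prefix encoding (one length octet in bits, followed by the minimum number of address octets), and it leaves the path attributes undecoded. The function names are hypothetical.

```python
import struct

def parse_prefixes(data):
    """Decode a run of <length-in-bits, prefix octets> entries into dotted strings."""
    prefixes, i = [], 0
    while i < len(data):
        bits = data[i]
        nbytes = (bits + 7) // 8               # minimum whole octets holding `bits` bits
        octets = data[i + 1:i + 1 + nbytes] + bytes(4 - nbytes)  # pad to 4 octets
        prefixes.append("%d.%d.%d.%d/%d" % (*octets, bits))
        i += 1 + nbytes
    return prefixes

def parse_update_body(body):
    """Split an UPDATE body into (withdrawn prefixes, raw path attributes, announced prefixes)."""
    (withdrawn_len,) = struct.unpack_from("!H", body, 0)       # 2-octet length
    withdrawn = body[2:2 + withdrawn_len]
    offset = 2 + withdrawn_len
    (attrs_len,) = struct.unpack_from("!H", body, offset)      # 2-octet length
    attrs = body[offset + 2:offset + 2 + attrs_len]
    nlri = body[offset + 2 + attrs_len:]                       # remainder of the message
    return parse_prefixes(withdrawn), attrs, parse_prefixes(nlri)

# Hand-built example body: withdraw 124.39.0.0/16, announce 124.39.11.0/24, no attributes
body = (struct.pack("!H", 3) + bytes([16, 124, 39])
        + struct.pack("!H", 0) + bytes([24, 124, 39, 11]))
print(parse_update_body(body))
```

An UPDATE that announces prefixes would normally carry path attributes; the empty attribute list here only keeps the example short.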
Advertising a prefix
  • When a router advertises a prefix to one of its BGP neighbors:
    • the information is valid until the advertising router explicitly advertises that it is no longer valid
    • BGP does not require routing information to be refreshed
    • if node A advertises a path for a prefix to node B, then node B can be sure node A is using that path itself to reach the destination
BGP Attributes
  • Attributes: routes learned via BGP have associated properties that are used to determine the best route to a destination when multiple paths exist to that destination
    • Local Preference
    • Multi-Exit Discriminator (MED)
    • Origin
    • AS-path
    • Next-hop
Attribute: ORIGIN
  • ORIGIN: Who originated the announcement? Where was a prefix injected into BGP? Indicates how BGP learned about a particular route
    • IGP: route is interior to the originating AS. This value is set when the network router configuration command is used to inject the route into BGP
    • EGP: route learned via the Exterior Gateway Protocol
    • Incomplete (often used for static routes): origin of the route is unknown or it was learned in some other way
Attribute: AS_PATH
  • a list of the AS's through which the announcement for a prefix has passed
  • each AS prepends its AS number to the AS-PATH attribute when forwarding an announcement
  • useful to detect and prevent loops
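A minimal sketch of how AS_PATH supports loop detection: an AS rejects an announcement whose path already contains its own number, and prepends its number when forwarding to an E-BGP neighbor. The function names and AS numbers are hypothetical.

```python
def receive_announcement(local_as, as_path):
    """Return True if the announcement is acceptable, False if accepting it would loop."""
    return local_as not in as_path

def forward_announcement(local_as, as_path):
    """Prepend the local AS number before passing the announcement to an E-BGP neighbor."""
    return [local_as] + list(as_path)

path = forward_announcement(1, [])        # AS 1 originates the prefix
path = forward_announcement(2, path)      # AS 2 forwards it
print(path)
print(receive_announcement(1, path))      # AS 1 sees its own number in the path
print(receive_announcement(3, path))      # AS 3 has not seen this announcement yet
```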
Attribute: NEXT HOP
  • IP address used to reach the advertising router
  • For EBGP sessions, NEXT HOP = IP address of the neighbor that announced the route
  • For IBGP sessions, if the route originated inside the AS, NEXT HOP = IP address of the neighbor that announced the route
  • For routes originated outside the AS, the NEXT HOP of the EBGP node that learned of the route is carried unaltered into IBGP
  • [Figure: routers A, B, C, and D connected by EBGP and IBGP sessions, with prefixes 209.15.1.0/24 and 140.20.1.0/24; the BGP table and IP routing table at Router C are not recoverable.]
Next-Hop Cont'd
  • Router C advertises 172.16.1.0 with next hop 10.1.1.1
  • A propagates it within its AS
Attribute: Multi-Exit Discriminator (MED)
  • used when AS's are interconnected via 2 or more links
  • the AS announcing the prefix sets the MED, which enables it to indicate its preference among the links
  • the AS receiving the prefix uses the MED to select a link
  • a way to specify how close a prefix is to the link it is announced on
Attribute: Local Preference
  • Used to prefer an exit point from the local AS
  • Used to indicate preference among multiple paths for the same prefix anywhere in the Internet
  • The higher the value, the more preferred
  • Exchanged between IBGP peers only. Local to the AS.
  • Often used to select a specific exit point for a particular destination
Routing Process Overview
  • Routes received from peers
  • Input policy engine: accept, deny, set preferences
  • Decision process: choose the best route (BGP table)
  • Routes used by the router: IP routing table
  • Output policy engine: forward, not forward, set MEDs
  • Routes sent to peers
Input Policy Engine
  • Inbound filtering controls outbound traffic
    • filters route updates received from other peers
    • filtering based on IP prefixes, AS_PATH, community
    • denying a prefix means BGP does not want to reach that prefix via the peer that sent the announcement
    • accepting a prefix means traffic towards that prefix may be forwarded to the peer that sent the announcement
  • Attribute manipulation
    • sets attributes on accepted routes
    • example: specify LOCAL_PREF to set priorities among multiple peers that can reach a given destination
BGP Decision Process
  1. Choose the route with the highest LOCAL-PREF
  2. If more than 1 route remains, select the route with the shortest AS-PATH
  3. If more than 1 route remains, select according to the lowest ORIGIN type, where IGP < EGP < INCOMPLETE
  4. If more than 1 route remains, select the route with the lowest MED value
  5. Select the minimum-cost path to the NEXT HOP using IGP metrics
  6. If there are multiple internal paths, use the BGP Router ID to break the tie
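The six steps can be sketched as a lexicographic sort key. This is a simplification under stated assumptions: the Route class and its field names are hypothetical, MED is compared across all candidates (real implementations usually compare it only among routes from the same neighboring AS), and the final tie-break is assumed to prefer the lowest router ID.

```python
from dataclasses import dataclass
from typing import List

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}   # lower is preferred

@dataclass
class Route:
    local_pref: int
    as_path: List[int]
    origin: str
    med: int
    igp_cost: int       # IGP metric to reach the NEXT HOP
    router_id: str      # dotted-quad BGP identifier of the advertising router

def preference_key(route):
    """Tuple ordered so that the smallest key is the best route."""
    return (
        -route.local_pref,                    # step 1: highest LOCAL-PREF
        len(route.as_path),                   # step 2: shortest AS-PATH
        ORIGIN_RANK[route.origin],            # step 3: lowest ORIGIN type
        route.med,                            # step 4: lowest MED
        route.igp_cost,                       # step 5: cheapest path to NEXT HOP
        tuple(int(x) for x in route.router_id.split(".")),  # step 6: router ID
    )

def best_route(routes):
    return min(routes, key=preference_key)

a = Route(100, [1, 2], "IGP", 0, 10, "1.1.1.1")
b = Route(200, [1, 2, 3], "EGP", 5, 20, "2.2.2.2")
print(best_route([a, b]).router_id)   # b wins on LOCAL-PREF despite its longer path
```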
Output Policy Engine
  • Outbound filtering controls inbound traffic
    • forwarding a route means others may choose to reach the prefix through you
    • not forwarding a route means others must use another router to reach the prefix
    • may depend upon whether the peer is an E-BGP or I-BGP peer
    • example: if ORIGIN=EGP and you are a nontransit AS and the BGP peer is a non-customer, then don't forward
  • Attribute manipulation
Transit vs. Nontransit AS
  • Transit traffic = traffic whose source and destination are outside the AS
  • Nontransit AS: does not carry transit traffic
    • Advertises its own routes only
    • Does not propagate routes learned from other AS's
  • Transit AS: does carry transit traffic
    • Advertises its own routes PLUS routes learned from other AS's
  • [Figure: case 1 and case 2, showing which of the routes r1, r2, and r3 are exchanged among AS 1, AS 2, ISP 1, and ISP 2.]
Internal BGP (I-BGP)
  • Used to distribute routes learned via EBGP to all the routers within an AS
  • I-BGP and E-BGP are the same protocol in that
    • the same message types are used
    • the same attributes are used
    • the same state machine is used
  • BUT they use different rules for readvertising prefixes
    • Rule #1: prefixes learned from an E-BGP neighbor can be readvertised to an I-BGP neighbor, and vice versa
    • Rule #2: prefixes learned from an I-BGP neighbor cannot be readvertised to another I-BGP neighbor
I-BGP: Preventing Loops and Setting Attributes
  • Why rule #2? To prevent announcements from looping.
    • In E-BGP, loops can be detected via AS-PATH
    • AS-PATH is not changed in I-BGP
  • Implication of the rule: a full mesh of I-BGP sessions between each pair of routers in an AS is required
  • Setting attributes: the router that injects the route into the I-BGP mesh is responsible for
    • setting the LOCAL-PREF attribute
    • prepending the AS number to the AS-PATH
Route Reflectors
  • Problem: requiring a full mesh of I-BGP sessions between all pairs of routers is hard to manage for large AS's
  • Solution:
    • group routers into clusters
    • assign a leader to each cluster, called a route reflector (RR)
    • members of a cluster are called clients of the RR
  • I-BGP peering
    • clients peer only with their RR
    • RRs must be fully meshed
  • [Figure: three clusters A, B, and C, each with one RR and its clients; the RRs are fully meshed.]
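The management saving can be made concrete by counting I-BGP sessions. This is a back-of-the-envelope sketch that assumes each client peers with exactly one RR and that the RRs are fully meshed; the function names and the router counts are hypothetical.

```python
def full_mesh_sessions(n):
    """Number of sessions when every pair of n routers peers directly: n(n-1)/2."""
    return n * (n - 1) // 2

def route_reflector_sessions(n, num_rrs):
    """Sessions when num_rrs reflectors are fully meshed and each client has one RR."""
    clients = n - num_rrs
    return full_mesh_sessions(num_rrs) + clients

print(full_mesh_sessions(100))            # full mesh of 100 routers
print(route_reflector_sessions(100, 5))   # 5 RRs serving 95 clients
```

Deployments that give each client a second RR for redundancy would roughly double the client term.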
Route Reflectors: Rule on Announcements
  • Provides mechanisms for minimizing the number of update messages transmitted within an AS and reducing the amount of data propagated in each message
  • If received from an RR, reflect to clients
  • If received from a client, reflect to RRs and clients
  • If received from E-BGP, reflect to all: RRs and clients
  • RRs reflect only the best route to a given prefix, not all announcements they receive
    • helps limit the size of the routing table
    • sometimes clients don't need to carry the full table
Avoiding Loops with Route Reflectors
  • Loops cannot be detected by the traditional approach using AS-PATH because AS-PATH is not modified within an AS
  • Announcements could leave a cluster and re-enter it
  • Two new attributes are introduced:
    • ORIGINATOR_ID: router ID of the route's originator in the AS. Rule: the announcement is discarded if it returns to the originator.
    • CLUSTER_LIST: a sequence of cluster IDs, set by RRs. Rule: if an RR receives an update and the cluster list contains its cluster ID, then the update is discarded.
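The two discard rules can be sketched with updates modeled as plain dictionaries. The dictionary keys, function names, and router and cluster IDs are hypothetical; the sketch also assumes that a reflector leaves an existing ORIGINATOR_ID untouched and prepends its own cluster ID when reflecting.

```python
def reflect(update, my_cluster_id, sender_router_id):
    """Return the update as an RR would reflect it, stamping the loop-prevention attributes."""
    out = dict(update)
    # Keep an ORIGINATOR_ID set by an earlier reflector; otherwise record the sender
    out.setdefault("originator_id", sender_router_id)
    out["cluster_list"] = [my_cluster_id] + list(update.get("cluster_list", []))
    return out

def accept_reflected(update, my_router_id, my_cluster_id=None):
    """Apply both rules; pass my_cluster_id only when the receiver is itself an RR."""
    if update.get("originator_id") == my_router_id:
        return False      # the announcement returned to its originator
    if my_cluster_id is not None and my_cluster_id in update.get("cluster_list", []):
        return False      # the announcement already passed through this cluster
    return True

u = reflect({"prefix": "10.0.0.0/8"}, my_cluster_id="c1", sender_router_id="r1")
print(u)
print(accept_reflected(u, "r1"))          # originator sees its own route
print(accept_reflected(u, "r9", "c1"))    # RR of cluster c1 sees c1 in the list
print(accept_reflected(u, "r9", "c2"))    # a different cluster accepts it
```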
Multihoming
Single-homed vs. Multi-homed subscribers
  • A single-homed network has one connection to the Internet (i.e., to networks outside its domain)
  • A multi-homed network has multiple connections to the Internet. Two scenarios:
    • can be multi-homed to a single provider
    • can be multi-homed to multiple providers
  • Why multi-home?
    • Reliability
    • Performance
Single-homed AS
  • The subscriber is called a "stub AS"
  • Provider-subscriber communication for route advertisement:
    • static configuration (most common)
      • the provider's router is configured with the subscriber's prefix
      • good if the customer's routes can be represented by a small set of aggregate routes
      • bad if the customer has many noncontiguous subnets
    • can use BGP between the provider and the stub AS
  • [Figure: provider router R1 connected to subscriber router R2.]
Multihoming to Multiple Providers
  • [Figure: a customer connected to ISP 1 and ISP 2, with ISP 3 upstream.]
Multihoming Issues
  • Load sharing
    • how should traffic be distributed over the multiple links?
  • Reliability
    • if load sharing leads to preferring certain links for certain subnets, is reliability reduced?
  • Address/Aggregation
    • which subnet addresses should the multihomed customer use?
    • how will this affect its provider's ability to aggregate routes?
Load sharing from ISP to Customer using attributes
  • Goal: the provider splits traffic across 2 links according to prefix
  • Implement this strategy using attributes
    • customer sets MEDs
    • provider sets LOCAL_PREF
  • [Figure: ISP router R1 connected over 2 links to customer routers R2 and R3; the prefixes 140.35/16 and 208.22/16 are announced on both links.]
Load sharing from Customer to ISP using policy
  • Goal: send traffic to the ISP's customers on one link; send traffic to the rest of the Internet on the 2nd link
  • Implement using policy to control announcements
    • on one link the ISP advertises customer routes only
    • on the other link the ISP advertises the default route 0/0
  • [Figure: ISP routers R1 and R2 connected to customer router R3; announcements are drawn in blue and traffic in red.]
Address/Aggregation Issue
  • Where should the customer get its address block from?
    1. From ISP 1
    2. From ISP 2
    3. From both ISP 1 and ISP 2
    4. Independently from an address registry
  • (cases 1 and 2 are equivalent)
  • [Figure: a customer connected to ISP 1 and ISP 2, with ISP 3 upstream.]
Case 1 & 2: Get address block from one ISP
  • Example: the customer gets its address block from ISP 1
    • ISP 1's aggregation is not broken
    • the customer's prefix is not aggregatable at ISP 2
    • the longer prefix becomes a traffic magnet
  • How good is load sharing? If all ISPs generate the same amount of traffic for the customer, then the ISP 2-customer link is twice as loaded as the ISP 1-customer link
  • [Figure: the customer announces 140.20.6/24 to both ISPs. ISP 1 announces 140.20/16 to ISP 3; ISP 2 announces 200.50/16 and 140.20.6/24 to ISP 3.]
Case 3: Get address block from both ISPs
  • announcement policy: announce each prefix only to its "parent"
  • advantage: both ISPs can aggregate the prefix they receive
  • disadvantage: lose reliability
  • is load balancing good? It depends upon how much traffic is sent to each prefix
  • [Figure: the customer announces 140.20.1/24 to ISP 1 and 200.50.1/24 to ISP 2. ISP 1 announces 140.20/16 and ISP 2 announces 200.50/16 to ISP 3.]
Case 4: Obtain address block from registry
  • no aggregation possible
  • no traffic magnets created
  • good reliability
  • how to achieve load sharing?
    • the customer breaks its address block into 2 /25 blocks and announces one per link (but may lose reliability)
    • OR use the method of AS-PATH manipulation
  • [Figure: the customer announces 150.55.10/24 to both ISPs. ISP 1 announces 100.20/16 and 150.55.10/24 to ISP 3; ISP 2 announces 200.50/16 and 150.55.10/24 to ISP 3.]
AS-PATH manipulation
  • Idea: prepend your AS number in the AS-PATH multiple times to discourage use of a link
  • makes the AS-PATH seem longer than it is
  • recall that the BGP decision process uses shortest AS-PATH length as a criterion for selecting the best path
  • Example: ISP 3 will choose the customer path through AS 2 because its AS-PATH appears shorter
  • [Figure: customer AS 33 announces 150.55.10/24 with AS-PATH "33 33" to AS 1 and with AS-PATH "33" to AS 2. AS 1 passes it to ISP 3 as "1 33 33"; AS 2 passes it as "2 33".]
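A minimal sketch of the prepending example, using the AS numbers from the figure (33 for the customer, 1 and 2 for its providers). The announce helper and its prepend parameter are hypothetical, and ISP 3's choice is reduced to the shortest-AS-PATH step alone.

```python
def announce(as_number, as_path=(), prepend=1):
    """Return the AS-PATH after `as_number` adds itself `prepend` times."""
    return [as_number] * prepend + list(as_path)

# The customer (AS 33) prepends twice toward AS 1 and once toward AS 2
via_as1 = announce(1, announce(33, prepend=2))
via_as2 = announce(2, announce(33))

# ISP 3 compares the two announcements for 150.55.10/24 by AS-PATH length
chosen = min((via_as1, via_as2), key=len)
print(via_as1, via_as2, chosen)
```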
Aggregation
Address Arithmetic: When is aggregation possible? Case 1
  • Aggregation: the process of combining the characteristics of several different routes in such a way that a single route is advertised
  • Possible when one prefix is contained in another
  • Example: two AS's having a customer-provider relationship. The provider does the aggregation.
    • The provider has address block 18.0.0.0/8
    • Its customers have address blocks 18.6.0.0/15 and 18.9.0.0/15
    • The provider announces its own address block only
  • Rule: prefix p1 contains prefix p2 iff
    length(p2) > length(p1) AND
    address(p2) / 2^(32 - length(p1)) = address(p1) / 2^(32 - length(p1))
    (integer division)
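The containment rule translates directly into integer arithmetic. In this sketch the helper names are hypothetical, and a right shift by 32 - length(p1) stands in for the integer division by 2^(32 - length(p1)).

```python
import ipaddress

def to_int_and_length(cidr):
    """Split 'a.b.c.d/len' into (address as a 32-bit integer, prefix length)."""
    address, length = cidr.split("/")
    return int(ipaddress.IPv4Address(address)), int(length)

def contains(p1, p2):
    """True iff prefix p1 contains prefix p2, following the rule on the slide."""
    a1, l1 = to_int_and_length(p1)
    a2, l2 = to_int_and_length(p2)
    shift = 32 - l1                 # dividing by 2**shift keeps the top l1 bits
    return l2 > l1 and (a2 >> shift) == (a1 >> shift)

print(contains("18.0.0.0/8", "18.6.0.0/15"))    # provider block covers the customer
print(contains("18.6.0.0/15", "18.0.0.0/8"))    # the reverse does not hold
```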
Address Arithmetic: When is aggregation possible? Case 2
  • Some pairs of consecutive prefixes can be aggregated
  • Example: routes within the same AS. The AS has 2 address blocks:
    • 1.2.2.0/24 = 00000001.00000010.00000010.00000000/24
    • 1.2.3.0/24 = 00000001.00000010.00000011.00000000/24
    • it can announce 1.2.2.0/23 instead
  • Rule: consecutive prefixes p1 and p2 are aggregatable iff
    length(p1) = length(p2) AND
    address(p1) / 2^(32 - length(p1)) + 1 = address(p2) / 2^(32 - length(p2)) AND
    address(p1) % 2^(33 - length(p1)) = 0
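The three conditions of the rule can be checked with integer arithmetic. This is a sketch with hypothetical helper names; the third condition is what rejects a pair such as 1.2.3.0/24 and 1.2.4.0/24, which are consecutive but do not share a /23.

```python
import ipaddress

def to_int_and_length(cidr):
    """Split 'a.b.c.d/len' into (address as a 32-bit integer, prefix length)."""
    address, length = cidr.split("/")
    return int(ipaddress.IPv4Address(address)), int(length)

def aggregatable(p1, p2):
    """True iff consecutive prefixes p1 and p2 can be merged into one shorter prefix."""
    a1, l1 = to_int_and_length(p1)
    a2, l2 = to_int_and_length(p2)
    if l1 != l2:
        return False
    block = 2 ** (32 - l1)                      # size of each block
    consecutive = a1 // block + 1 == a2 // block
    aligned = a1 % (2 * block) == 0             # 2 * block == 2**(33 - l1)
    return consecutive and aligned

def aggregate(p1, p2):
    """Return the merged prefix, or None if the pair is not aggregatable."""
    if not aggregatable(p1, p2):
        return None
    a1, l1 = to_int_and_length(p1)
    return "%s/%d" % (ipaddress.IPv4Address(a1), l1 - 1)

print(aggregate("1.2.2.0/24", "1.2.3.0/24"))
print(aggregate("1.2.3.0/24", "1.2.4.0/24"))
```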
Black holes and cardinal sins
  • "The cardinal sin of BGP routing is advertising routes that you don't know how to get to; called 'black-holing' someone"
  • If you announce part of the IP space owned by someone else, using a more-specific prefix, all their traffic flows to you. This makes that address block disconnected from the Internet.
  • Example: black holes can happen inadvertently through careless aggregation
  • [Figure: ISP 1 and ISP 2 connect at a NAP. Labels recoverable from the diagram: the announcements 100.24/16 (marked "wrong !!") and 222.2/16, the block 100.24.0.0/18, and the blocks 100.24.0.0/20 [100.24.0.0 - 100.24.15.255], 100.24.16.0/21 [100.24.16.0 - 100.24.23.255], and 100.24.56.0/21 [100.24.56.0 - 100.24.63.255] belonging to Net A, Net B, and Net C.]
Limitations to Aggregation
  • Lack of hierarchical allocation of address space prior to CIDR (before 1995)
  • A single AS can have noncontiguous address blocks
  • Customer AS and provider AS can have noncontiguous address blocks
  • Reluctance of customers to renumber their address space when they switch providers
  • Multi-homing
    • multi-homed prefixes require global visibility
    • an AS may choose not to aggregate, for load sharing
Rules of Thumb for Aggregation
  • To avoid black holes: an ISP is not allowed to aggregate routes unless the aggregate it announces is a supernet of exactly the address blocks it wants to aggregate
  • In other words, an ISP has to specifically announce each IP address range of its downstream customers that is not part of its own address space
Routing Instability
Route Stability
  • Routing instability: rapid fluctuation of network reachability information
    • route flapping: when a route is withdrawn and re-announced repeatedly in a short period of time
    • happens via UPDATE messages
    • because messages propagate to the global Internet, route flapping behavior can cascade and deteriorate routing performance in many places
  • Effects: increased packet loss, increased network latency, CPU overhead, loss of connectivity
  • Route stability studies by C. Labovitz, R. Malan & F. Jahanian
Types of Routing Updates
  • Forwarding instability
    • reflects legitimate topology changes
    • e.g., changes in prefix, NEXT_HOP and/or AS_PATH
    • affects the forwarding paths used
  • Policy fluctuation
    • reflects changes in policy
    • e.g., changes in MED, LOCAL_PREF, etc.
    • may not necessarily affect the forwarding paths used
  • Pathological
    • redundant messages
    • reflect neither topology nor policy changes
Who's Responsible?
  • AS's
    • No single AS dominates instability statistics
    • No correlation between the size of an AS and its share of the updates generated
  • Prefixes
    • Instability is evenly distributed across routes
  • Example of measurements:
    • 75% of AADiff events come from prefixes that change less than 10 times a day
    • 80-90% of instability comes from prefixes that are announced less than 50 times a day
Sources of Instabilities in General
  • router configuration errors
  • transient physical and data link problems
  • software bugs
  • problems with leased lines (electrical timing issues that cause false alarms of disconnect)
  • router failures
  • network upgrades and maintenance
Instability Problem and Cause: Example 1
  • Problem: 3-5 million duplicate withdrawals
  • Cause: stateless BGP implementation
    • time-space tradeoff: no state is maintained on what was advertised to peers
    • on receiving any change, a withdrawal is transmitted to all peers regardless of whether they were previously notified or not
    • updates were sent for both explicit and implicit withdrawals
  • By 1998, most vendors had BGP implementations with partial state
    • Result: number of WWDups reduced by an order of magnitude
Instability Problem and Cause: Example 2
  • Problem: duplicate announcements
  • Cause: min-advertisement timer and stateless BGP
    • min-adv timer: wait 30 seconds, then combine all updates received in the last 30 seconds into a single outbound update message (if possible)
    • within 30 seconds a route can be withdrawn and re-announced, so that there is no net change to the original announcement
  • Solution: have BGP keep some state about messages recently sent to peers, and avoid sending duplicate messages
Instability Problem and Cause: Example 3
  • Example: interaction between IGP and BGP
  • Policy: set the MED using IGP metrics, such as the shortest path
  • [Figure: AS 1, AS 2, and AS 3 with routers R1 through R8 and Net X; links are labeled with IGP metrics and announcements with the MED values derived from them. Legend: border router, internal router, E-BGP, IGP.]
Controlling route instability: Route Dampening
  • track the number of times a route has flapped over a period of time
  • routes that exhibit a high level of instability in a short period of time should be suppressed (not advertised)
  • penalize ill-behaved routes in proportion to their expected future instability
  • if a suppressed route stops flapping for a long enough period of time, unsuppress it (readvertise)
Route Dampening
  • [Figure: penalty plotted against time, with horizontal lines marking the suppress limit and the reuse limit.]
Route Dampening Algorithm
  • Each time a route flaps, increase the penalty
  • If the route has not flapped in the last "half-life" time units, cut the penalty in half
  • If the penalty > suppress limit, suppress the route
  • If the penalty < reuse limit, free a suppressed route
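The algorithm can be sketched as a small state machine. Two assumptions are made here and are not taken from the slides: the penalty decays continuously (exponentially) so that it halves after each half-life without a flap, and the numeric defaults (half-life in seconds, limits, penalty per flap) are illustrative values only. The class and method names are hypothetical.

```python
class DampenedRoute:
    """Tracks the flap penalty for one route; times are in seconds."""

    def __init__(self, half_life=900.0, suppress_limit=2000.0,
                 reuse_limit=750.0, flap_penalty=1000.0):
        self.half_life = half_life
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.flap_penalty = flap_penalty
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay: the penalty halves for every half_life elapsed
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """Record a flap at time `now` and suppress the route if it crosses the limit."""
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        """Decay the penalty to `now` and free the route if it fell below the reuse limit."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed

route = DampenedRoute()
for t in (0, 1, 2):                 # three flaps in quick succession
    route.flap(t)
print(route.is_suppressed(3))       # suppressed shortly after the burst
print(route.is_suppressed(3000))    # released once the penalty has decayed
```

The gap between the suppress limit and the reuse limit gives hysteresis: a route that was just suppressed is not released the moment its penalty dips slightly.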
Side Effect of Route Dampening
  • A legitimate update may arrive and be ignored because its route has been suppressed and not yet released
  • The change carried by the legitimate announcement is delayed
Aggregation can help route flapping
  • If a more-specific route is flapping but the provider announces only the aggregated prefix, then other networks don't see the route flap
  • Hence aggregation can mask route flapping
  • Aggregation helps combat instability because it reduces the number of networks visible in the core Internet
  • [Figure: a Tier 1 provider above a Tier 2 provider announcing 140.40/16, above a customer labeled 140.4/24.]