Practical use of Ethernet OAM
Joerg Ammon (jammon@brocade.com), Systems Engineer Service Provider
May 2011
© 2011 Brocade Communications Systems, Inc. Company Proprietary Information
Overview
• A variety of Operations, Administration, and Management (OAM) protocols and tools have been developed in recent years for MPLS, IP, and Ethernet networks.
• These tools give an operator the means to proactively manage networks and customer Service Level Agreements (SLAs).
• This session reviews the OAM tools available in MPLS/IP/Ethernet networks at the various layers of the stack and reviews best practices for choosing the right OAM protocol for a network.
OAM Tools: Scope of this presentation
• Management plane (NMS, EMS): OAM&P
• Network plane (network elements): OAM tools across network elements
• The scope of this presentation is the network plane only (not the management plane)
OAM Layering
• OAM is layered…
  • Service Layer OAM
  • Network Layer OAM
  • Transport Layer OAM
• …and hierarchical
  • For example, the service layer for Operator A is the transport layer for the service provider
• Each layer supports its own OAM mechanisms
  • Operator A has an MPLS network and uses MPLS OAM tools
  • Operator B has an Ethernet network and uses Ethernet OAM tools
Diagram: Customer Location 1 and Customer Location 2 connected across Operator A's MPLS network and Operator B's Ethernet network. Service OAM runs end to end; MPLS OAM (Operator A) and Ethernet OAM (Operator B) run within each operator network; Link OAM runs on the individual links.
OAM Tools
Each layer has its own best-suited OAM tools
• VPN
  • VRF Ping and Traceroute (Layer 3 VPN)
  • 802.1ag CFM for VPLS/VLL, Y.1731 PM for VPLS/VLL (Layer 2 VPN)
• IP
  • IP Ping and Traceroute
  • BFD for OSPF and IS-IS
• MPLS
  • LSP Ping and Traceroute
  • BFD for RSVP-TE LSPs
• Layer 2
  • Port Loop Detection
  • Layer 2 Trace
  • UDLD
  • Single-link LACP Keep-alive
  • 802.1ag CFM/Y.1731 PM
  • 802.3ah EFM OAM
Business Problem
• Fault detection, verification, and isolation at every level
• Proactive detection of service degradation
• Performance Monitoring (PM) and SLA verification
Brocade Solution
• Standards-based, end-to-end OAM
• Comprehensive/scalable MPLS, IP, and Ethernet OAM tools
Layer 2 OAM + Layer 2 VPN CFM/PM: 802.1ag CFM, Y.1731 PM
IEEE 802.1ag CFM
Connectivity Fault Management (CFM)
• Facilitates
  • Path discovery
  • Fault detection
  • Fault verification and isolation
  • Fault notification
  • Fault recovery
• Supports
  • Continuity Check Messages (CCMs)
  • Linktrace
  • Loopback messages
Diagram: nested CFM domains (Customer CFM, Service Provider CFM, Operator A CFM, Operator B CFM) between customer location 1 and customer location 2, with MEPs at the domain edges and MIPs inside the domains.
Brocade Implementation
• Support for minimum CCM timers (3.3 ms) using hardware offload
  • 3.3 ms, 100 ms, 1 min, 10 min
• Support for MIPs and up/down MEPs
• Support for all eight MD levels (0-7)
• Support for the following types of endpoints/services
  • VLANs and VPLS/VLL endpoints
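The CCM timers above determine how quickly a failure is noticed. As a minimal sketch, assuming the interval encoding defined by IEEE 802.1ag and its rule that a remote MEP is declared lost after 3.5 CCM intervals without a CCM, the detection time per timer can be estimated as follows. The function name `loc_timeout_seconds` is hypothetical and is not part of any Brocade CLI or API.

```python
from fractions import Fraction

# CCM interval codes and their periods in seconds, as defined by IEEE 802.1ag.
# Code 1 is 3 1/3 ms (300 CCMs per second).
CCM_INTERVALS = {
    1: Fraction(1, 300),
    2: Fraction(1, 100),
    3: Fraction(1, 10),
    4: Fraction(1),
    5: Fraction(10),
    6: Fraction(60),
    7: Fraction(600),
}


def loc_timeout_seconds(interval_code: int) -> float:
    """Return the loss-of-continuity timeout (3.5 CCM intervals) in seconds."""
    return float(CCM_INTERVALS[interval_code] * Fraction(7, 2))


if __name__ == "__main__":
    for code in sorted(CCM_INTERVALS):
        print(code, loc_timeout_seconds(code))
```

With the 3.3 ms timer, a lost peer would be detected in roughly 11.7 ms; with the 10-second timer it would take 35 seconds.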
IEEE 802.1ag CFM Terminology
• MD (Maintenance Domain)
  • The part of a network for which faults in Layer 2 connectivity can be managed
• MEP (Maintenance End Point)
  • A Maintenance Point (MP) at the edge of a domain that actively sources CFM messages
  • Two types: up (inward*) MEP or down (outward) MEP
• MIP (Maintenance Intermediate Point)
  • A maintenance point internal to a domain that only responds when triggered by certain CFM messages
• MA (Maintenance Association)
  • A set of MEPs established to verify the integrity of a single service instance (a VLAN or a VPLS)
• ME (Maintenance Entity)
  • A point-to-point relationship between two MEPs within a single MA
• MD Level
  • An integer from 0 to 7 in a field of a CFM PDU that is used, along with the VLAN ID, to identify which MIPs/MEPs are interested in the contents of that CFM PDU
(*) "inward" with respect to the device
Diagram: Customer location 1 and Customer location 2 connected across Operator A and Operator B networks, showing up MEPs, down MEPs, MIPs, and the MEs between them.
• Customer MA: MD level 5 (7, 6, or 5)
• Service Provider MA: MD level 3 (4 or 3)
• Operator A MA and Operator B MA: MD level 1 (2, 1, or 0)
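To make the role of the MD level concrete, here is a small sketch of the CFM common header and of how a MEP treats a PDU depending on its level. It assumes the IEEE 802.1ag layout (EtherType 0x8902, MD level in the top three bits of the first octet, followed by opcode, flags, and first TLV offset). The helper names are hypothetical.

```python
import struct

CFM_ETHERTYPE = 0x8902
# CFM opcodes defined by IEEE 802.1ag.
OPCODES = {"CCM": 1, "LBR": 2, "LBM": 3, "LTR": 4, "LTM": 5}


def cfm_common_header(md_level, opcode, flags=0, first_tlv_offset=0, version=0):
    """Pack the 4-octet CFM common header: level/version, opcode, flags, TLV offset."""
    if not 0 <= md_level <= 7:
        raise ValueError("MD level must be between 0 and 7")
    return struct.pack("!BBBB", (md_level << 5) | version, opcode, flags, first_tlv_offset)


def md_level_of(header: bytes) -> int:
    """Extract the MD level from the first octet of a CFM PDU."""
    return header[0] >> 5


def mep_action(mep_level: int, pdu_level: int) -> str:
    """Higher-level PDUs pass through, same-level PDUs are processed, lower ones are dropped."""
    if pdu_level > mep_level:
        return "forward"
    if pdu_level == mep_level:
        return "process"
    return "drop"


if __name__ == "__main__":
    customer_ccm = cfm_common_header(5, OPCODES["CCM"])
    print(md_level_of(customer_ccm), mep_action(3, md_level_of(customer_ccm)))
```

This is why a customer CCM at level 5 crosses a level 3 provider domain untouched, while the provider's own level 3 CCMs never leak into the customer domain.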
IEEE 802.1ag CFM
Continuity Check, Linktrace, and Loopback Messages
• Continuity Check Message (CCM)
  • A periodic hello message multicast by a MEP within the maintenance domain
• Linktrace Message (LTM)
  • A multicast message used by a source MEP to trace the path to other MEPs and MIPs in the same domain
  • All reachable MIPs and MEPs respond with a unicast Linktrace Reply (LTR)
  • The originating MEP can then determine the MAC addresses of all MIPs and MEPs belonging to the same Maintenance Domain
• Loopback Message (LBM)
  • Used to verify the connectivity between a MEP and a peer MEP or MIP
  • A loopback message is initiated by a MEP with the destination MAC address set to the desired destination MEP or MIP (unicast)
  • The receiving MIP or MEP responds to the Loopback message with a unicast Loopback Reply (LBR)
  • A loopback message helps a MEP identify the precise location of a fault along a given path
Diagram: periodic multicast CCMs exchanged between MEPs; a multicast LTM from a MEP answered by unicast LTRs from the MIP and the remote MEP; a unicast LBM answered by a unicast LBR.
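The multicast CCMs and LTMs described above use reserved group MAC addresses whose last nibble depends on the MD level. The following sketch assumes the IEEE 802.1ag address ranges (01-80-C2-00-00-30 to 37 for CCMs, 38 to 3F for LTMs). The function name is hypothetical.

```python
def cfm_group_mac(md_level: int, linktrace: bool = False) -> str:
    """Return the CFM group destination MAC for a CCM (default) or an LTM."""
    if not 0 <= md_level <= 7:
        raise ValueError("MD level must be between 0 and 7")
    # CCMs use 0x30 + level, LTMs use 0x38 + level in the last octet.
    last_octet = 0x30 + md_level + (8 if linktrace else 0)
    return "01:80:c2:00:00:%02x" % last_octet


if __name__ == "__main__":
    print(cfm_group_mac(5))                  # CCM at MD level 5
    print(cfm_group_mac(5, linktrace=True))  # LTM at MD level 5
```

Loopback messages are unicast, so they carry the MAC address of the target MEP or MIP instead of one of these group addresses.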
Hierarchical Fault Detection
Example: fault in Operator B network (an MPLS network)
• The customer detects the fault using Continuity Check and locates it using Linktrace
• Operator A detects the fault using Continuity Check and locates it using Linktrace
• Operator B detects the fault using Continuity Check, but isolates it using MPLS OAM (see MPLS OAM section)
• A service provider (not shown) would detect this fault in a similar way using Continuity Check and Linktrace from the CPEs (Customer Premises Equipment)
Sequence
1: Customer Continuity Check detects end-to-end fault
2: Customer Linktraces isolate fault past customer MIPs
3: Operator A's Continuity Check detects end-to-end fault
4: Operator A Linktraces isolate fault inside Operator B's network
5: Operator B's Continuity Check detects service fault
Diagram: Customer Network (Site 1), Operator A (Location A1), Operator B (MPLS network with PE, P, and PE routers carrying the VPLS/VLL, where the fault is localized), Operator A (Location A2), Customer Network (Site 2). MIPs and MEPs sit at the VPLS/VLL endpoints.
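The isolation steps above amount to finding the last maintenance point that answered a Linktrace and the first one that did not. A minimal sketch of that reasoning follows; the hop names and the function `localize_fault` are hypothetical and only illustrate the logic, not any device behavior.

```python
def localize_fault(expected_path, replied):
    """Return (last responding hop, first silent hop), or None if every hop replied.

    expected_path: maintenance points in order from the originating MEP.
    replied: set of maintenance points that returned a Linktrace Reply.
    """
    last_ok = None
    for hop in expected_path:
        if hop not in replied:
            return last_ok, hop
        last_ok = hop
    return None


if __name__ == "__main__":
    # Hypothetical customer-level path: MIPs on the operator edges, MEP at site 2.
    path = ["mip-A1", "mip-B-ingress", "mip-B-egress", "mip-A2", "mep-site2"]
    print(localize_fault(path, {"mip-A1", "mip-B-ingress"}))
```

In this hypothetical run the fault lies between `mip-B-ingress` and `mip-B-egress`, that is, inside Operator B's network, which matches step 4 of the sequence.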
IEEE 802.1ag Configuration Example
To verify end-to-end connectivity between CE1 and CE2
Topology: CE1 (port 1/1) to PE1 (port 1/1), VLL across the MPLS network, PE2 (port 2/1) to CE2 (port 2/1); all CFM points are at MD level 7

Configure a down MEP on CE1
  CE1(config)#cfm-enable
  CE1(config-cfm)#domain-name CUST_1 level 7
  CE1(config-cfm-md-CUST_1)#ma-name ma_5 vlan-id 30 priority 3
  CE1(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
  CE1(config-cfm-md-CUST_1-ma-ma_5)#mep 1 down vlan 30 port ethe 1/1
  CE1(config-cfm-md-CUST_1-ma-ma_5)#remote-mep 2 to 2

Create a VLL instance (PE1)
  PE1(config)#router mpls
  PE1(config-mpls)#vll pe1-to-pe2 30
  PE1(config-mpls-vll)#vll-peer 1.1.1.2
  PE1(config-mpls-vll)#untagged ethe 1/1
  PE1(config-mpls-vll)#vlan 30
  PE1(config-mpls-vll-vlan)#tagged ethe 1/1

Create a VLL instance (PE2)
  PE2(config)#router mpls
  PE2(config-mpls)#vll pe2-to-pe1 30
  PE2(config-mpls-vll)#vll-peer 1.1
  PE2(config-mpls-vll)#untagged ethe 2/1
  PE2(config-mpls-vll)#vlan 30
  PE2(config-mpls-vll-vlan)#tagged ethe 2/1

Configure CFM on PE1
  PE1(config)#cfm-enable
  PE1(config-cfm)#domain-name CUST_1 level 7
  PE1(config-cfm-md-CUST_1)#ma-name ma_5 vll-id 30 priority 3
  PE1(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
In the above configuration, a MIP is created by default on the VLL port.

Configure CFM on PE2
  PE2(config)#cfm-enable
  PE2(config-cfm)#domain-name CUST_1 level 7
  PE2(config-cfm-md-CUST_1)#ma-name ma_5 vll-id 30 priority 3
  PE2(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
In the above configuration, a MIP is created by default on the VLL endpoint.

Configure a down MEP on CE2
  CE2(config)#cfm-enable
  CE2(config-cfm)#domain-name CUST_1 level 7
  CE2(config-cfm-md-CUST_1)#ma-name ma_5 vlan-id 30 priority 3
  CE2(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
  CE2(config-cfm-md-CUST_1-ma-ma_5)#mep 2 down vlan 30 port ethe 2/1
  CE2(config-cfm-md-CUST_1-ma-ma_5)#remote-mep 1 to 1

LSP ping and LSP traceroute would be used inside the MPLS network to detect and diagnose LSP failures.
ITU-T Y.1731 Performance Management
• Standards-based performance management for Ethernet networks
  • Interoperates in a multivendor environment
• Supports high-precision, on-demand measurement of round-trip SLA parameters
  • Frame Delay (FD)
  • Frame Delay Variation (FDV)
• Measurements are done between MEPs
Diagram: two Brocade MLX routers, each with a MEP, exchanging ETH-DM frames to measure frame delay and frame delay variation.
MEP: Maintenance End Point
ETH-DM: Ethernet Delay Measurement
Benefits
• SLA monitoring and verification
Applicability
• Aggregation, metro, and core networks
• Delay-sensitive applications, such as voice
• Differentiated services with SLA guarantees
Brocade differentiation
• Hardware-based time-stamping mechanism
• Measurements with microsecond granularity
• Y.1731 PM for VPLS/VLL
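A round-trip ETH-DM measurement relies on four timestamps carried in the delay measurement request and reply. The sketch below assumes the two-way formula from Y.1731, in which the responder's processing time is subtracted so that the two clocks do not need to be synchronized. Parameter names are illustrative, not the field names of any particular implementation.

```python
def two_way_frame_delay(tx_request, rx_request, tx_reply, rx_reply):
    """Two-way frame delay, in the same unit as the timestamps.

    tx_request: request sent by the initiating MEP (initiator clock)
    rx_request: request received by the responding MEP (responder clock)
    tx_reply:   reply sent by the responding MEP (responder clock)
    rx_reply:   reply received by the initiating MEP (initiator clock)
    """
    round_trip = rx_reply - tx_request
    responder_processing = tx_reply - rx_request
    return round_trip - responder_processing


if __name__ == "__main__":
    # Hypothetical timestamps in nanoseconds; the two clocks are deliberately unrelated.
    print(two_way_frame_delay(1_000, 5_000_000, 5_000_400, 33_500))
```

Because only differences taken on the same clock are used, hardware time-stamping on each side is enough to reach microsecond granularity without clock synchronization.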
ITU-T Y.1731 Performance Management Example
Diagram: ETH-DM between MEP 3 and MEP 2 on two Brocade MLX routers.

  NetIron# cfm delay_measurement domain md2 ma ma2 src-mep 3 target-mep 2
  Y1731: Sending 10 delay_measurement to 0012.f2f7.3931, timeout 1000 msec
  Type Control-c to abort
  Reply from 0012.f2f7.3931: time= 32.131 us
  Reply from 0012.f2f7.3931: time= 31.637 us
  Reply from 0012.f2f7.3931: time= 32.566 us
  Reply from 0012.f2f7.3931: time= 34.052 us
  Reply from 0012.f2f7.3931: time= 33.376 us
  Reply from 0012.f2f7.3931: time= 31.501 us
  Reply from 0012.f2f7.3931: time= 33.016 us
  Reply from 0012.f2f7.3931: time= 32.537 us
  Reply from 0012.f2f7.3931: time= 32.492 us
  Reply from 0012.f2f7.3931: time= 32.552 us
  sent = 10 number = 10
  A total of 10 delay measurement replies received.
  Success rate is 100 percent (10/10)
  ==================================
  Round Trip Frame Delay Time : min = 31.501 us avg = 32.586 us max = 34.052 us
  Round Trip Frame Delay Variation : min = 45 ns avg = 839 ns max = 1.875 us
  ==================================
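The summary lines in this output can be reproduced from the ten replies. The sketch below assumes that frame delay variation is the absolute difference between consecutive delay samples; that assumption is not stated on the slide, but it yields the reported minimum of 45 ns and maximum of 1.875 us, and an average of about 840 ns against the reported 839 ns.

```python
from statistics import mean

# Round-trip delay samples in microseconds, copied from the output above.
SAMPLES_US = [32.131, 31.637, 32.566, 34.052, 33.376,
              31.501, 33.016, 32.537, 32.492, 32.552]


def delay_stats(samples):
    """Return (min, avg, max) for frame delay and for frame delay variation."""
    # Assumption: FDV is the absolute difference between consecutive samples.
    variation = [abs(later - earlier) for earlier, later in zip(samples, samples[1:])]
    return {
        "fd": (min(samples), mean(samples), max(samples)),
        "fdv": (min(variation), mean(variation), max(variation)),
    }


if __name__ == "__main__":
    stats = delay_stats(SAMPLES_US)
    print("FD  min/avg/max (us):", stats["fd"])
    print("FDV min/avg/max (us):", stats["fdv"])
```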
Link OAM
IEEE 802.3ah Ethernet in the First Mile (EFM) OAM
• Supports point-to-point (single) link OAM
  • Monitors individual links and supports troubleshooting them
• Standards-based for Ethernet networks
  • Interoperates in a multivendor environment
• Supports
  • Fault detection and notification (alarms)
  • Discovery
  • Remote failure indication
  • Loopback testing

802.3ah OAM
  NetIron#show link-oam info detail ethernet 1/1
  OAM information for Ethernet port: 1/1
  link-oam mode: active
  link status: up
  oam status: up
  Local information
    multiplexer action: forward
    parse action: forward
    stable: satisfied
    state: up
    loopback state: disabled
    dying-gasp: false
    critical-event: false
    link-fault: false
  Remote information
    multiplexer action: forward
    parse action: forward
    stable: satisfied
    loopback support: disabled
    dying-gasp: false
    critical-event: false
    link-fault: false
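The dying-gasp, critical-event, and link-fault fields in this output correspond to flag bits carried in every 802.3ah OAMPDU. The sketch below decodes those flags under the assumption that the bit positions follow IEEE 802.3 Clause 57 (bit 0 Link Fault, bit 1 Dying Gasp, bit 2 Critical Event, bits 3 to 6 local and remote discovery state). The function name is hypothetical, and how a given device maps these bits to its CLI fields is not confirmed by the source.

```python
# Bit positions in the OAMPDU Flags field (assumed from IEEE 802.3 Clause 57).
FLAG_BITS = {
    "link_fault": 0,
    "dying_gasp": 1,
    "critical_event": 2,
    "local_evaluating": 3,
    "local_stable": 4,
    "remote_evaluating": 5,
    "remote_stable": 6,
}


def decode_oampdu_flags(flags: int) -> dict:
    """Return a name-to-bool mapping for each known bit of the 16-bit Flags field."""
    return {name: bool((flags >> bit) & 1) for name, bit in FLAG_BITS.items()}


if __name__ == "__main__":
    # 0x0050: local and remote discovery stable, no fault indications.
    for name, value in decode_oampdu_flags(0x0050).items():
        print(f"{name}: {value}")
```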
Layer 2 OAM Summary
• Layer 2 Trace
  • Intended application: Layer 2 network troubleshooting, detection of misconfiguration
  • Supports: Layer 2 topology discovery
  • Generation: Manual
  • Standard: No
• Port Loop Detection
  • Intended application: Layer 2 network troubleshooting, detection of misconfiguration
  • Supports: Layer 2 loop detection
  • Generation: Automatic
  • Standard: No
• Single-Link Keep-Alive, UDLD
  • Intended application: Link keep-alive
  • Supports: Single-link keep-alive
  • Generation: Automatic
  • Standard: No
• 802.1ag CFM
  • Intended application: Service verification
  • Supports: Layer 2 Connectivity Check, Link Trace, Loopback
  • Generation: CC: auto; LT, LB: manual
  • Standard: Yes
• Y.1731 PM
  • Intended application: Performance (SLA) verification
  • Supports: One-way delay and delay variation
  • Generation: Manual
  • Standard: Yes
• 802.3ah EFM OAM
  • Intended application: Customer access verification
  • Supports: Single-link OAM: fault detection, discovery, loopback, and so on
  • Generation: Auto, Manual (LB)
  • Standard: Yes
Remember: OAM is layered and hierarchical (service OAM for an operator is transport OAM for a service provider)
MPLS OAM
LSP Ping and LSP Traceroute
MPLS OAM tools
• LSP Ping and LSP Traceroute provide OAM functionality for MPLS networks based on RFC 4379.
• LSP Ping and LSP Traceroute provide a mechanism to detect MPLS data plane failures.
• MPLS echo requests follow the same data path that normal MPLS packets would traverse.
• LSP Ping is used to detect data plane failure and to check the consistency between the data plane and the control plane.
• LSP Traceroute is used to isolate the data plane failure to a particular router and to provide LSP path tracing.
LSP Ping
• The basic idea is to verify that packets that belong to a particular Forwarding Equivalence Class (FEC) actually end their MPLS path on a Label Switching Router (LSR) that is an egress for that FEC.
• LDP LSP Ping and RSVP LSP Ping are supported.
Diagram: MPLS network with PE (LER), P (LSR), and PE (LER) along an LSP; the Echo Request travels along the LSP and the Echo Reply returns to the sender.

LDP LSP Ping
  NetIron# ping mpls ldp 22.22.22
  Send 5 80-byte MPLS Echo Requests for LDP FEC 22.22.22/32, timeout 5000 msec
  Type Control-c to abort
  !!!!!
  Success rate is 100 percent (5/5), round-trip min/avg/max=0/1/1 ms.
  Syntax: ping mpls ldp <ip-address | ip-address/mask-length> ... options

RSVP LSP Ping
  NetIron# ping mpls rsvp lsp toxmr2frr-18
  Send 5 92-byte MPLS Echo Requests over RSVP LSP toxmr2frr-18, timeout 5000 msec
  Type Control-c to abort
  !!!!!
  Success rate is 100 percent (5/5), round-trip min/avg/max=0/1/5 ms.
  Syntax: ping mpls rsvp lsp <lsp-name> | session <tunnel-source-address> <tunnel-destination-address> <tunnel-id> ... options
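Each echo request in these examples is a UDP packet (port 3503) sent along the LSP with a fixed 32-octet header followed by TLVs that identify the FEC under test. As a rough sketch, assuming the header layout of RFC 4379, the fixed part could be packed as shown below. The function name and default values are illustrative, and the FEC TLVs are omitted.

```python
import struct

MPLS_ECHO_UDP_PORT = 3503  # well-known port for MPLS echo (RFC 4379)
MSG_ECHO_REQUEST = 1
MSG_ECHO_REPLY = 2
REPLY_VIA_UDP = 2          # reply mode: reply via an IPv4/IPv6 UDP packet

# Version, global flags, message type, reply mode, return code, return subcode,
# sender's handle, sequence number, timestamp sent (s, us), timestamp received (s, us).
ECHO_HEADER = struct.Struct("!HHBBBBIIIIII")


def build_echo_request_header(sender_handle, sequence, sent_sec, sent_usec):
    """Pack the fixed header of an MPLS echo request (TLVs not included)."""
    return ECHO_HEADER.pack(
        1, 0, MSG_ECHO_REQUEST, REPLY_VIA_UDP, 0, 0,
        sender_handle, sequence, sent_sec, sent_usec, 0, 0,
    )


if __name__ == "__main__":
    header = build_echo_request_header(0x1234, 1, 1_300_000_000, 250_000)
    print(len(header), header.hex())
```

The request is addressed to an IP destination in 127/8 so that a router where the LSP is broken handles the packet locally instead of forwarding it by IP.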
LSP Traceroute
• With LSP traceroute, an echo request packet is sent to the control plane of each transit LSR, which confirms that it is a transit LSR for this path.
• Transit LSRs return echo replies.
• LDP LSP Traceroute and RSVP LSP Traceroute are supported.
Diagram: MPLS network with PE (LER), P (LSR), and PE (LER) along an LSP; the Echo Request is sent along the LSP and Echo Replies return from each LSR.

LDP LSP Traceroute
  NetIron# traceroute mpls ldp 22.22.22
  Trace LDP LSP to 22.22.22/32, timeout 5000 msec, TTL 1 to 30
  Type Control-c to abort
  1 10 ms 22.22.22 return code 3(Egress)
  Syntax: traceroute mpls ldp <ip-address | ip-address/mask-length> ... options

RSVP LSP Traceroute
  NetIron# traceroute mpls rsvp lsp toxmr2frr-18
  Trace RSVP LSP toxmr2frr-18, timeout 5000 msec, TTL 1 to 30
  Type Control-c to abort
  1 1 ms 22.22.22 return code 3(Egress)
  Syntax: traceroute mpls rsvp lsp <lsp-name> | session <tunnel-source-address> <tunnel-destination-address> <tunnel-id> ... options
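The "TTL 1 to 30" and "return code 3(Egress)" fields in this output reflect how the trace proceeds: the sender raises the MPLS TTL one hop at a time until a reply reports that the responder is the egress for the FEC. The sketch below simulates that loop, assuming the RFC 4379 return codes 3 (egress) and 8 (label switched). The probe function and addresses are hypothetical stand-ins for real packets.

```python
RC_EGRESS = 3          # replying router is an egress for the FEC
RC_LABEL_SWITCHED = 8  # packet would be label switched (transit LSR)


def trace_lsp(send_probe, max_ttl=30):
    """Probe with increasing TTL; stop at the egress or when max_ttl is reached.

    send_probe(ttl) returns (address, return_code), or None on timeout.
    """
    hops = []
    for ttl in range(1, max_ttl + 1):
        reply = send_probe(ttl)
        hops.append((ttl, reply))
        if reply is not None and reply[1] == RC_EGRESS:
            break
    return hops


if __name__ == "__main__":
    # Hypothetical three-hop LSP using documentation addresses.
    simulated = {
        1: ("192.0.2.1", RC_LABEL_SWITCHED),
        2: ("192.0.2.2", RC_LABEL_SWITCHED),
        3: ("192.0.2.3", RC_EGRESS),
    }
    for ttl, reply in trace_lsp(simulated.get):
        print(ttl, reply)
```

If probes start timing out before any hop reports the egress code, the last hop that answered bounds the location of the data plane failure.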
MPLS OAM Summary
• LSP Ping
  • Intended application: To detect data plane failure and to check the consistency between the data plane and the control plane
  • Supports: Connectivity verification
  • Generation: Manual
  • Standard: Yes
• LSP Traceroute
  • Intended application: To isolate the data plane failure to a particular router and to provide LSP path tracing
  • Supports: Connectivity troubleshooting, fault localization
  • Generation: Manual
  • Standard: Yes
• BFD for RSVP-TE LSPs
  • Intended application: Fast data plane failure detection for RSVP LSPs
  • Supports: Fast data plane failure detection (link may be up, but data path is down)
  • Generation: Automatic
  • Standard: Yes
Observation
• Tools compared: ICMP, Ping, CFM
• Operates at: Layer 3 (ICMP, Ping); Layer 2 (CFM)
• Specification: RFC 792; RFC 1208 (RFC 1983); 802.1ag
• Published: Sept 1981; March 1991 (Aug 1996); July 1983; Dec 2007
26 years of work for going down one layer of OAM
Thank You