Design and Implementation of TWAREN Hybrid Network Management System
National Center for High-Performance Computing
Speakers: Ming-Chang Liang & Li-Chi Ku
Outline
§ Introduction
§ Motivation
§ Issues
§ Design
§ Implementation
§ Future work
INTRODUCTION
About TWAREN
§ Construction of TWAREN (Taiwan Advanced Research & Education Network) was completed at the end of 2003, and the network began operation and service at the beginning of 2004.
§ In its initial phase, IP routing was the main service provided.
§ Network management relied on the programs that came with the purchased network equipment, including CIC, Webtop, CW2K, HP OpenView, HP NNM and other solutions.
Initial phase of TWAREN
[Figure: topology diagram of the initial TWAREN network. Node labels: Taipei, Hsinchu, Taichung, Tainan, MOECC, NTU, NCCU, ASCC, NDHU, NCU, CCU, NHLTC, NCTU, NTHU, NCHU, NTTU, NYSU, EBT. Equipment labels: C6509, C7609, GSR. Link types: STM-64/OC-192, STM-16/OC-48, 10 GE, GE.]
Initial phase of NMS
[Figure: block diagram of the initial NMS. Component labels: WebTop, Remedy Help Desk, Notification Gateway, Cisco Info Center, Probe, ISM, CW2K (DFM), NNM, CTM, NAM. Monitored device labels: 12416, 7609, 3750, 2522, 2600, 15454, 15600. Flow labels: CLI, API, Trap, PING, Polling, SMTP, HTTP, FTP, DNS.]
Phase 2 of TWAREN
§ At the end of 2006, TWAREN was upgraded with more protection methods and better availability; the result is called TWAREN phase 2. Tens of optical switches and hundreds of lightpaths then served as the foundation of the layer 2 VLAN services and the layer 3 IP routing services.
§ In 2008, tens of VPLS switches were added to provide an additional multipoint VPLS VPN service.
§ Layer 1 lightpaths are protected by SNCP, layer 2 VLANs by spanning tree recalculation, and layer 2 VPLS by fast reroute. Together, these improvements make TWAREN phase 2 a true hybrid network that provides multiple layers of services with high availability.
Architecture of TWAREN phase 2
[Figure: topology diagram of TWAREN phase 2. Node labels: Taipei, Hsinchu, Taichung, Tainan, NCHC, MOEcc, NDHU, NCNU, NHLTC, NCTU, NIU, NCU, NCCU, ASCC, NTU, NCHU, NTTU, NTHU, NSYSU, NCKU, CCU. Equipment labels: 6509, 7609, 3750, 12816, 15454, 15600. Link types: STM 64, STM 16, 10 GE, GE.]
MOTIVATION
Why do we need a new NMS?
§ The architecture of TWAREN phase 2 became increasingly complicated.
§ Because TWAREN phase 2 has more protection methods, a single hardware or circuit failure will not interrupt the service provided to end users.
§ The initial-phase NMS was no longer adequate for the hybrid network, because with it the correlation between failures and affected services is hard to determine and predict.
Requirements for the new NMS
§ Automatically determine the correlation among failures, affected services, affected customers and severity level on this highly protected network.
§ Provide a single integrated visual user interface.
§ Use integrated databases, logs, message flows and exchange protocols.
§ After several surveys, we decided to develop a new NMS suitable for monitoring all services provided by TWAREN phase 2.
ISSUES
Uncertainty of SNMP implementation
§ SNMP trap and MIB implementations differ even among equipment of the same brand.
§ SNMP OIDs or return values may change after an OS upgrade on the same equipment, and such changes are usually hard to discover beforehand.
§ Therefore, the system must be designed so that these changes can be accommodated with minimal modifications.
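One way to absorb such changes with minimal modification is to keep OIDs in a data table rather than in code. The following is only a minimal sketch under that assumption, not the TWAREN implementation; the table layout, the `resolve_oid` helper, the metric name and the OID values are all hypothetical placeholders.

```python
# Hypothetical lookup table: the keys, metric name and OIDs are placeholders,
# not real vendor OIDs. "*" marks the default entry for a model.
OID_TABLE = {
    ("7609", "*"): {"cpu_load": "1.3.6.1.4.1.99999.1.1"},
    ("7609", "12.2"): {"cpu_load": "1.3.6.1.4.1.99999.1.2"},
}


def resolve_oid(model, os_version, metric):
    """Return the OID for a metric, preferring an exact OS match over the model default."""
    for key in ((model, os_version), (model, "*")):
        oid = OID_TABLE.get(key, {}).get(metric)
        if oid is not None:
            return oid
    raise KeyError(f"no OID known for {metric} on {model} {os_version}")
```

With this layout, an OS upgrade that moves an OID only requires a new table row; the polling code stays unchanged.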
The lack of skilled programmers
§ Our programmers are the same people as the members of the operations team.
§ We are not professional programmers and do not share a common programming language.
§ The system must be partially available and operational early in its development so that it can evolve along with real needs.
§ A unified standard of communication between modules is therefore necessary.
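As an illustration of what a unified communication standard could look like, the sketch below wraps every status report in one language-neutral text message. The slides do not say which format was used; JSON, the field names and both helper functions are assumptions made only for this example.

```python
import json

# Hypothetical set of fields that every module agrees to send.
REQUIRED_FIELDS = ("source", "object", "status", "timestamp")


def encode_event(source, obj, status, timestamp):
    """Serialize one status report into a JSON string any module can parse."""
    return json.dumps(
        {"source": source, "object": obj, "status": status, "timestamp": timestamp},
        sort_keys=True,
    )


def decode_event(text):
    """Parse a report and reject it if an agreed field is missing."""
    event = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return event
```

Because the message is plain text, modules written in different languages can exchange it without sharing code.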
Huge historical data and computing
§ To minimize the false positive and false negative rates, baseline thresholds are of much better quality when they are generated dynamically from historical data.
§ We therefore need to store sufficiently large historical data sets and to retrieve them very efficiently when calculating those thresholds.
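The slides do not state how the dynamic thresholds are computed. As one common possibility, the sketch below derives a band of mean plus or minus k standard deviations from past samples; the function names, the choice of statistic and the default k = 3 are assumptions for illustration.

```python
from statistics import mean, pstdev


def baseline_threshold(history, k=3.0):
    """Return a (low, high) band of mean +/- k population standard deviations."""
    mu = mean(history)
    sigma = pstdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """Flag a new sample that falls outside the band derived from history."""
    low, high = baseline_threshold(history, k)
    return value < low or value > high
```

A longer history gives a more stable band, which is why fast retrieval of large data sets matters for this kind of calculation.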
Automatically determine affected services and customers
§ TWAREN phase 2 inherently guards against a single hardware or circuit failure, so such a failure is less likely to affect actual service provisioning.
§ An intelligent management system that can determine which services a failure affects will reduce management cost.
DESIGN
1st Stage System Architecture
[Figure: block diagram of the first-stage system. Data source labels: Traps, MIBs, Syslogs, Net flows, Telnet/SSH, TL1, Mirror. Module labels: Monitor Objs, Data Collectors, Fault Detection, Fault Location, Threshold Analyzer, Report System, Auto Action, Control API, GUI & Ticket System. Database labels: Current Status DB, Threshold DB, Long Term DB, Case/Action DB. Other labels: Interactive, Passive.]
Relationship of Data Tables
§ Basic Data Tables: Component, People, Location, Unit, Vendor
§ Relationship Tables: Circuit, VLAN Services, VPLS Services, ONS Light Path, ONS Cross Connection, etc.
Basic Data Tables

Component Data Table
Component_ID  Parent_C_ID  Name
1             0            TN7609P
12            1            Slot_1
2             0            TP15454
16            2            Slot_3
135           12           Port_9

Vendor Data Table
ID  Name
1   CHT
2   APBT
3   RingLine

People Data Table
ID  Name  Phone       Address  Service_Time  Service_WeekDay
1   John  0939123123  xxxxxxx  8-17          1, 3, 5
2   Mary  0958123123  xxxxxxx  ALL

Location Data Table
ID  Name   Address
1   MOEcc  xxxxx
2   NTU    xxxxx

Unit Data Table
ID  Name
1   NCKU
18  THU
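The Parent_C_ID column makes the Component Data Table a tree, so the full location of any port can be recovered by walking up the parents. The sketch below uses the sample component rows from the slide; treating Parent_C_ID = 0 as "no parent" and the `component_path` helper itself are assumptions made for illustration.

```python
# Sample rows from the Component Data Table: id -> (parent id, name).
COMPONENTS = {
    1: (0, "TN7609P"),
    12: (1, "Slot_1"),
    2: (0, "TP15454"),
    16: (2, "Slot_3"),
    135: (12, "Port_9"),
}


def component_path(component_id, components=COMPONENTS):
    """Walk Parent_C_ID links up to the root (assumed to be parent 0) and join the names."""
    names = []
    seen = set()
    current = component_id
    while current != 0:
        if current in seen:
            raise ValueError("cycle in component hierarchy")
        seen.add(current)
        parent, name = components[current]
        names.append(name)
        current = parent
    return "/".join(reversed(names))
```

For example, `component_path(135)` resolves Port_9 through Slot_1 to the chassis TN7609P.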
Relationship Data Tables

Circuit Data Table
ID  Name                 Vendor  Identify  From_CID  To_CID  Bandwidth
1   Taipei_Tainan_STM64  1       8D543267  13        35      STM64
2   NCHU_NCNU_10GE       2       ST16987   23        67      10GE

ONS Topology Link Table
NodeA  NodeB  PortA  PortB
12     45     1467   2346
16     32     2312   3421

ONS Light Path Table
LP  PortFrom  PortTo  SNCP_LP  CRS_Trace           Size
2   2312      2345    0        359, 556, 522, 475  4
98  3434      4455    99       482, 541, 335       16
99  3434      4455    98       482, 469, 541, 335  16

ONS Cross Connection Table
CRS  PortA  PortB      SNCP_CRS  ChannelA  ChannelB  Size
482  1744   1756       0         5         13        4
21   3343   (missing)  24        17        33        16
24   3546   4534       21        1         17        16
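The light path rows suggest how a failure can be correlated with affected services. The sketch below is only an illustration under stated assumptions: SNCP_LP is read as the ID of the protecting partner light path, 0 is read as "no partner", and a light path is considered broken when any cross connection in its CRS_Trace has failed. The function names and the three state labels are hypothetical.

```python
# Sample rows from the ONS Light Path Table on the slide.
LIGHT_PATHS = {
    2: {"sncp_lp": 0, "crs_trace": [359, 556, 522, 475]},
    98: {"sncp_lp": 99, "crs_trace": [482, 541, 335]},
    99: {"sncp_lp": 98, "crs_trace": [482, 469, 541, 335]},
}


def path_is_up(lp_id, failed_crs):
    """A light path is up when none of its cross connections has failed."""
    return not set(LIGHT_PATHS[lp_id]["crs_trace"]) & set(failed_crs)


def service_state(lp_id, failed_crs):
    """Classify a light path as 'up', 'protected' (partner still up) or 'down'."""
    if path_is_up(lp_id, failed_crs):
        return "up"
    partner = LIGHT_PATHS[lp_id]["sncp_lp"]  # assumption: 0 means no SNCP partner
    if partner != 0 and path_is_up(partner, failed_crs):
        return "protected"
    return "down"
```

With the sample data, a failure of cross connection 469 breaks light path 99 only, so its partner 98 keeps the service alive; a failure of 482 lies on both traces and takes the service down.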
IMPLEMENTATION
Current monitor objects
§ Trap monitor
  o Used interfaces, BGP, etc.
§ Environment of equipment room
  o Temperature (auto threshold), Voltage
§ Status of equipment
  o Temperature, CPU, RAM, Fans, Power Supply
§ BGP peering with other networks
  o Status, Number of exchanged routes (auto threshold), Utilization analysis
§ Performance monitor
  o End-to-End RTT (auto threshold), End-to-End Packet Loss Rate (auto threshold), End-to-End Availability
§ Throughput
  o Backbone (auto threshold), Designated interfaces
§ Top N
  o Bytes, Flows, Packets
§ Routes monitor
  o The routes of customers (exact comparison)
§ VPLS VPN
  o Throughput of CE side, MACs of VPN
§ Optical Network
  o Current topology of lightpaths
§ VLAN
  o Current topology of VLAN
Future work
§ Combine all developed monitor objects into a single integrated visual user interface.
§ Enhance the monitoring of optical, VPLS and VLAN networks.
§ Automatically determine the fault location, root cause and affected scope.
§ Minimize the false positive and false negative rates.