Clusterix: National IPv6 Computing Facility in Poland

Artur Binczewski, artur@man.poznan.pl
Radosław Krzywania, sfrog@man.poznan.pl
Maciej Stroiński, stroins@man.poznan.pl
Jan Węglarz, weglarz@man.poznan.pl

Agenda
• Clusterix Project
• PIONIER Network
• Clusterix Network Architecture
• Network as a resource
• Dynamic Computing Resources

Clusterix Project

Clusterix Project
• Initiated in 2003 by 12 Polish computing centers
• Objectives:
  – To build a productive and efficient GRID environment
  – To provide enhanced security for the created GRID infrastructure
  – To introduce IPv6-based communication to GRID applications
  – To create a scalable computing infrastructure with dynamic resource attachment

Clusterix Project
• 64-bit Intel computing nodes
• Over 800 processors with a combined computing power of 4.4 TFLOPS
• Linux operating system (Debian distribution)
• IPv6 as the primary protocol (coexisting with IPv4)
• Communication based on dedicated channels within the PIONIER network

PIONIER network

PIONIER network
• Polish Optical Internet – PIONIER
  – Modern fiber-based network
  – Connects 21 academic and research centres
  – Over 5500 km of fiber is planned (over 3500 km is already in place)
  – Built on DWDM infrastructure
  – 10 Gbps capacity is currently available

PIONIER network
[Figure: map of the PIONIER network]
• Cities shown: Gdańsk, Koszalin, Olsztyn, Szczecin, Bydgoszcz, Białystok, Toruń, Poznań, Warszawa, Zielona Góra, Łódź, Radom, Wrocław, Częstochowa, Kielce, Opole, Puławy, Lublin, Katowice, Rzeszów, Kraków, Bielsko-Biała
• External connections as labeled: BASNET 34 Mb/s; DFN 10 Gb/s; GÉANT 10+10 Gb/s; TELIA 2 x 2.5 Gb/s; GTS 1.2 Gb/s; CESNET, SANET 10 Gb/s
• Legend: PIONIER's fibers; 2 x 10 Gb/s (2 lambdas); 10 Gb/s (1 lambda); CBDF 10 GE; 1 Gb/s; Metropolitan Area Networks

Clusterix Network Architecture

Clusterix Network Architecture
• Communication to all clusters is passed through a router/firewall
• Routing is based on the IPv6 protocol, with IPv4 kept for backward compatibility
• Applications and Clusterix middleware are adjusted to IPv6 usage
• For security reasons, only outgoing connections to the Internet are permitted
• Two 1 Gbps VLANs are used to improve management of network traffic
  – Communication VLAN is dedicated to message exchange between nodes
  – NFS VLAN is dedicated to file transfer
[Figure labels: Local Cluster, Switch, Access Node, Computing Nodes, Communication & NFS VLANs, Storage Element, Clusterix, PIONIER Core Switch, 1 Gbps, Backbone Traffic, Internet, Internet Network Access, Router/Firewall]

Network as a resource

Network as a resource
• Network management application
  – Objectives and features
    • Tracking and monitoring network status
    • Performing measurements
    • Locating failures
    • Providing network statistics for GRID services
    • Layer 3 QoS management
    • Automatic measurement session configuration
    • Failure resistance

Network as a resource – Measurements
• Measurement architecture
  – Distributed 2-level measurement agent mesh (backbone/cluster)
  – Centralized control manager (multiple redundant instances)
  – Switches are monitored via SNMP
  – Reports are stored by the manager (forwarded to a database)
  – IPv6 protocol and addressing scheme are used for measurements
[Figure labels: PIONIER Backbone Measurements, SNMP Monitoring, Measurement Network, Manager, Reports, Computing Cluster, Local Cluster Measurements]

Network as a resource – Architecture
• Manager architecture
  – GUI shows network status and configures the manager
  – Backup managers improve failure recovery (active manager switching)
  – External applications are allowed to retrieve various network statistics
  – Device and agent management modules collect network data
  – Statistics are stored in an external database (a short-term backup is kept in the manager)
[Figure labels: System Manager, System Resources, External Entities, Database, Database Controller, External Clients, GUI, External Interfaces, Backup Manager, Redundancy Controller, System Logic, Measurement Agents Manager, Device Manager, Backbone measurements, Devices, Local Cluster measurements]

Network as a resource – Protocol
• Active Measurement Protocol
  – All agent types use the same communication protocol
  – The first implementation was OWAMP-based
  – One-way measurements were abandoned in favor of a round-trip measurement approach
  – Further modifications were made because of variable message length and extra requirements
  – The protocol supports both IPv6 and IPv4
  – The measurement traffic pattern can be specified for more detailed network examination
  – Network metrics:
    • RTT
    • Duplicated packets
    • Jitter
    • Packet loss
    • Packets out of order
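
The five metrics listed above can all be derived from the replies of a single round-trip probe session. The sketch below is an illustration only, not the Clusterix implementation: the `Reply` record, the `summarize` function, and the choice of jitter definition (mean absolute difference between consecutive RTTs) are assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reply:
    """One probe reply as seen by the sending agent (hypothetical record)."""
    seq: int       # sequence number echoed back by the remote agent
    rtt_ms: float  # round-trip time measured by the sender


def summarize(sent: int, replies: list[Reply]) -> dict[str, float]:
    """Compute RTT, jitter, loss, duplicates and reordering.

    `replies` must be in arrival order; `sent` is the number of probes sent.
    """
    seen: set[int] = set()
    first: list[Reply] = []  # first copy of each reply, in arrival order
    duplicates = 0
    for r in replies:
        if r.seq in seen:
            duplicates += 1
        else:
            seen.add(r.seq)
            first.append(r)

    # A reply is out of order if a higher sequence number arrived before it.
    out_of_order = 0
    highest = -1
    for r in first:
        if r.seq < highest:
            out_of_order += 1
        else:
            highest = r.seq

    rtts = [r.rtt_ms for r in first]
    # Assumed jitter definition: mean absolute RTT change between arrivals.
    diffs = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    return {
        "rtt_avg_ms": mean(rtts) if rtts else float("nan"),
        "jitter_ms": mean(diffs) if diffs else 0.0,
        "loss_ratio": (sent - len(first)) / sent if sent else 0.0,
        "duplicates": duplicates,
        "out_of_order": out_of_order,
    }
```

Because only round-trip times are used, the two agents do not need synchronized clocks, which is one common reason to prefer round-trip over one-way measurements.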

Network as a resource – Monitoring
• Monitoring
  – Core switches are monitored via SNMP to track:
    • Interface status
    • Maximum available capacity
    • Current link utilization
  – An SNMP View is used to improve device security
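
Current link utilization is usually not read directly from a switch; it is computed from two readings of an interface octet counter. The sketch below shows that arithmetic under stated assumptions: the slides do not say which SNMP objects were polled, so the reference to IF-MIB counters such as `ifHCInOctets` (64-bit) or `ifInOctets` (32-bit) in the comments is an assumption, and `link_utilization` is a name introduced for this example.

```python
def link_utilization(
    octets_start: int,
    octets_end: int,
    interval_s: float,
    capacity_bps: float,
    counter_bits: int = 64,
) -> float:
    """Return link utilization as a fraction of capacity (0.0 to 1.0).

    octets_start / octets_end: two readings of an octet counter, e.g.
        IF-MIB ifHCInOctets (assumed; pass counter_bits=32 for ifInOctets).
    interval_s: seconds between the two readings.
    capacity_bps: link capacity in bits per second.
    """
    if interval_s <= 0 or capacity_bps <= 0:
        raise ValueError("interval_s and capacity_bps must be positive")
    # The modulo handles a single counter wrap between the two readings.
    delta_octets = (octets_end - octets_start) % (1 << counter_bits)
    return delta_octets * 8 / (interval_s * capacity_bps)
```

For a 1 Gbps link, 37,500,000 octets transferred in 10 seconds is 300 Mbit over a possible 10 Gbit, so `link_utilization(0, 37_500_000, 10, 1_000_000_000)` evaluates to about 0.03, that is, 3% utilization.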

Network as a resource – Fail Safe (regular operation)
• Only one active manager is allowed (the selection algorithm is based on the Bully algorithm)
• Required data are exchanged between the active and backup managers
• Measurement agents register at the active manager only
[Figure labels: Manager, Backup Manager, Synchronization Data, Measurement Network]
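
The slide says only that manager selection is based on the Bully algorithm. As a reminder of how that algorithm behaves, here is a minimal in-memory sketch: managers are identified by integer ids, and the highest-numbered live manager always ends up active. The function name `bully_elect` and the use of plain integers are assumptions for this illustration; message passing and timeouts are deliberately left out.

```python
def bully_elect(initiator: int, alive: set[int]) -> int:
    """Return the id of the manager that becomes active.

    initiator: id of the manager that noticed the failure and starts
        the election.
    alive: ids of all managers currently reachable (including initiator).
    """
    if initiator not in alive:
        raise ValueError("initiator must be one of the live managers")
    candidate = initiator
    while True:
        # The candidate challenges every manager with a higher id.
        higher = sorted(m for m in alive if m > candidate)
        if not higher:
            # Nobody higher answered: the candidate announces itself active.
            return candidate
        # A higher manager answers and takes over the election.
        # (In the full protocol every higher manager answers and runs its
        # own election; following just one of them gives the same winner.)
        candidate = higher[0]
```

With live managers `{1, 2, 4}` and manager 1 starting the election, manager 4 becomes active. If manager 4 later fails, a new election among `{1, 2}` selects manager 2.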

Network as a resource – Fail Safe (failure event)
• In case of failure, a new active manager is selected
• Agents do not register until the new active manager is elected
• Measurements are still performed, and results are temporarily stored on the agent side
• The newly elected manager recovers the system state and accepts agent registrations
• The system is ready to serve information again
[Figure labels: Manager Failure, New Manager]
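
The agent-side behavior described above (keep measuring, hold results locally, hand them over after re-registration) can be sketched as follows. This is an assumption-laden illustration rather than the project's code: the class `MeasurementAgent`, its method names, the bounded buffer size, and modeling the manager as a plain callable are all choices made for this example.

```python
from collections import deque
from typing import Any, Callable, Optional


class MeasurementAgent:
    """Agent that buffers results while no active manager is known."""

    def __init__(self, max_buffered: int = 1000) -> None:
        # The manager is modeled as a callable that accepts one result.
        self._manager: Optional[Callable[[Any], None]] = None
        # Bounded buffer: when full, the oldest results are dropped first.
        self._buffer: deque = deque(maxlen=max_buffered)

    def on_manager_failure(self) -> None:
        """Forget the failed manager; results are buffered from now on."""
        self._manager = None

    def register(self, manager: Callable[[Any], None]) -> None:
        """Register at the (new) active manager and flush buffered results."""
        self._manager = manager
        while self._buffer:
            manager(self._buffer.popleft())

    def report(self, result: Any) -> None:
        """Deliver a result immediately, or buffer it during failover."""
        if self._manager is None:
            self._buffer.append(result)
        else:
            self._manager(result)

    @property
    def buffered(self) -> int:
        return len(self._buffer)
```

A bounded buffer is used so that a long manager outage cannot exhaust memory on the agent; the price is that the oldest measurements are lost once the limit is reached.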

Network as a resource – GUI
• GUI
  – Provides a view of network status
  – Displays statistics
  – Simplifies network troubleshooting
  – Allows measurement sessions to be configured
  – Useful for topology browsing

Dynamic Computing Resources

Dynamic Computing Resources – Motivation
• External clusters can be easily attached to the Clusterix infrastructure in order to:
  – Increase computing power with new clusters
  – Utilize external clusters during nights or non-active periods
  – Make the Clusterix infrastructure scalable

Dynamic Computing Resources – Architecture
• Dynamic cluster attachment:
  – Requirements need to be checked against new clusters
    • Installed software
    • SSL certificates
  – Communication through router/firewall
  – The Network Management System will automatically discover new resources
  – A new cluster can serve computing power on a regular basis
[Figure labels: Local Switch, PIONIER Backbone Switch, Internet, Regular Cluster, Router/Firewall, Dynamic Resources]
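
The two requirement checks named above, installed software and SSL certificates, could be expressed as a simple admission test run before a cluster is attached. The sketch below is purely hypothetical: the slides do not describe how the check is performed, so the `ClusterOffer` record, the `admission_problems` function, and the package names in the usage note are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ClusterOffer:
    """What an external cluster presents when asking to join (hypothetical)."""
    name: str
    installed_software: frozenset
    cert_not_after: Optional[datetime]  # None means no certificate presented


def admission_problems(
    offer: ClusterOffer,
    required_software: frozenset,
    now: datetime,
) -> list:
    """Return a list of reasons to refuse the cluster; empty means admit."""
    problems = [
        f"missing software: {pkg}"
        for pkg in sorted(required_software - offer.installed_software)
    ]
    if offer.cert_not_after is None:
        problems.append("no SSL certificate presented")
    elif offer.cert_not_after <= now:
        problems.append("SSL certificate expired")
    return problems
```

For example, a cluster offering only `{"middleware"}` when `{"middleware", "mpi"}` is required, with a certificate that expired last year, would be refused with two reasons: the missing `mpi` package and the expired certificate. Returning every problem at once, rather than failing on the first, gives the cluster's administrator a complete list to fix.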

Summary
• Fast interconnection of computing centers through PIONIER
• IPv6 protocol is introduced to the GRID environment
• Failure-resistant network monitoring system
• The network is used as a regular GRID resource
• Dynamic architecture allows easy upgrades of computing power

Thank you for your attention! Visit http://www.clusterix.pcz.pl