Networking for the Future of Large-Scale Science: An ESnet Perspective

























Networking for the Future of Large-Scale Science: An ESnet Perspective
Joint Techs, July 2007
William E. Johnston, ESnet Department Head and Senior Scientist
Energy Sciences Network, Lawrence Berkeley National Laboratory
wej@es.net, www.es.net
This talk is available at www.es.net/ESnet4
Networking for the Future of Science
DOE’s Office of Science: Enabling Large-Scale Science
• The Office of Science (SC) is the single largest supporter of basic research in the physical sciences in the United States, … providing more than 40 percent of total funding … for the Nation’s research programs in high-energy physics, nuclear physics, and fusion energy sciences. (http://www.science.doe.gov)
– SC funds 25,000 PhDs and PostDocs
• A primary mission of SC’s National Labs is to build and operate very large scientific instruments (particle accelerators, synchrotron light sources, very large supercomputers) that generate massive amounts of data and involve very large, distributed collaborations
• ESnet is an SC program whose primary mission is to enable the large-scale science of the Office of Science (SC) that depends on:
– Sharing of massive amounts of data
– Supporting thousands of collaborators world-wide
– Distributed data processing
– Distributed data management
– Distributed simulation, visualization, and computational steering
– Collaboration with the US and International Research and Education community
Distributed Science Example: Multidisciplinary Simulation
[Figure: components of a coupled climate model. Time scales shown: minutes-to-hours, days-to-weeks, years-to-centuries. Components shown: chemistry (CO2, CH4, N2O, ozone, aerosols); climate (temperature, precipitation, radiation, humidity, wind); biogeophysics (energy, water, aerodynamics); biogeochemistry (carbon assimilation, decomposition, mineralization); microclimate and canopy physiology; hydrology (intercepted water, snow, soil water); phenology (bud break, leaf senescence); watersheds (surface water, subsurface water, geomorphology); ecosystems (species composition, ecosystem structure); vegetation dynamics; disturbance (fires, hurricanes, ice storms, windthrows). Courtesy Gordon Bonan, NCAR: Ecological Climatology: Concepts and Applications. Cambridge University Press, Cambridge, 2002.]
• A “complete” approach to climate modeling involves many interacting models and data that are provided by different groups at different locations
• Closely coordinated and interdependent distributed systems that must have predictable intercommunication for effective functioning
Distributed Science Example: Sloan Galaxy Cluster Analysis
[Figure: the science “application” (Sloan data, GriPhyN generated DAG workflow) and the science process and results (galaxy cluster size distribution)]
• A DAG representation of the workflow for 48 and 60 searches over 600 datasets (each node represents a process on a machine), executed in 2402 seconds on 62 hosts
• Closely coordinated and interdependent distributed systems that must have predictable intercommunication for effective functioning
*From “Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey,” J. Annis, Y. Zhao, J. Voeckler, M. Wilde, S. Kent and I. Foster. In SC2002, Baltimore, MD. http://www.sc2002.org/paperpdfs/pap.pap299.pdf
Large-Scale Science: High Energy Physics’ Large Hadron Collider (Accelerator) at CERN
LHC Goal - Detect the Higgs Boson
The Higgs boson is a hypothetical massive scalar elementary particle predicted to exist by the Standard Model of particle physics. It is the only Standard Model particle not yet observed, but plays a key role in explaining the origins of the mass of other elementary particles, in particular the difference between the massless photon and the very heavy W and Z bosons. Elementary particle masses, and the differences between electromagnetism (caused by the photon) and the weak force (caused by the W and Z bosons), are critical to many aspects of the structure of microscopic (and hence macroscopic) matter; thus, if it exists, the Higgs boson has an enormous effect on the world around us.
The Largest Facility: Large Hadron Collider at CERN
• LHC CMS detector: 15 m x 22 m, 12,500 tons, $700M (figure shows a human for scale)
• CMS is one of several major detectors (experiments). The other large detector is ATLAS.
• Two counter-rotating, 7 TeV proton beams, 27 km circumference (8.6 km diameter), collide in the middle of the detectors
Data Management Model: A refined view of the LHC Data Grid Hierarchy where operations of the Tier 2 centers and the U.S. Tier 1 center are integrated through network connections with typical speeds in the 10 Gbps range. [ICFA SCIC]
• Closely coordinated and interdependent distributed systems that must have predictable intercommunication for effective functioning
Accumulated data (Terabytes) received by CMS Data Centers (“tier 1” sites) and many analysis centers (“tier 2” sites) during the past four months (8 petabytes of data) [LHC/CMS] This sets the scale of the LHC distributed data analysis problem.
The LHC Data Management System has Several Characteristics that Result in Requirements for the Network and its Services
• The systems are data intensive and high-performance, typically moving terabytes a day for months at a time
• The systems are high duty-cycle, operating most of the day for months at a time in order to meet the requirements for data movement
• The systems are widely distributed, typically spread over continental or inter-continental distances
• Such systems depend on network performance and availability, but these characteristics cannot be taken for granted, even in well run networks, when the multi-domain network path is considered
• The applications must be able to get guarantees from the network that there is adequate bandwidth to accomplish the task at hand
• The applications must be able to get information from the network that allows graceful failure and auto-recovery and adaptation to unexpected network conditions that are short of outright failure
This slide drawn from [ICFA SCIC]
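To make "terabytes a day" concrete, the following minimal sketch converts a daily data volume into the sustained rate the network would have to carry. It is an illustration only: the function name, the decimal unit convention (1 TB = 10^12 bytes), and the example volumes are assumptions, not figures from the talk.

```python
SECONDS_PER_DAY = 86_400


def sustained_gbps(terabytes_per_day: float, duty_cycle: float = 1.0) -> float:
    """Average rate in Gb/s needed to move a daily volume.

    Assumes decimal units (1 TB = 10**12 bytes). duty_cycle is the
    fraction of the day during which transfers actually run.
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    bits = terabytes_per_day * 1e12 * 8
    return bits / (SECONDS_PER_DAY * duty_cycle) / 1e9


if __name__ == "__main__":
    # Hypothetical daily volumes, chosen only to show the scaling.
    for tb in (1, 10, 100):
        print(f"{tb:>4} TB/day -> {sustained_gbps(tb):.2f} Gb/s sustained")
```

A lower duty cycle raises the required rate proportionally, which is one reason high duty-cycle systems need bandwidth that is guaranteed rather than best-effort.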
Enabling Large-Scale Science
• These requirements are generally true for systems with widely distributed components to be reliable and consistent in performing the sustained, complex tasks of large-scale science
• Networks must provide communication capability that is service-oriented: configurable, schedulable, predictable, reliable, and informative, and the network and its services must be scalable
The LHC is the First of Many Large-Scale Science Scenarios
Science Drivers (see refs. [1], [2], [3], and [4])
Each line below lists one table column, with entries in the order they appear on the slide.
• Science Areas / Facilities: Magnetic Fusion Energy; NERSC and ACLF; NLCF; Nuclear Physics (RHIC); Spallation Neutron Source
• End2End Reliability: 99.999% (impossible without full redundancy); -; -; -; High (24x7 operation)
• Connectivity: DOE sites, US Universities, Industry; DOE sites, US Universities, International, Other ASCR supercomputers; DOE sites, US Universities, Industry, International; DOE sites, US Universities, International; DOE sites
• Today End2End Bandwidth: 200+ Mbps; 10 Gbps; backbone bandwidth parity; 12 Gbps; 640 Mbps
• 5 years End2End Bandwidth: 1 Gbps; 20 to 40 Gbps; backbone bandwidth parity; 70 Gbps; 2 Gbps
• Traffic Characteristics: Bulk data, Remote control, Remote file system sharing; Bulk data, Remote file system sharing; Bulk data; Bulk data
• Network Services: Guaranteed bandwidth, Guaranteed QoS, Deadline scheduling; Guaranteed bandwidth, Guaranteed QoS, Deadline scheduling, PKI / Grid; Guaranteed bandwidth, PKI / Grid
The LHC is the First of Many Large-Scale Science Scenarios
Science Drivers
Each line below lists one table column, with entries in the order they appear on the slide.
• Science Areas / Facilities: Advanced Light Source; Bioinformatics; Chemistry / Combustion; Climate Science; High Energy Physics (LHC)
• End2End Reliability: -; -; -; -; 99.95+% (less than 4 hrs/year)
• Connectivity: DOE sites, US Universities, Industry; DOE sites, US Universities; DOE sites, US Universities, Industry; DOE sites, US Universities, International; US Tier 1 (FNAL, BNL), US Tier 2 (Universities), International (Europe, Canada)
• Today End2End Bandwidth: 1 TB/day, 300 Mbps; 625 Mbps (12.5 Gbps in two years); -; -; 10 Gbps
• 5 years End2End Bandwidth: 5 TB/day, 1.5 Gbps; 250 Gbps; 10s of Gigabits per second; 5 PB per year, 5 Gbps; 60 to 80 Gbps (30-40 Gbps per US Tier 1)
• Traffic Characteristics: Bulk data, Remote control; Bulk data, Remote control, Point-to-multipoint; Bulk data; Bulk data, Remote control; Bulk data, Coupled data analysis processes
• Network Services: Guaranteed bandwidth, PKI / Grid; Guaranteed bandwidth, High-speed multicast; Guaranteed bandwidth, PKI / Grid; Guaranteed bandwidth, PKI / Grid; Guaranteed bandwidth, Traffic isolation, PKI / Grid
Slide annotation: Immediate Requirements and Drivers
Large-Scale Science is Beginning to Dominate all Traffic
[Figure: ESnet Monthly Accepted Traffic, January 2000 – June 2007, in terabytes/month. Annotations: top 100 site-to-site workflows; site-to-site workflow data not available; ESnet total traffic passed 2 petabytes/month about mid-April 2007.]
• ESnet is currently transporting more than 1 petabyte (1000 terabytes) per month
• More than 50% of the traffic is now generated by the top 100 sites, so large-scale science dominates all ESnet traffic
Large-Scale Science is Generating New Traffic Patterns
[Figure: total traffic (TBy) with panels for Jan. 2005, July 2005, Jan. 2006, and June 2006; each panel is marked at 2 TB/month]
• While the total traffic is increasing exponentially
– Peak flow (that is, system-to-system) bandwidth is decreasing
– The number of large flows is increasing
Large-Scale Science is Generating New Traffic Patterns
Question: Why is peak flow bandwidth decreasing while total traffic is increasing?
Answer: Most large data transfers are now done by parallel / Grid data movers
[Figure annotation: plateaus indicate the emergence of parallel transfer systems (a lot of systems transferring the same amount of data at the same time)]
• In June 2006, 72% of the hosts generating the top 1000 flows were involved in parallel data movers (Grid applications)
• This is the most significant traffic pattern change in the history of ESnet
• This has implications for the network architecture that favor path multiplicity and route diversity
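The effect described above can be illustrated with a small sketch: a parallel data mover splits one large transfer into many similar-sized flows, so the per-flow peak falls while the aggregate stays the same. The function names and the example numbers are hypothetical and are not taken from any ESnet or Grid tool.

```python
from typing import List


def split_transfer(total_bytes: int, n_streams: int) -> List[int]:
    """Split a transfer into n_streams chunks whose sizes differ by at most 1 byte."""
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    base, extra = divmod(total_bytes, n_streams)
    return [base + (1 if i < extra else 0) for i in range(n_streams)]


def per_flow_gbps(aggregate_gbps: float, n_streams: int) -> float:
    """Rate each flow carries if the aggregate is shared evenly across streams."""
    return aggregate_gbps / n_streams


if __name__ == "__main__":
    total = 8 * 10**12  # hypothetical 8 TB dataset
    for n in (1, 8, 64):
        chunks = split_transfer(total, n)
        print(f"{n:>3} streams: largest chunk {max(chunks):,} bytes, "
              f"per-flow rate {per_flow_gbps(4.0, n):.3f} Gb/s of a 4 Gb/s aggregate")
```

Because every stream moves nearly the same amount of data, a plot of flow sizes shows many flows at the same level, which is consistent with the plateaus noted on the slide.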
What Networks Need to Do
• The above examples currently only work in carefully controlled environments with the assistance of computing and networking experts
• For this essential approach to be successful in the long term it must be routinely accessible to discipline scientists, without the continuous attention of computing and networking experts
• In order to
– facilitate operation of multi-domain distributed systems
– accommodate the projected growth in the use of the network
– facilitate the changes in the types of traffic
the architecture and services of the network must change
• The general requirements for the new architecture are that it:
1) Support the high bandwidth data flows of large-scale science, including scalable, reliable, and very high-speed network connectivity to end sites
2) Dynamically provision virtual circuits with guaranteed quality of service (e.g. for dedicated bandwidth and for traffic isolation)
3) Provide users and applications with meaningful monitoring end-to-end (across multiple domains)
The next several slides present the ESnet response to these requirements
1) A Hybrid Network is Tailored to Circuit-Oriented Services
ESnet4 IP + SDN, 2011 Configuration - most of the bandwidth is in the Layer 2 Science Data Network (SDN)
[Map: ESnet4 hubs at Seattle, Portland, Boise, Sunnyvale, LA, San Diego, Salt Lake City, Denver, Albuq., El Paso, KC, Tulsa, Houston, Baton Rouge, Chicago, Indianapolis, Nashville, Atlanta, Jacksonville, Clev., Pitts., Raleigh, Wash. DC, Philadelphia, NYC, and Boston. Legend: ESnet IP switch/router hubs; ESnet IP switch only hubs; ESnet SDN switch hubs; Layer 1 optical nodes at eventual ESnet Points of Presence; Layer 1 optical nodes not currently in ESnet plans; Lab site; ESnet IP core; ESnet Science Data Network core; ESnet SDN core, NLR links (existing); Lab supplied link; LHC related link; MAN link; International IP Connections; OC48; Internet2 circuit numbers shown in parentheses.]
High Bandwidth all the Way to the End Sites – major ESnet sites are now effectively directly on the ESnet “core” network
• e.g. the bandwidth into and out of FNAL is equal to, or greater than, the ESnet core bandwidth
[Map: the ESnet4 core with metropolitan area networks (MANs). MANs shown: Long Island MAN, West Chicago MAN, San Francisco Bay Area MAN, Atlanta MAN. Sites and exchange points labeled: BNL, FNAL, ANL, SLAC, LBNL, JGI, LLNL, SNLL, NERSC, ORNL, JLab, ELITE, ODU, MATP, 32 AoA NYC, 600 W. Chicago, Starlight, USLHCNet, 180 Peachtree, 56 Marietta (SOX). Legend: ESnet IP switch/router hubs; ESnet IP switch only hubs; ESnet SDN switch hubs; Layer 1 optical nodes at eventual ESnet Points of Presence; Layer 1 optical nodes not currently in ESnet plans; Lab site; ESnet IP core; ESnet Science Data Network core; ESnet SDN core, NLR links (existing); Lab supplied link; LHC related link; MAN link; International IP Connections; OC48; Internet2 circuit numbers shown in parentheses.]
2) Multi-Domain Virtual Circuits
ESnet OSCARS [6] project has as its goals:
• Traffic isolation and traffic engineering
– Provides for high-performance, non-standard transport mechanisms that cannot co-exist with commodity TCP-based transport
– Enables the engineering of explicit paths to meet specific requirements, e.g. bypass congested links, using lower bandwidth, lower latency paths
• Guaranteed bandwidth (Quality of Service (QoS))
– User specified bandwidth
– Addresses deadline scheduling, where fixed amounts of data have to reach sites on a fixed schedule, so that the processing does not fall far enough behind that it could never catch up (very important for experiment data analysis)
• Reduces cost of handling high bandwidth data flows
– Highly capable routers are not necessary when every packet goes to the same place
– Use lower cost (factor of 5x) switches to route the packets
• Secure connections
– The circuits are “secure” to the edges of the network (the site boundary) because they are managed by the control plane of the network, which is isolated from the general traffic
• End-to-end (cross-domain) connections between Labs and collaborating institutions
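The deadline-scheduling goal can be made concrete with a short sketch: given a fixed amount of data and a deadline, compute the bandwidth a circuit would need, and show how a backlog grows without bound when data is produced faster than it can be moved. The function names, units (decimal terabytes), and example figures are assumptions for illustration and do not describe the OSCARS implementation.

```python
def required_gbps(data_terabytes: float, hours_to_deadline: float) -> float:
    """Bandwidth in Gb/s needed to deliver the data before the deadline.

    Assumes decimal units (1 TB = 10**12 bytes) and a constant transfer rate.
    """
    if hours_to_deadline <= 0:
        raise ValueError("deadline must be in the future")
    return data_terabytes * 1e12 * 8 / (hours_to_deadline * 3600) / 1e9


def backlog_tb(produced_per_period: float, moved_per_period: float, periods: int) -> float:
    """Data (TB) still waiting after a number of periods; never negative."""
    backlog = 0.0
    for _ in range(periods):
        backlog = max(0.0, backlog + produced_per_period - moved_per_period)
    return backlog


if __name__ == "__main__":
    # Hypothetical case: 4.5 TB must arrive within one hour.
    print(f"required: {required_gbps(4.5, 1):.1f} Gb/s")
    # Producing 5 TB per period but moving only 4 TB: the backlog keeps growing.
    print(f"backlog after 10 periods: {backlog_tb(5, 4, 10):.0f} TB")
```

When the moved volume per period is below the produced volume, the backlog grows linearly and processing can never catch up, which is the situation a bandwidth guarantee is meant to prevent.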
OSCARS
[Diagram: components are the Web-Based User Interface (WBUI); the Authentication, Authorization, and Auditing Subsystem (AAAS); the Reservation Manager; the Bandwidth Scheduler Subsystem; and the Path Setup Subsystem. Labeled flows: human user request via WBUI; user feedback; user application request via AAAS; instructions to routers and switches to set up / tear down LSPs.]
• To ensure compatibility, the design and implementation is done in collaboration with the other major science R&E networks and end sites
– Internet2: Bandwidth Reservation for User Work (BRUW); development of common code base
– GEANT: Bandwidth on Demand (GN2-JRA3), Performance and Allocated Capacity for End-users (SA3-PACE) and Advance Multi-domain Provisioning System (AMPS) extends to NRENs
– BNL: TeraPaths - A QoS Enabled Collaborative Data Sharing Infrastructure for Petascale Computing Research
– GA: Network Quality of Service for Magnetic Fusion Research
– SLAC: Internet End-to-end Performance Monitoring (IEPM)
– USN: Experimental Ultra-Scale Network Testbed for Large-Scale Science
– DRAGON/HOPI: Optical testbed
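As an illustration of what a bandwidth scheduler has to decide, the sketch below admits a new reservation on a single link only if the committed bandwidth never exceeds the link capacity during the requested window. This is a generic admission check written for this example; the `Reservation` type, the `admit` function, and the single-link model are assumptions and are not the OSCARS design.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Reservation:
    start: int    # start time in seconds (inclusive)
    end: int      # end time in seconds (exclusive)
    gbps: float   # reserved bandwidth


def peak_load(reservations: Iterable[Reservation], start: int, end: int) -> float:
    """Maximum committed bandwidth at any instant in [start, end).

    The load is piecewise constant and can only rise at a reservation's
    start time, so it is enough to sample the window start and every
    reservation start that falls inside the window.
    """
    existing: List[Reservation] = list(reservations)
    points = {start} | {r.start for r in existing if start < r.start < end}
    return max(sum(r.gbps for r in existing if r.start <= t < r.end) for t in points)


def admit(reservations: Iterable[Reservation], request: Reservation,
          capacity_gbps: float) -> bool:
    """True if the request fits on the link for its whole duration."""
    existing = list(reservations)
    return peak_load(existing, request.start, request.end) + request.gbps <= capacity_gbps


if __name__ == "__main__":
    booked = [Reservation(0, 100, 6.0), Reservation(50, 150, 3.0)]
    print(admit(booked, Reservation(60, 80, 2.0), capacity_gbps=10.0))    # overlaps both
    print(admit(booked, Reservation(100, 150, 7.0), capacity_gbps=10.0))  # overlaps one
```

A multi-domain scheduler would have to run a check like this on every link of the chosen path, in every domain, before instructing routers and switches to set up the circuit.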
3) Monitoring Applications of the Types that Move Us Toward Service-Oriented Communications Services
• E2Emon provides end-to-end path status in a service-oriented, easily interpreted way
– a perfSONAR application used to monitor the LHC paths end-to-end across many domains
– uses perfSONAR protocols to retrieve current circuit status every minute or so from the measurement archives (MAs) and measurement points (MPs) in all the different domains supporting the circuits
– is itself a service that produces Web-based, real-time displays of the overall state of the network, and it generates alarms when one of the MPs or MAs reports link problems
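The aggregation idea behind an end-to-end status display can be sketched as follows: collect one status per domain segment and report the worst one as the status of the whole path, raising an alarm for any segment that is not up. The status names, their severity ordering, and the domain labels are assumptions for illustration; this is not the E2Emon code or the perfSONAR protocol.

```python
from typing import List, Mapping

# Hypothetical status vocabulary, ordered from best to worst.
SEVERITY = {"up": 0, "degraded": 1, "unknown": 2, "down": 3}


def end_to_end_status(segments: Mapping[str, str]) -> str:
    """Status of the whole path: the worst status among its segments."""
    if not segments:
        return "unknown"
    return max(segments.values(), key=lambda status: SEVERITY[status])


def alarms(segments: Mapping[str, str]) -> List[str]:
    """Names of the segments that should raise an alarm (anything not 'up')."""
    return sorted(name for name, status in segments.items() if status != "up")


if __name__ == "__main__":
    # Hypothetical per-domain reports for one multi-domain circuit.
    reports = {"domain-a": "up", "domain-b": "degraded", "domain-c": "up"}
    print(end_to_end_status(reports), alarms(reports))
```

Taking the worst segment status reflects that a circuit is only as usable as its weakest domain, and treating "unknown" as worse than "degraded" is a conservative choice made for this sketch.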
E2Emon: Status of E2E link CERN-LHCOPN-FNAL-001
[Figure: E2Emon generated view of the data for one OPN link [E2EMON]]
Path Performance Monitoring
• Path performance monitoring needs to provide users/applications with the end-to-end, multi-domain traffic and bandwidth availability
– should also provide real-time performance such as path utilization and/or packet drop
• Multiple path performance monitoring tools are in development
– One example, Traceroute Visualizer [TrViz], has been deployed at about 10 R&E networks in the US and Europe that have at least some of the required perfSONAR MA services to support the tool
Traceroute Visualizer
• Forward direction bandwidth utilization on application path from LBNL to INFN-Frascati (Italy)
– traffic shown as bars on those network device interfaces that have an associated MP service (the first 4 graphs are normalized to 2000 Mb/s, the last to 500 Mb/s)
– link capacity is also provided
1 ir1000gw (131.243.2.1)
2 er1kgw
3 lbl2-ge-lbnl.es.net
4 slacmr1-sdn-lblmr1.es.net (GRAPH OMITTED)
5 snv2mr1-slacmr1.es.net (GRAPH OMITTED)
6 snv2sdn1-snv2mr1.es.net
7 chislsdn1-oc192-snv2sdn1.es.net (GRAPH OMITTED)
8 chiccr1-chislsdn1.es.net
9 aofacr1-chicsdn1.es.net (GRAPH OMITTED)
10 esnet.rt1.nyc.us.geant2.net (NO DATA)
11 so-7-0-0.rt1.ams.nl.geant2.net (NO DATA)
12 so-6-2-0.rt1.fra.de.geant2.net (NO DATA)
13 so-6-2-0.rt1.gen.ch.geant2.net (NO DATA)
14 so-2-0-0.rt1.mil.it.geant2.net (NO DATA)
15 garr-gw.rt1.mil.it.geant2.net (NO DATA)
16 rt1-mi1-rt-mi2.garr.net
17 rt-mi2-rt-rm2.garr.net (GRAPH OMITTED)
18 rt-rm2-rc-fra.garr.net (GRAPH OMITTED)
19 rc-fra-ru-lnf.fra.garr.net (GRAPH OMITTED)
20
21 www6.lnf.infn.it (193.206.84.223) 189.908 ms 189.596 ms 189.684 ms
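The note that graphs are normalized to 2000 Mb/s or 500 Mb/s can be illustrated with a small sketch that scales a measured rate to a fixed-width text bar. The function name, the bar characters, and the sample traffic values are hypothetical; this is not how the Traceroute Visualizer itself renders its graphs.

```python
def utilization_bar(traffic_mbps: float, scale_mbps: float, width: int = 40) -> str:
    """Render traffic as a text bar normalized to scale_mbps.

    Values above the scale are clipped so the bar never exceeds `width`.
    """
    if scale_mbps <= 0:
        raise ValueError("scale_mbps must be positive")
    fraction = min(max(traffic_mbps / scale_mbps, 0.0), 1.0)
    filled = round(fraction * width)
    return "#" * filled + "." * (width - filled)


if __name__ == "__main__":
    # Hypothetical per-interface readings: (label, traffic in Mb/s, scale in Mb/s).
    samples = [("hop A", 1000.0, 2000.0), ("hop B", 125.0, 500.0)]
    for label, traffic, scale in samples:
        print(f"{label}: [{utilization_bar(traffic, scale, 20)}] of {scale:.0f} Mb/s")
```

Normalizing each bar to its own scale means bars on different hops are comparable only when the scale is shown next to them, which is why the slide states the normalization explicitly.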
Conclusions (from the ESnet Point of View)
• The usage of, and demands on, ESnet (and similar R&E networks) are expanding significantly as large-scale science becomes increasingly dependent on high-performance networking
• The motivation for the next generation of ESnet is derived from observations of the current traffic trends and case studies of major science applications
• The case studies of the science uses of the network lead to an understanding of the new uses of the network that will be required
• These new uses require that the network provide new capabilities and migrate toward network communication as a service-oriented capability.
References
1. High Performance Network Planning Workshop, August 2002 – http://www.doecollaboratory.org/meetings/hpnpw
2. Science Case Studies Update, 2006 (contact eli@es.net)
3. DOE Science Networking Roadmap Meeting, June 2003 – http://www.es.net/hypertext/welcome/pr/Roadmap/index.html
4. Science Case for Large Scale Simulation, June 2003 – http://www.pnl.gov/scales/
5. Planning Workshops-Office of Science Data-Management Strategy, March & May 2004 – http://www-conf.slac.stanford.edu/dmw2004
6. For more information contact Chin Guok (chin@es.net). Also see http://www.es.net/oscars
[LHC/CMS] http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?view=global
[ICFA SCIC] “Networking for High Energy Physics.” International Committee for Future Accelerators (ICFA), Standing Committee on Inter-Regional Connectivity (SCIC), Professor Harvey Newman, Caltech, Chairperson. http://monalisa.caltech.edu:8080/Slides/ICFASCIC2007/
[E2EMON] Geant2 E2E Monitoring System, developed and operated by JRA4/WI3, with implementation done at DFN. http://cnmdev.lrz-muenchen.de/e2e/html/G2_E2E_index.html and http://cnmdev.lrz-muenchen.de/e2e/lhc/G2_E2E_index.html
[TrViz] ESnet PerfSONAR Traceroute Visualizer. https://performance.es.net/cgi-bin/level0/perfsonar-trace.cgi