JASMIN: Petascale storage and terabit networking for environmental science
Matt Pritchard, Centre for Environmental Data Archival, RAL Space
Jonathan Churchill, Scientific Computing Department, STFC Rutherford Appleton Laboratory
What is JASMIN?
• What is it?
  – 16 Petabytes of high-performance disk
  – 4,000 computing cores (HPC, virtualisation)
  – High-performance network design
  – Private clouds for virtual organisations
• For whom?
  – Entire NERC community
  – Met Office
  – European agencies
  – Industry partners
• How?
Context
• Archive and curation: BADC, CEMS Academic, IPCC-DDC, NEODC, UKSSDC
• JASMIN infrastructure: virtual machines, group workspaces, batch processing cluster, cloud analysis environments
JASMIN: the missing piece
Urgency to provide better environmental predictions:
• Need for higher-resolution models
• HPC to perform the computation
• Huge increase in observational capability/capacity
But…
• Massive storage requirement: observational data transfer, storage, processing
• Massive raw data output from prediction models
• Huge requirement to process raw model output into usable predictions (graphics/post-processing)
Hence JASMIN…
(Images: ARCHER supercomputer, EPSRC/NERC; JASMIN, STFC/Stephen Kill)
Data growth
• Light blue = total of all tape at STFC
• Green = Large Hadron Collider (LHC) Tier 1 data on tape
• Dark blue = data on disk in JASMIN
• Projections: 30–85 PB (unique data) for JASMIN; 30–300 PB? for CMIP6 climate models
Data growth on JASMIN has been limited by:
• Not enough disk (now fixed …for a while)
• Not enough local compute (now fixed …for a while)
• Not enough inbound bandwidth (now fixed …for a while)
JASMIN user view
• jasmin-login1: SSH login gateway (behind the firewall)
• jasmin-xfer1: data transfers
• jasmin-sci1: science/analysis
• lotus.jc.rl.ac.uk: batch processing cluster
• Project-specific resources: VMs and group workspaces (GWS) under /group_workspaces/jasmin/
• Data centre archive: /badc, /neodc
Key: general-purpose resources, project-specific resources, data centre resources.
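A minimal sketch of this user view: read from the read-only archive, write results to a project group workspace. The /badc and /group_workspaces/jasmin/ roots come from the slide; the file and workspace names are hypothetical placeholders.

```python
# Sketch: stage an archive file into a group workspace for processing.
# Archive paths (/badc, /neodc) are read-only; group workspaces are writable.
import os
import shutil

ARCHIVE_FILE = "/badc/some_project/some_dataset.nc"   # hypothetical path
WORKSPACE = "/group_workspaces/jasmin/myproject"      # hypothetical GWS

def stage_to_workspace(archive_path: str, workspace: str) -> str:
    """Copy an archive file into a group workspace."""
    dest = os.path.join(workspace, os.path.basename(archive_path))
    shutil.copyfile(archive_path, dest)
    return dest

if __name__ == "__main__":
    print("Staged", stage_to_workspace(ARCHIVE_FILE, WORKSPACE))
```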
Success stories (1)
• QA4ECV: re-processed the MODIS prior in 3 days on JASMIN/LOTUS, 81× faster than the original process on an 8-core blade
• Throw hardware* at the problem (*the right type of hardware)
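A sketch of how such an embarrassingly parallel reprocessing run can be fanned out on LOTUS with a Platform LSF job array (LOTUS runs LSF, per the overview slide later). The worker script name and task count are hypothetical.

```python
# Sketch: submit an LSF job array from Python. Each array task sees its
# index in the LSB_JOBINDEX environment variable and processes one chunk.
import subprocess

N_TASKS = 81  # hypothetical number of independent chunks

cmd = [
    "bsub",
    "-J", f"reproc[1-{N_TASKS}]",     # LSF job array syntax
    "-o", "logs/reproc.%J.%I.out",    # per-task log files
    "python", "reprocess_chunk.py",   # hypothetical worker script
]
subprocess.run(cmd, check=True)
```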
Success stories (2)
• Test feasible resolutions for global climate models
• 250 TB generated in 1 year on the PRACE supercomputing facility in Germany (HERMIT); 400 TB total
• Network transfer to JASMIN
• Analysed by Met Office scientists as soon as available
• Deployment of VMs running custom scientific software, co-located with the data
• Outputs migrated to the long-term archive (BADC)
Image: P-L Vidale & R. Schiemann, NCAS
Mizielinski et al. (Geoscientific Model Development, 2013), “High-resolution global climate modelling: the UPSCALE project, a large-simulation campaign”
Coming soon
Phase 2/3 expansion
• 2013 NERC Big Data capital investment
• Wider scope: support projects from new communities, e.g.
  – EOS Cloud: environmental ‘omics; Cloud BioLinux platform
  – Geohazards: batch compute of Sentinel-1A SAR for large-scale, high-resolution Earth-surface deformation measurement (image: Sentinel-1A, ESA)
JASMIN hardware upgrade:
• Phase 2 (by March 2014): +7 Petabytes disk, +6 Petabytes tape, +3,000 compute cores, network enhancement
• Phase 3 (by March 2015): +o(2) Petabytes disk, +o(800) compute cores, network enhancement
JASMIN software upgrade: virtualisation software, scientific analysis software, cloud management software, dataset construction, documentation
JASMIN now: 5 + 6.9 + 4.4 = 16.3 PB disk; Phase 1 & 2 joined as part of Phase 3
JASMIN Analysis Platform
• Software stack for scientific analysis on JASMIN
  – Common packages for climate science, geospatial analysis
  – CIS developed as part of JASMIN
• Deployed on JASMIN
  – Shared VMs
  – Sci VM template
  – LOTUS
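A minimal sketch of the kind of session this stack supports; netCDF4 and numpy are typical of the "common packages" above, while the file and variable names here are hypothetical.

```python
# Sketch: open an archive NetCDF file and compute a placeholder statistic.
import numpy as np
from netCDF4 import Dataset

with Dataset("/badc/some_project/tas_monthly.nc") as nc:  # hypothetical file
    tas = nc.variables["tas"][:]                          # hypothetical variable
    # A simple all-time, all-space mean as a stand-in analysis step.
    print("mean surface air temperature:", float(np.mean(tas)))
```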
Community Intercomparison Suite (CIS)
CIS = component of JAP.
Plot types: time-series, global plots, overlay plots, line plots, scatter plots, curtain plots, histograms.

| Dataset | Format |
| --- | --- |
| AERONET | Text |
| MODIS | HDF |
| CALIOP | HDF |
| CloudSat | HDF |
| AMSRE | HDF |
| TRMM | HDF |
| CCI aerosol & cloud | NetCDF |
| SEVIRI | NetCDF |
| Flight campaign data | RAF |
| Models | NetCDF |
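CIS automates intercomparison plots across the formats in the table above. As an illustration only (this is not the CIS API), here is a hand-rolled scatter comparison of two co-located aerosol series; the arrays are synthetic stand-ins.

```python
# Sketch: a model-vs-observation scatter plot, of the kind CIS produces.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
model_aod = rng.uniform(0.0, 1.0, 200)             # stand-in for model AOD
obs_aod = model_aod + rng.normal(0.0, 0.1, 200)    # stand-in for satellite AOD

plt.scatter(model_aod, obs_aod, s=8)
plt.plot([0, 1], [0, 1], "k--", label="1:1 line")
plt.xlabel("Model AOD")
plt.ylabel("Observed AOD")
plt.legend()
plt.savefig("scatter_comparison.png")
```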
Vision for JASMIN 2 (applying lessons from JASMIN 1)
Key features, as different slices through the infrastructure:
• Data archive and compute
• Bare-metal compute: nodes are general purpose, booting as bare metal or as hypervisors
• High-performance global file system
• Virtualisation: internal private cloud
• Federation API: cloud-burst to external cloud providers as demand requires
• Isolated part of the network
Support a spectrum of usage models:
• Use the cloud tenancy model to make virtual organisations
• Networking: an isolated network inside JASMIN gives users greater freedom (full IaaS, root access to hosts, …)
JASMIN cloud architecture
• JASMIN cloud management interfaces span the JASMIN internal network and an external network inside JASMIN
• Managed cloud (PaaS, SaaS), e.g. Project1-org, Project2-org: login VMs, science analysis VMs and compute cluster VMs, each with VM storage
  – Direct access to the LOTUS batch compute cluster
  – Direct file system access to the data centre archive
• Unmanaged cloud (IaaS, PaaS, SaaS), e.g. eos-cloud-org: file server VM, CloudBioLinux desktop (ssh via public IP), CloudBioLinux fat node
  – Standard remote access protocols: ftp, http, …
• An IPython Notebook VM could access the cluster through a Python API (sketched below)
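The slide names no specific Python API, so the following is purely hypothetical: a thin wrapper around LSF's bsub that a notebook user might call to reach the batch cluster.

```python
# Hypothetical sketch: submit work to LOTUS from a notebook via LSF's bsub.
import subprocess

def submit(command: str, queue: str = "short") -> str:
    """Submit a shell command to the batch cluster; return bsub's reply."""
    result = subprocess.run(
        ["bsub", "-q", queue, command],   # queue name is a placeholder
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()          # e.g. "Job <12345> is submitted ..."

print(submit("python analyse_chunk.py")) # hypothetical worker script
```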
First cloud tenants
• EOS Cloud: environmental bioinformatics
• MAJIC: interface to land-surface model
• NERC Environmental Work Bench: cloud-based tools for scientific workflows
Consortia
Science consortia (under the NERC HPC Committee, with the Director and NCAS CMS providing cross-cutting activities):
• Atmospheric & Polar Science
• Earth Observation & Climate Services
• Ecology & Hydrology
• Genomics
• Geology
• Oceanography & Shelf Seas
• Solid Earth & Mineral Physics
Per-consortium allocations (plus the archive):
• Managed-cloud JASMIN vOrg: a TB disk, b GB RAM, c vCPU
• Unmanaged-cloud JASMIN vOrg: d TB disk, e GB RAM, f vCPU
• Group workspace: x TB disk
• Non-JASMIN (ARCHER, RDF): x TB RDF disk, y AU ARCHER
5 + 6.9 + 4.4 = 16.3 PB disk; Phase 1 and 2 were joined in Phase 3. Compute: 12 × 20-core/256 GB nodes and 4 × 48-core/2,000 GB nodes. (Diagram labels: 11.3 PB, 12 PB.)
Panasas storage
• Parallel file system (cf. Lustre, GPFS, pNFS, etc.)
  – Single namespace
  – 140 GB/sec benchmarked (95 shelves of PAS-14)
  – Access via PanFS client/NFS/CIFS
  – POSIX file system out of the box, mounted on physical machines and VMs
• 103 shelves PAS-11 + 101 shelves PAS-14 + 40 shelves PAS-16
  – Each shelf connected at 10 Gb (20 Gb for PAS-14)
  – 2,684 blades
  – Largest single realm & single site in the world
• One management console
• TCO: big capital, small recurrent, but JASMIN 2 £/TB beat GPFS/Lustre offerings
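A back-of-envelope check on the figures above, using only numbers from the slide:

```python
# Shelf counts and per-shelf bandwidth implied by the Panasas slide.
shelves_pas11, shelves_pas14, shelves_pas16 = 103, 101, 40

total_shelves = shelves_pas11 + shelves_pas14 + shelves_pas16
print("total shelves:", total_shelves)             # 244

# 140 GB/s was benchmarked across 95 PAS-14 shelves:
per_shelf = 140 / 95
print(f"~{per_shelf:.2f} GB/s per PAS-14 shelf")   # ~1.47 GB/s
```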
Three-year growth pains
• 172.16.144.0/21 (≈2,000 IPs) overlaid with 130.246.136.0/21 on a flat L2 network
• 160 → 240 ports @ 10 Gb
Overview
• £12.5M, 38 racks, 850 amps, 25 tonnes
• Panasas storage: 20 PBytes raw, 15.1 PB usable; 3 Terabit/s bandwidth
• 4 × VMware clusters: vJASMIN 156 cores, 1.2 TB RAM; vCloud 208–1,648 cores, 1.5–12.8 TB RAM
• NetApp + Dell: 1,010 TB (VM VMDK images)
• LOTUS HPC cluster: 144–234 hosts, 2.2K–3.6K cores; RHEL 6, Platform LSF; MPI network (10 Gb low-latency Ethernet)
• One network: 1,100 ports @ 10 GbE (storage 385 × 10 Gb, compute 468 × 10 Gb, plus 40 × 10 Gb and 32 × 10 Gb uplink groups)
• Lightpaths @ 1 & 2 Gb/s and 10 Gb/s: Leeds, UKMO, ARCHER, CEMS-SpaceCat
Network design criteria
• Non-blocking (no network contention)
• Low latency (< 20 µs MPI, preferably < 10 µs), with a small latency spread
• Converged (IP storage, SAN storage, compute, MPI)
• 700–1,100 ports @ 10 Gb; expansion to 1,600 ports and beyond without forklift upgrade
• Easy to manage and configure
• Cheap
…and, later on: replaces JASMIN 1, with 240 ports already in place.
Floor plan: network distributed over ~30 m × ~20 m, spanning JASMIN 1, JASMIN 2, JASMIN 3, JASMIN 4/5 (2016–20, …) and the Science DMZ.
Cabling costs: central chassis switch vs. top-of-rack (ToR)
• JASMIN 1+2: 700–1,100 10 Gb connections; 312 twinax per compute rack; 1,000 fibre connections = £400–600K
• ToR, e.g. Force10 S4810P: 48 × 10 Gb SFP+, 4 × 40 Gb QSFP+
  – Fully populated ToR: 6 × S4810 ToR switches, 48 × active-optic QSFP
  – 1:1 contention ToR: 20 × S4810 ToR switches, 80 × active-optic QSFP
• Lots of core 40 Gb ports needed. MLAG to 72 ports? A chassis switch? Expansion/cost?
CLOS L3 ECMP fabric: 1,104 × 10 GbE ports, OSPF, 954 routes
• 768 ports max with no expansion … so 12 spines
• Max 36 leaf switches: 1,728 ports @ 10 GbE
• Non-blocking, zero contention (48 × 10 Gb down = 12 × 40 Gb uplinks per leaf; the arithmetic is sketched below)
• Low latency (250 ns L3 per switch/router; 7–10 µs MPI)
• Cheap! ….(ish)
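The leaf/spine arithmetic behind these figures, under the stated assumptions: each leaf has 48 × 10 GbE downlinks and 12 × 40 GbE uplinks, one to each spine.

```python
# Sketch: verify the non-blocking claim and the maximum edge port count.
LEAF_DOWNLINKS = 48   # 10 GbE ports per leaf
LEAF_UPLINKS = 12     # 40 GbE ports per leaf, one per spine
SPINES = 12
MAX_LEAVES = 36

down_capacity = LEAF_DOWNLINKS * 10   # 480 Gb/s toward hosts
up_capacity = LEAF_UPLINKS * 40       # 480 Gb/s toward spines
assert down_capacity == up_capacity   # zero contention

print("max edge ports:", MAX_LEAVES * LEAF_DOWNLINKS)  # 1,728 x 10 GbE
```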
RAL site network (for comparison): 211 routes
ECMP CLOS L3 advantages
• Massive scale
• Low cost (pay as you grow)
• High performance
• Low latency, deterministic with a fixed spine and leaf
• Standards-based: supports multiple vendors
• Very small “blast radius” upon network failures
• Small isolated subnets
https://www.nanog.org/sites/default/files/monday.general.hanks.multistage.10.pdf
ECMP CLOS L3 issues
• Managing scale:
  – Numbers of IPs, subnets, VLANs, cables
  – Monitoring
• Routed L3 network:
  – Requires dynamic OSPF routing (100s of routes per switch)
  – No L2 between switches (a problem for VMware: SANs, vMotion), so DHCP relay and VXLAN are required
• Complex ‘traceroute’ output seen by users
IP and subnet management
Each inter-switch cable is a /30, i.e. 4 IPs per cable: e.g. 198.101.0/30 gives hosts 198.101.1 and 198.101.2, with broadcast 198.101.3 (sketched below).
• 2 × /21 Panasas storage
• 4 × /24 internet connects
• 55 × /26 fabric subnets
• 264 × /30 inter-switch links
• 514 VMs and growing quickly
• 304 servers; 850 switches/PDUs etc.; 2,684 storage blades
• ~260 VLAN IDs; 6,000 IPs and growing
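A sketch of the /30-per-cable scheme using Python's ipaddress module; the parent block here is illustrative, not JASMIN's real addressing plan.

```python
# Sketch: carve /30 point-to-point link subnets out of a larger block.
import ipaddress

parent = ipaddress.ip_network("198.101.0.0/21")   # illustrative block

for link in list(parent.subnets(new_prefix=30))[:3]:
    hosts = list(link.hosts())                    # the 2 usable addresses
    print(link, "hosts:", hosts, "broadcast:", link.broadcast_address)
    assert link.num_addresses == 4                # network + 2 hosts + broadcast
```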
Monitoring / visualisation
• Complex Cacti setup: >30 fabric switches, >50 management switches
• 100s of links to monitor
• Nagios bloat
Fast: four routed ECMP hops.
Implications of L3 CLOS for virtualisation
(Diagram: spines SP1–SP4, leaf switches LSW1–LSW4, hosts host001–host027 in subnets such as 172.26.64.0/26, 172.26.66.0/26, 130.246.128.0/24 and 130.246.129.0/24.)
• VXLAN overlays; multicast routing → PIM
• Panasas access still direct
• Routed iSCSI
• Number of subnets grows; sub-optimal IP use
• Underlay networks
The need for speed
(Chart: cumulative LOTUS job count from mid-2012 to early 2015, rising to 3.5 million jobs; 347 Gbps annotated.)
Further info
• JASMIN: http://www.jasmin.ac.uk
• Centre for Environmental Data Archival: http://www.ceda.ac.uk
• JASMIN paper: Lawrence, B. N., V. L. Bennett, J. Churchill, M. Juckes, P. Kershaw, S. Pascoe, S. Pepler, M. Pritchard, and A. Stephens. “Storing and manipulating environmental big data with JASMIN.” Proceedings of IEEE Big Data 2013, pp. 68–75, doi:10.1109/BigData.2013.6691556