Keck CISM Software Management: Keck CISM and RSQSim Meeting, 9 March 2016

Southern California Earthquake Center
Keck CISM System Development
• Elements of the Keck CISM System
• System Development and Software Management
• Integrating RSQSim into Keck CISM
• Scientific Software Management

3. Collaboratory Development Plan
Phil Maechling, SCEC IT Architect
Keck Site Visit, March 11, 2015

CISM Modular Processing Architecture
• Define Rupture Catalog: Define list of possible earthquakes for region of interest during period of interest
• Calculate Rupture Ground Motions: Calculate ground motions produced by each rupture in region of interest
• Assign Rupture Probabilities: Assign a probability to each rupture in catalog during period of interest
• Forecast Future Ground Motions: Combine ground motions with probabilities to produce probabilistic ground motion forecast
Build a modular, extensible, distributed, high-performance computing framework:
1. Define and execute a multi-stage series of scientific calculations
2. Execute calculations on external resources and return results to SCEC
3. Modularize construction to enable evaluation of multiple alternative methods
4. Ensure repeatable and reviewable results
* We will use a workflow-based distributed computing framework developed on SCEC HPC projects
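The last stage, combining ground motions with rupture probabilities, can be sketched in Python as below. This is a simplified illustration, not the OpenSHA or CyberShake implementation: it assumes ruptures occur independently and that each rupture produces a single ground-motion amplitude at the site. All function names and numbers are hypothetical.

```python
from typing import List, Sequence


def exceedance_probability(
    rupture_probs: Sequence[float],
    rupture_amplitudes: Sequence[float],
    threshold: float,
) -> float:
    """Probability that at least one rupture exceeds `threshold` at the site.

    Assumes ruptures are independent and each rupture yields one amplitude
    at the site (a simplification of real forecasts).
    """
    if len(rupture_probs) != len(rupture_amplitudes):
        raise ValueError("each rupture needs a probability and an amplitude")
    prob_none_exceed = 1.0
    for p, amplitude in zip(rupture_probs, rupture_amplitudes):
        if amplitude > threshold:
            # If this rupture occurs, the threshold is exceeded.
            prob_none_exceed *= 1.0 - p
    return 1.0 - prob_none_exceed


def hazard_curve(
    rupture_probs: Sequence[float],
    rupture_amplitudes: Sequence[float],
    thresholds: Sequence[float],
) -> List[float]:
    """Exceedance probability at each ground-motion threshold."""
    return [
        exceedance_probability(rupture_probs, rupture_amplitudes, x)
        for x in thresholds
    ]


if __name__ == "__main__":
    probs = [0.01, 0.05, 0.002]  # hypothetical rupture probabilities
    amps = [0.2, 0.1, 0.6]       # hypothetical peak amplitudes (g) at one site
    for x, p in zip([0.05, 0.15, 0.5], hazard_curve(probs, amps, [0.05, 0.15, 0.5])):
        print(f"P(amplitude > {x} g) = {p:.5f}")
```

Each threshold gives one point on a hazard curve for the site. Because the stages are separate, either input list could come from a different rupture catalog or ground-motion code without changing this combination step.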

CISM Modular Processing Architecture
• Define Rupture Catalog (OpenSHA UCERF3 Ruptures): Define list of possible earthquakes for region of interest during period of interest
• Calculate Rupture Ground Motions (CyberShake 3D Wave Propagation Simulations): Calculate ground motions produced by each rupture in region of interest
• Assign Rupture Probabilities (OpenSHA UCERF3 Probabilities): Assign a probability to each rupture in catalog during period of interest
• Forecast Future Ground Motions (Combine Amplitudes into Forecast): Combine ground motions with probabilities to produce probabilistic ground motion forecast

CISM Modular Processing Architecture
• Define Rupture Catalog (RSQSim Long-Period Earthquake Simulations): Define list of possible earthquakes for region of interest during period of interest
• Calculate Rupture Ground Motions (CyberShake 3D Wave Propagation Simulations): Calculate ground motions produced by each rupture in region of interest
• Assign Rupture Probabilities (OpenSHA ETAS Probabilities): Assign a probability to each rupture in catalog during period of interest
• Forecast Future Ground Motions (Combine Amplitudes into Forecast): Combine ground motions with probabilities to produce probabilistic ground motion forecast
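Replacing UCERF3 with RSQSim in the first stage works only if downstream stages depend on an interface rather than on a specific code. A minimal Python sketch of that idea follows. The `RuptureCatalogSource` interface and both catalog classes are hypothetical stand-ins that return hard-coded ruptures rather than calling OpenSHA or RSQSim.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Rupture:
    rupture_id: int
    magnitude: float


class RuptureCatalogSource(Protocol):
    """Hypothetical interface for the 'Define Rupture Catalog' stage."""

    def ruptures(self) -> List[Rupture]:
        ...


class UCERF3Catalog:
    """Placeholder standing in for an OpenSHA UCERF3 rupture set."""

    def ruptures(self) -> List[Rupture]:
        return [Rupture(1, 6.5), Rupture(2, 7.1)]


class RSQSimCatalog:
    """Placeholder standing in for an RSQSim simulated catalog."""

    def __init__(self, min_magnitude: float = 6.0) -> None:
        self.min_magnitude = min_magnitude

    def ruptures(self) -> List[Rupture]:
        simulated = [Rupture(101, 5.8), Rupture(102, 6.7), Rupture(103, 7.4)]
        # Keep only events large enough to matter for the forecast.
        return [r for r in simulated if r.magnitude >= self.min_magnitude]


def run_forecast(source: RuptureCatalogSource) -> List[int]:
    """Downstream stages see only the interface, never the implementation."""
    return [r.rupture_id for r in source.ruptures()]


if __name__ == "__main__":
    print(run_forecast(UCERF3Catalog()))  # [1, 2]
    print(run_forecast(RSQSimCatalog()))  # [102, 103]
```

The same pattern would apply to the probability stage (UCERF3 probabilities versus ETAS probabilities).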

Keck CISM System Development
• Elements of the Keck CISM System
• System Development and Software Management
• Integrating RSQSim into Keck CISM
• Scientific Software Management

Essential CISM Scientific Codes
1. OpenSHA. Implements the Uniform California Earthquake Rupture Forecast versions 2 and 3 (UCERF2, UCERF3), GMPEs, and probabilistic seismic hazard processing / Language: Java / Multi-threaded / Primary Developers: Ned Field, Kevin Milner
2. RSQSim. Large-scale simulations of earthquake occurrence to characterize the system-level response of fault systems, including processes that control the time, place, and extent of earthquake slip / Language: C / MPI-based / Primary Developers: James Dieterich, Keith Richards-Dinger
3. CyberShake. 3D wave propagation simulations for a large set of ruptures, and seismogram processing resulting in peak ground motions and other parameters / Language: C / MPI-based / Primary Developers: Robert Graves, Scott Callaghan, Philip Maechling, Thomas Jordan
4. CSEP. Automated execution and evaluation of short-term earthquake forecast models / Language: Python / Multi-threaded / Primary Developers: D. Schorlemmer, T. Jordan, M. Liukis
References:
1. Field, E. H., T. H. Jordan, and C. A. Cornell (2003). OpenSHA: A Developing Community-Modeling Environment for Seismic Hazard Analysis. Seismological Research Letters, 74(4), 406-419.
2. Richards-Dinger, K., and J. H. Dieterich (2012). RSQSim Earthquake Simulator. Seismological Research Letters, 83(6), 983-990. doi:10.1785/0220120105
3. Graves, R., T. Jordan, S. Callaghan, E. Deelman, E. Field, G. Juve, C. Kesselman, P. Maechling, G. Mehta, K. Milner, D. Okaya, P. Small, and K. Vahi (2010). CyberShake: A Physics-Based Seismic Hazard Model for Southern California. Pure and Applied Geophysics, 169(3-4). doi:10.1007/s00024-010-0161-6
4. Zechar, J. D., D. Schorlemmer, M. Liukis, J. Yu, F. Euchner, P. J. Maechling, and T. H. Jordan (2010). The Collaboratory for the Study of Earthquake Predictability Perspective on Computational Earthquake Science. Concurrency and Computation: Practice and Experience, 22, 1836-1847.

Computing Environment
Develop a distributed computing environment, based at USC HPCC, that utilizes NSF and DOE HPC systems
• Establish both an operational and a development computing environment
• Maintain cumulative data results locally
• Provide external interfaces to forecasts and forecast results

Focus CISM Software Development on Defining Workflows to Minimize Software Development
• Workflow Configuration Environment (CISM Software Development)
• Workflow Execution Environment (Existing Open-Source Software)
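The split between workflow configuration and workflow execution can be illustrated with a dependency graph of the CISM stages. In this sketch, Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for an existing open-source workflow execution system. The task names are illustrative and the `execute` function is hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Workflow configuration: each task lists the tasks it depends on.
# Task names mirror the CISM processing stages; the structure is illustrative.
WORKFLOW: Dict[str, Set[str]] = {
    "define_rupture_catalog": set(),
    "calculate_ground_motions": {"define_rupture_catalog"},
    "assign_rupture_probabilities": {"define_rupture_catalog"},
    "forecast_ground_motions": {
        "calculate_ground_motions",
        "assign_rupture_probabilities",
    },
}


def execute(workflow: Dict[str, Set[str]], run_task: Callable[[str], None]) -> List[str]:
    """Minimal execution engine: run tasks in an order that respects dependencies."""
    order = list(TopologicalSorter(workflow).static_order())
    for task in order:
        run_task(task)
    return order


if __name__ == "__main__":
    execute(WORKFLOW, lambda name: print("running", name))
```

CISM-specific development would then consist mostly of editing the configuration (the `WORKFLOW` mapping here), while ordering, dispatch, and retries stay in the execution system.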

CISM Computational System

Keck CISM System Development
• Elements of the Keck CISM System
• System Development and Software Management
• Scientific Software Management
• Integrating RSQSim into Keck CISM
• Software Licenses

Keck CISM System Development
• Elements of the Keck CISM System
• System Development and Software Management
• Integrating RSQSim into Keck CISM
• Scientific Software Management

SCEC Software Engineering Practices
Iterative Development Process (quarterly cycle)
• Develop end-to-end processing that provides scientific value
• Deploy the operational system and operate it during the next iteration
• Extend the system, preserving existing capabilities and adding new ones
Software Engineering Practices
• Software version control
• Automated testing frameworks
• Standards-based data formats and management
• Metadata collection
• Process logging
• Error detection and monitoring
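For scientific codes, automated testing often means regression testing: rerunning a reference calculation with a new build and comparing the results. The following is a minimal sketch, assuming results can be reduced to named floating-point values. The function name, keys, and default tolerance are hypothetical.

```python
import math
from typing import Dict


def compare_to_reference(
    results: Dict[str, float],
    reference: Dict[str, float],
    rel_tol: float = 1e-6,
) -> Dict[str, str]:
    """Describe each mismatch between new results and a stored reference run.

    An empty dict means the new build reproduces the reference within rel_tol.
    """
    problems: Dict[str, str] = {}
    for key, expected in reference.items():
        if key not in results:
            problems[key] = "missing from new results"
        elif not math.isclose(results[key], expected, rel_tol=rel_tol):
            problems[key] = f"expected {expected}, got {results[key]}"
    # Extra outputs are also flagged, since they indicate a behavior change.
    for key in sorted(results.keys() - reference.keys()):
        problems[key] = "not present in reference"
    return problems
```

Inside a test framework such as pytest, a test would assert `compare_to_reference(new, stored) == {}`. A relative tolerance, rather than exact equality, allows for harmless floating-point differences between compilers and HPC systems.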

Keck CISM Operational Platform
• Develop a system that can run routinely, producing forecasts either periodically or in response to an event.
• We need to deploy an operational system.
• Current plans are to release a new system version on a quarterly basis.
• All codes will be integrated, tested, and tagged so that results are reproducible.

Quarterly Build Plan
• Involve Masha as early as possible.
• Set up a SCEC git repository (GitHub/GitLab) and maintain the Keck CISM project files needed to build the system.
• Develop simple Keck CISM system processing early and elaborate as needed.
• We need a computing framework that can dispatch jobs from SCEC, run them remotely, and return the results.
• We expect to integrate an approved version of RSQSim into the Keck CISM system for routine calculations.
• We will hold scientific and computing reviews prior to large-scale production runs.

Keck CISM System Development
• Elements of the Keck CISM System
• System Development and Software Management
• Integrating RSQSim into Keck CISM
• Scientific Software Management

SCEC Scientific Software Distributions

SCEC CSEP Testing Center

SCEC Broadband Platform

ObsPy

SPECFEM3D

CISM Development Plan
• System Requirements
• System Architecture
• Computing Environment
• Essential Software Components
• Computational and Data Estimates
• Software Development Process
• IP Considerations

CISM IP Principles
1. Integrate best-available academic codes developed and contributed by the research community
2. Accept NSF support and private company gifts to support software development
3. Release as free and open-source software to support scientific transparency and build confidence in results
4. License the software so that it can be used by academia and US agencies, including USGS

Ownership
• The person or organization that owns the software can allow others to copy and use it.
• The software owners are typically the people who wrote the software.
• However, the company that employed the people who wrote the code may also have ownership.

Software Licensing
Apache License v2.0: Key Rights and Issues
1. Software distributions must include a copy of the license
2. Source code is typically provided by the licensor, but the license does not require redistributions to include it
3. No warranty is offered
4. Contributors are not liable for damages arising from use of the software
5. Users are granted a copyright license to the software and source code
6. Users are granted a patent license to use the software
7. Users may not use any trademarks in a distribution without permission
8. Private use is allowed
9. Commercial use is allowed
10. Redistribution is allowed with licenses and notices intact
11. Users are allowed to make modifications
12. Users must state what changes they made
13. Users can distribute their modifications under different licenses, including proprietary ones
14. Users are permitted to link to other software that uses different licenses, including proprietary ones
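Item 1 can be checked automatically when a distribution is assembled. The sketch below assumes a `LICENSE` file at the distribution root and, as an optional project convention rather than an Apache-2.0 requirement, an SPDX identifier line near the top of each source file. The file suffixes and function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

SPDX_TAG = "SPDX-License-Identifier: Apache-2.0"
SOURCE_SUFFIXES = (".py", ".c", ".java")  # assumed set of source file types


def load_tree(root: Path) -> Dict[str, str]:
    """Read every file under root into a {relative path: text} mapping."""
    return {
        path.relative_to(root).as_posix(): path.read_text(errors="replace")
        for path in root.rglob("*")
        if path.is_file()
    }


def check_distribution(files: Dict[str, str]) -> List[str]:
    """List problems that would make a source distribution incomplete."""
    problems: List[str] = []
    if "LICENSE" not in files:
        problems.append("LICENSE file missing from distribution root")
    for name in sorted(files):
        if name.endswith(SOURCE_SUFFIXES):
            # Only the first few lines are checked; headers belong at the top.
            head = files[name].splitlines()[:5]
            if not any(SPDX_TAG in line for line in head):
                problems.append(f"{name}: no SPDX header")
    return problems
```

A release script could call `check_distribution(load_tree(Path("dist")))` and stop the build if the returned list is not empty.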

Spectrum of Software Rights

Types of Software Licenses

Authors' Rights and Users' Rights
Start with the author rights the owner wants to protect. Examples might include:
• Ensure source code is available to software users
• Ensure aspects of the software remain proprietary
• Ensure software is free to some, or all, users
• Prohibit use of the software in a commercial product
• Allow commercial versions of the software in the future
• Prohibit others from creating a commercial version of the software
A key decision is who will be identified as the owner of the software. The owner offers a license to users.
• The software owners are typically the people who wrote the software.
• However, the company that employed the people who wrote the code may also have ownership.

SI2 Program Requirement
• State which software license will be used for the released software, and why this license has been chosen. (NSF expects that a standard open source license will be used, but a different option can be proposed if well justified in terms of meeting the SI2 program goals.)

Standard Open Source Software Licenses
Free and open-source licenses are commonly classified into two categories:
1. Licenses that aim to place minimal requirements on how the software can be redistributed (permissive licenses)
2. Protective, share-alike licenses (copyleft licenses)
Key questions:
• Can the software be used in a commercial product?
• Can the user redistribute the software?

Open Source Initiative

Types of Software Licenses

Approved Free-Open Source Software Licenses

Commonly Used Scientific Software Licenses

Keck CISM System Development
• Elements of the Keck CISM System
• System Development and Software Management
• Integrating RSQSim into Keck CISM
• Scientific Software Management

Keck CISM and RSQSim Code Management Issues
• Multiple groups will likely use some version of the software.
• We will likely need to maintain a stable version of the RSQSim code and one or more developmental/experimental versions.
• Multiple groups may fork their own versions.
• We must manage HPC-system-specific configurations.
• We must identify the exact codes used to produce a research result.
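One way to manage HPC-system-specific configurations alongside stable and experimental versions is a shared base configuration plus per-system overrides. This is a minimal sketch. The system names, compiler settings, and function name are hypothetical placeholders, not actual CISM or RSQSim settings.

```python
from typing import Any, Dict

# Settings shared by every build (values are illustrative only).
BASE_CONFIG: Dict[str, Any] = {
    "compiler": "mpicc",
    "optimization": "-O2",
    "variant": "stable",
}

# Per-system overrides; system names and values are hypothetical.
SYSTEM_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "local_cluster": {},
    "hpc_system_a": {"compiler": "cc", "optimization": "-O3"},
}


def build_config(system: str, variant: str = "stable") -> Dict[str, Any]:
    """Merge base settings with one system's overrides and pick a code variant."""
    if system not in SYSTEM_OVERRIDES:
        raise KeyError(f"no configuration for system {system!r}")
    # Later entries win, so system overrides replace base values.
    config = {**BASE_CONFIG, **SYSTEM_OVERRIDES[system]}
    config["variant"] = variant  # e.g. "stable" or "experimental"
    return config
```

Keeping overrides small and explicit makes it easy to see exactly how a build on one HPC system differs from the shared baseline.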

Challenges for RSQSim
• When we create an RSQSim build (release), we will retrieve a source code version of all required codes.
• The CISM build will generate a version and tag the code to identify all files used in the build (release).
• We will need to work out how this is done:
– We retrieve a distribution from your SVN repository when we make a build, or
– You use the git repository.
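Identifying every file used in a build can be done by recording a content hash for each file and deriving a single build identifier from those hashes, which a git tag could then reference. This is a minimal sketch, assuming the build has already retrieved the source files (from SVN or git) as bytes. File names and function names are hypothetical.

```python
import hashlib
from typing import Dict


def file_digest(content: bytes) -> str:
    """SHA-256 digest of one file's contents."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each source file path to the SHA-256 digest of its contents."""
    return {name: file_digest(data) for name, data in sorted(files.items())}


def build_id(manifest: Dict[str, str]) -> str:
    """Derive one short identifier for the whole build from its manifest.

    Paths are sorted so the ID does not depend on retrieval order.
    """
    h = hashlib.sha256()
    for name in sorted(manifest):
        h.update(name.encode("utf-8"))
        h.update(b"\0")  # separator so path and digest cannot run together
        h.update(manifest[name].encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()[:12]
```

Because the identifier depends only on file paths and contents, builds assembled from SVN or git share an ID when they use identical files, and changing any file changes the ID. Storing the full manifest alongside the release preserves the per-file record.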

End

Use Cases in CISM Proposal
Year 1:
• Couple the empirical Uniform California Earthquake Rupture Forecast to the CyberShake ground-motion forecasting models of the Los Angeles region.
• Provide new computational tools to assist the development of rupture simulators such as RSQSim and ground-motion simulators such as CyberShake.
Year 2:
• Couple the RSQSim physics-based rupture simulator to the CyberShake ground-motion forecasting models.
• Retrospectively calibrate and test the resulting comprehensive forecasting models.
Year 3:
• Construct a computational environment that can sustain the long-term development of comprehensive, physics-based earthquake forecasting models.
• Submit exemplars to CSEP for prospective testing against observed earthquake activity in California.

Additional CISM System Requirements
CISM is designed to meet several non-functional requirements:
1. Use existing scientific software written in a variety of programming languages
2. Use local computing resources and high-performance parallel computing resources from external resource providers
3. Be able to “show our work” to support scientific review of results
4. Be inexpensive to design, build, maintain, and operate
5. Be easy to modify without significant re-implementation or downtime
6. Support new development without impacting ongoing operations
7. Run for years to obtain statistically significant results

Examples of Permissive License
The Apache-2.0 license allows a group to copy the BBP, modify some files, list the files they modified, and release their version under a private (proprietary) license. They could then offer their version of the BBP for sale. If people were willing to pay for it, we could not stop them from selling their version of the Broadband Platform or from redistributing our BBP software in their product.

Examples of Copyleft
We offer the BBP under the open-source Apache-2.0 license, so people can derive a version from the BBP and offer their changes under a different license, including a private license. The GPL would not allow that: under copyleft, all derived works must be offered as open source.
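The practical difference between the two examples can be encoded as a small lookup. This simplified sketch covers only a few SPDX license identifiers, treats the GPL versions listed as strong copyleft, and deliberately rejects licenses it does not know (such as the weak-copyleft LGPL) rather than guessing. The table and function name are hypothetical.

```python
from typing import Dict

# Broad categories for a few common OSI-approved licenses (SPDX identifiers).
LICENSE_CATEGORY: Dict[str, str] = {
    "Apache-2.0": "permissive",
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "GPL-2.0-only": "copyleft",
    "GPL-3.0-only": "copyleft",
}


def may_distribute_derivative_as_proprietary(spdx_id: str) -> bool:
    """Simplified answer to: can a derived work be released under a closed license?

    Permissive licenses allow it (subject to notice requirements); copyleft
    licenses require distributed derived works to stay under the same license.
    """
    category = LICENSE_CATEGORY.get(spdx_id)
    if category is None:
        raise KeyError(f"license {spdx_id!r} not classified in this sketch")
    return category == "permissive"
```

Under this table, a BBP derivative released under Apache-2.0 yields `True` (the permissive example), while the same question for a GPL-3.0-only code yields `False` (the copyleft example).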