The OSF at Notre Dame CNI Fall 2015

The Open Science Framework (OSF) at Notre Dame: Connecting the Workflow and Supporting the Research Mission
CNI Fall 2015 Membership Meeting, Washington, DC
Andrew Sallans, Partnerships Lead, Center for Open Science
Natalie Meyers, E-Research Librarian, University of Notre Dame

12/15/2015 CNI Fall Mtg https://osf.io/s5e2b/

OSF Extensions & Pilots @ ND

I want to preserve my simulation method and results so other people can try it out.
mysim.exe -in data -out output -p 10
[Diagram labels: data, output; DOI: 10.XXXX, DOI: 10.YYYY, DOI: 10.CCCC, DOI: 10.ZZZZ]
… and repeat this 1M times with different -p values.
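The parameter sweep on this slide can be sketched as a run manifest: one record per -p value, holding the exact command line and a stable identifier that could later be attached to a citable record. This is a minimal sketch under stated assumptions, not the presenters' tooling. The function name `build_run_manifest`, the per-run output directory naming, and the hash-based `run_id` are all hypothetical, and nothing is executed.

```python
import hashlib
import json


def build_run_manifest(p_values, exe="mysim.exe", data="data", out_prefix="output"):
    """Return one manifest record per -p value without executing anything."""
    manifest = []
    for p in p_values:
        # Assumption: each run writes to its own directory so results never overwrite.
        out_dir = f"{out_prefix}_p{p}"
        command = [exe, "-in", data, "-out", out_dir, "-p", str(p)]
        # Hypothetical run identifier: a truncated hash of the exact command line.
        run_id = hashlib.sha256(json.dumps(command).encode("utf-8")).hexdigest()[:16]
        manifest.append({"run_id": run_id, "command": command, "output": out_dir})
    return manifest


if __name__ == "__main__":
    for record in build_run_manifest(range(10, 13)):
        print(record["run_id"], " ".join(record["command"]))
```

Because the identifier is derived from the command line alone, rebuilding the manifest for the same -p value always yields the same `run_id`, which keeps a million runs addressable without assigning identifiers by hand.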

But it’s not that simple!

I want to preserve my simulation method and results so other people can try it out.
mysim.exe -in data -out output -p 10
[Diagram labels: data, calib, config, output, libsim, ruby, HTTP GET, SIM_MODE=clever]
Green Goat Linux 57.83.09.B
X86-64 CPU / 64 GB RAM / 200 GB Disk

Challenges of Reproducible Computing
• Your application works perfectly today on your machine
• Will your application still work next month?
• Will your application still work next year?
• Will your application still work 10 years later?
• Will your application still work today on another machine?

daspos.org
The DASPOS Project Team includes computer science experts from the University of Notre Dame and the University of Chicago, physicists from the ATLAS and CMS experiments at the LHC, the DØ experiment at the Tevatron, experts in other data-intensive fields such as bioinformatics and astrophysics, and digital librarians with broad experience in the preservation of large datasets in the sciences and humanities.
The DASPOS project has been funded in whole or in part with Federal funds from the National Science Foundation, under Award No. 1247316.

daspos.org
Goal and Scope of Project
The goal of DASPOS is to “scout out” solutions to the most pressing technical problems, and make them available to those constructing preservation systems. In particular, this project will:
• Establish a dialogue with other fields facing preservation and re-use issues with Big Data. Identify areas of commonality and outline where solutions diverge due to specific needs.
• Develop metadata to support the preservation and re-use of HEP data, and its related software and computational algorithms. Design the metadata so as to meet the needs of as many other fields as possible for wide re-use.
• Define a reference architecture for a data preservation system targeted for HEP but coordinated with other fields. Include decision points where policy choices impact the architectural structure.
• Develop a preservation validation test-bed on which a technical implementation of the reference architecture can be developed and constructed.
• Perform a Curation Challenge, where a physics data analysis is conducted based solely on curated and archived data.

VecNet’s Malaria Modelers - Share Simulations & Results

VecNet Digital Library
Our digital library software stack and features were first developed and presented for beta feedback in 2013:
Brower D, Lakshminarayanan B, Meyers N. Multiple Identities: Managing Authorities in Repositories and Digital Collections. Presented at American Library Association Annual Conference, Chicago, IL, 2013.
and then again at last year’s ACM/IEEE JCDL conference:
Barker M, Brower D, and Meyers N. Vector-Borne Disease Network Digital Library. Presented at Digital Libraries 2014, IEEE (978-1-4799-5569-5), London, UK, Sept 9, 2014.

VecNet Digital Library
dl.vecnet.org
Vector-borne Disease Network
The VecNet digital library supports mathematical modeling of malaria transmission & eradication. It is a repository for curating & sharing information about simulations used to model malaria transmission & the impact of interventions.
Contains: field, lab, survey, climate, demographic, and simulation data; input file code snippets; input file sets for models; simulations; tagged bibliographic citations; articles, maps, reports, and more on entomology, epidemiology, demography, climatology, and interventions.

Dynamic Data Citation & Repositories for Research Data
Meyers N. Dynamic Data Citation: VecNet Use Case. Presented at Federation of Earth Science Information Partners’ Winter Meeting, Dynamic Data Citation Workshop, Washington, D.C., Jan 8, 2015.
Meyers N. VecNet Digital Library & Data Citation for Simulations. Presented at Institute for Disease Modeling 3rd Annual Modeling Symposium, Bellevue, Washington, April 22, 2015.
Boehm R. and Meyers N. Repository Platforms for Research Data: VecNet Use Case. Presented at Research Data Alliance (RDA) 6th Plenary Meeting, Paris, Sept 25, 2015.

Getting to Know OSF
• Attended Andrew Sallans’ talk “Improving Integrity, Transparency, and Reproducibility Through Connection of the Scholarly Workflow” during NISO’s virtual conference, Scientific Data Management: Caring for Your Institution and its Intellectual Wealth, February 18, 2015
• Attended Open Repositories ’15 and was attracted to OSF features
• Hosted a panel presentation of the CoS Reproducibility Projects at Notre Dame’s Center for Digital Scholarship, Sept 9, 2015

OSF Related Ongoing Efforts at ND
• Integrating our Institutional Repository w/OSF (CAS Authentication)
• Embarked on NDS Dashboard integration w/CRC & Ian Taylor
• Piloting registration of select VecNet malaria data files in OSF
• Testing Umbrella Software Preservation tool interactivity with OSF (OpenMalaria simulation execution use case)
• Creating and documenting a reproducible development environment for the OSF: OpenStack images to run OSF frontend/backend services on CRC resources, and Vagrant/VirtualBox files for use by developers on their laptops (ongoing)

Why OSF and an Institutional Repository?
1. Why integrate OSF w/CurateND? -> Start staging data for preservation & initial sharing between collaborators
2. Institutional Branding and Central Authentication -> Fosters ease of use & trust among institutional researchers
3. Group Role Enhancements -> Hierarchical lab roles
4. Storage Source Configuration -> Flexibility of resources
5. Integration with Computational Environment -> Access to HPC & reuse
6. Metadata Enhancements to OSF -> Incrementally & automatically add metadata prior to a preservation phase effort
7. Push OSF Project Snapshot (aka Registration) to CurateND -> EZ deposit to Institutional Repository preservation storage encourages institutional data preservation

CurateND Institutional Repository OSF Integration
Contact: Rick Johnson rjohns14@nd.edu

NDS OSF Dashboard integration
Contact: Ian Taylor ian.j.taylor@gmail.com
bitbucket.org/nds-dashboard
http://www.nationaldataservice.org/
http://ndspilot.com

Umbrella: Ensuring executable software preservation & reuse
A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids
http://ccl.cse.nd.edu/software/umbrella

Umbrella Features
Makes Applications Portable and Reproducible
Specify the execution environment clearly
-- Hardware, Kernel, OS, Software, Data, Environment Variables
Materialize the execution environment at runtime automatically
-- No need to configure environment manually
-- Matching evaluation & choose minimal mechanism
Loosely coupled with sandbox techniques:
-- Parrot, chroot, VM, Docker
Construct sandbox through mounting mechanisms without copying
-- multiple namespaces can be constructed concurrently
Utilize more computing resources:
-- Local Machine, Grid, Cloud
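To make "specify the execution environment clearly" concrete, the six categories named on this slide can be written down as a single structured record. The sketch below is illustrative only and is not Umbrella's actual specification format. The class name `EnvironmentSpec`, its field names, and the fingerprint scheme are assumptions, and the sample values reuse the fictional machine from the earlier simulation slide.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class EnvironmentSpec:
    """Hypothetical record of the six categories listed on the slide."""
    hardware: dict
    kernel: str
    os: str
    software: list = field(default_factory=list)
    data: list = field(default_factory=list)
    environment_variables: dict = field(default_factory=dict)

    def to_json(self):
        # Sorted keys give a stable, human-readable serialization.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    def fingerprint(self):
        # Assumption: a SHA-256 over canonical JSON identifies the environment.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    spec = EnvironmentSpec(
        hardware={"arch": "x86-64", "ram_gb": 64, "disk_gb": 200},
        kernel="unspecified",  # the slides do not name a kernel
        os="Green Goat Linux 57.83.09.B",
        software=["libsim", "ruby"],
        data=["data", "calib", "config"],
        environment_variables={"SIM_MODE": "clever"},
    )
    print(spec.to_json())
    print(spec.fingerprint())
```

Two specifications with identical contents produce the same fingerprint, so a change to any dependency, even a single environment variable, shows up as a different identifier.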

Umbrella & OSF
The Open Malaria Use Case
A Tool for Ensuring executable software preservation & reuse

Umbrella & Open Malaria Use Case Contacts
For more information about Umbrella: The Cooperative Computing Lab, http://ccl.cse.nd.edu
Haiyan Meng hmeng@nd.edu
Douglas Thain dthain@nd.edu
About Open Malaria Use Case: The Center for Research Computing, http://crc.nd.edu
Alex Vyushkov avyushko@nd.edu
Please use the following citation for Umbrella in a scientific publication:
Haiyan Meng and Douglas Thain, Umbrella: A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids, Workshop on Virtualization Technologies in Distributed Computing (VTDC) at HPDC, June, 2015. DOI: 10.1145/2755979.2755982

Learning from DASPOS, Umbrella & NDS / OSF
• Repositories: Will they take provisional data, active data...?
• Compatibility: Can we plug into existing tools? Diff? Jupyter?
• Software Preservation Layers: Preserve program binaries, or sources + compilers, or something else? (Parrot, Umbrella, Prune...)
• Naming: Tension between usability and durability: URLs, DOIs, PIDs, UUIDs, HMACs, ...
• Complexity of Composition: Connect systems together? NDS? OSF? CurateND?
• Citation: Dynamic? Static? For publication? For reuse?
• Usability: Do users have to change behavior?
• Overhead: Tools must be close to native performance, or they won’t get used.
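The naming tension listed above can be illustrated with three of the identifier styles it mentions. This is a rough sketch using only the Python standard library. The function names are hypothetical, and the "sha256:" prefix is an illustrative convention, not a scheme used by any of the projects named in these slides.

```python
import hashlib
import hmac
import uuid


def content_id(payload: bytes) -> str:
    """Durable: the same bytes always yield the same name, but it is opaque to people."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def random_id() -> str:
    """A UUID is unique without coordination but says nothing about the content."""
    return str(uuid.uuid4())


def keyed_id(payload: bytes, key: bytes) -> str:
    """An HMAC ties the name to both the content and a secret key."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_keyed_id(payload: bytes, key: bytes, claimed: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(keyed_id(payload, key), claimed)


if __name__ == "__main__":
    blob = b"simulation output for -p 10"
    print(content_id(blob))
    print(random_id())
    tag = keyed_id(blob, b"example-key")
    print(tag, verify_keyed_id(blob, b"example-key", tag))
```

None of these is easy to type or remember, which is the usability side of the tension. Human-friendly names such as URLs sit at the other end and depend on someone keeping them resolvable.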

OSF ND Immediate Next Efforts
NDS dashboard enhancements, including backend container toolkit development:
• Fix bugs that cause exceptions for valid operations.
• Optimize the toolkit to reduce the time taken to perform tasks.
• Implement a post operation to support uploading files into OSF storage providers.
• Automate uploading the diff of the containers for each run into OSF storage.
• Support VNC on a container, so users can pull up a remote desktop to a container and view remote desktop apps, e.g. Pegasus workflow.
• Backend integration of Jupyter notebooks.
• Front-end spawning of these notebooks, which manages the state, i.e. spawn the notebook, do all editing, and then copy the edited content back into OSF storage to update content.
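The idea of storing only the diff of a container for each run can be sketched at the file level. The example below treats a container's filesystem as an ordinary directory tree, which is a simplifying assumption, and it stops at computing the diff: no upload is shown, since the toolkit's interface to OSF storage is not described in these slides. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff_snapshots(before, after):
    """Return the files added, removed, or changed between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return {"added": added, "removed": removed, "changed": changed}
```

Taking one snapshot before a run and one after, then keeping only the files listed under "added" and "changed", is one way to avoid storing the whole container again for every run.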

Potential Future OSF Projects
OSF can be useful in other projects:
Spatial Repellents Trial: piloting use for master files management
EU-funded Switch Project is considering use of the OSF
• http://www.switchproject.eu/
Collaboration with USC's Institute for Information Sciences on RACE: Repository and Workflows for Accelerating Circuit Realization
• RACE is developing a trusted repository for integrated circuit designs.
• OSF / NDS Dashboard can be extended and integrated with the Pegasus Workflow system and interface to CurateND for long-term preservation of circuit designs.

The OSF at Notre Dame, CNI Fall 2015
Link to this Presentation: https://osf.io/s5e2b/
Contact:
Andrew Sallans, Partnerships Lead, Center for Open Science
Natalie Meyers, E-Research Librarian, University of Notre Dame
More Info: cos.io, osf.io, library.nd.edu, ccl.cse.nd.edu, crc.nd.edu, vecnet.org, daspos.org, nationaldataservice.org