Big Earth Data and the Powell Center: Strategies for Integrating Data
Kevin T. Gallagher, USGS Core Science Systems
April 25, 2013
U.S. Department of the Interior, U.S. Geological Survey
Understanding the Critical Zone: The Core Science Systems Mission Area spans the Earth’s “Critical Zone.” As defined by the National Research Council in 2001, the Critical Zone is the near-surface interface that extends from the tops of trees down to the base of the deepest groundwater.
Core Science Systems – Synthesis: The John Wesley Powell Center for Analysis and Synthesis provides a unique and stimulating environment that enables scientists to advance knowledge through collaborative and interdisciplinary investigation.
Core Science Systems – Synthesis: The Community for Data Integration supports improved approaches to the integration of diverse data for new scientific insight through a community of practice open to the USGS and its partners.
United Nations & World Bank Global Issues
USGS Information
Maps; Sediment Water Quality Models; Amphibian Deformity/Decline Studies; Migratory Bird Distribution Mapping; Ground and Surface Water Studies; National Atlas of the United States; Geologic Maps; Land Cover Characterization; Drought Streamflow; Integrated Taxonomic Information System; Fish and Wildlife Disease Studies; Aerial Photography; Real-time Earthquake Data; Coastal Information; Landslide Hazard Assessments; Real-time Water Data; Satellite Imagery; Mineral and Energy Resource Assessments; GAP Analysis; Biodiversity Information; Volcanic Hazards
The President’s Science Policy
Quotes from the Office of Science and Technology Policy:
• Data-Enabled Science (“Big Data”) – “Improved approaches are needed to derive science and social value from the vast amount of data we are now acquiring.”
• “Research and development in such approaches as algorithms, data mining, analytics, and visualization tools should be priorities.”
Data Integration
Data Integration: What is it?
• It appears intangible.
• Is it a concept? A picture? An architecture? A portal? Data standards? A data warehouse?
• Will we know it when we see it?
• How will we measure progress towards it?
Data Integration: What is it?
[Diagram; labels as they appear:] Tools; Data Extraction, Transformation, Load; Interoperable Data Access & Discovery; Semantic Standards, Vocabulary, Taxonomy, Schema or Dictionary; Portals, Registries, Catalogs; Standards for Exchange; Applications; Geospatial Analysis, Modeling, Visualization; Protocols, Metadata, Standards
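As a rough illustration of how the "Data Extraction, Transformation, Load" and shared-vocabulary pieces named on this slide can fit together, here is a minimal sketch. Everything in it is an assumption made for illustration: the two project exports, the column names, the `VOCABULARY` mapping, and the in-memory store are hypothetical and do not describe any USGS or CDI tool.

```python
import csv
import io

# Hypothetical exports from two projects that name the same quantities differently.
PROJECT_A = "site,temp_c\nA1,12.5\nA2,\n"
PROJECT_B = "station_id,water_temperature\nB7,9.0\n"

# Hypothetical shared vocabulary: local column name -> common term.
VOCABULARY = {
    "site": "station",
    "station_id": "station",
    "temp_c": "water_temp_celsius",
    "water_temperature": "water_temp_celsius",
}


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, source):
    """Transform: rename columns to the common vocabulary and skip empty measurements."""
    records = []
    for row in rows:
        record = {VOCABULARY[name]: value for name, value in row.items()}
        if record["water_temp_celsius"] == "":
            continue  # simple quality-control rule: no measurement, no record
        record["water_temp_celsius"] = float(record["water_temp_celsius"])
        record["source"] = source  # keep provenance so each record traces to its project
        records.append(record)
    return records


def load(store, records):
    """Load: append the cleaned records to a shared in-memory store."""
    store.extend(records)


integrated = []
load(integrated, transform(extract(PROJECT_A), "project_a"))
load(integrated, transform(extract(PROJECT_B), "project_b"))
for record in integrated:
    print(record)
```

The point of the sketch is only that a common vocabulary lets records from separate projects land in one store with the same field names; a real pipeline would also need metadata, units, and exchange standards.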
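What Does Data Integration Look Like? “Eye Test”
[Figure: two alternative diagrams of "Project" and "Data", separated by "- or -"]
What do you see?
After Many Projects, What Do You See?
[Figure: two alternative diagrams of multiple "Project" and "Data" elements, separated by "- or -"]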
The Integrated Data View
• Many participate in the data resource
• The data is:
  • Visible
  • Indexed and accessible
  • Developed according to science planning
  • Commonly defined
  • Quality controlled
  • Designed and standardized
  • Yet continually evolving
• How do we get there?
The Community for Data Integration Today: Driving the Agenda, Defining Scope, Priority, and Tasks
• Volunteers!
• Chartered with over 500 members
• Sharing our stories once a month
• Unified behind a long-term vision
• Identifying and prioritizing projects that can deliver value or savings in the short term
• Sharing best practices
• Leveraging investments
• Serving on working groups (e.g., semantics, citizen science, data management, technology)
• Evolving solutions through pilot development & testing
Community for Data Integration Project Selection Guidelines
• Focus on targeted efforts that solve a short-term science challenge
• Leverage existing capabilities (“bottom up”)
• Develop solutions or methodology that can be replicated or repeated as well as scaled
• Create sustainable infrastructure
• Seek efficiencies or substantial return on investment
• Expose corporate data
• Organize science models and outputs
• Preserve and access project data
Working with Partners (short list)
• Earth Science Information Partnership (ESIP)
• DataONE
• EarthCube
• Geoscience Information Network (GIN)
• OneGeology
• CUAHSI
• Arizona Geological Survey
• Colorado State University
• DiscoverLife
• EPA
• ESRI
• Fish and Wildlife Service
• Lamont-Doherty Earth Observatory
• University of Auckland, New Zealand
• University of Idaho
• University of Wisconsin
• Rensselaer Polytechnic Institute (RPI)
• Tufts University
• Department of Agriculture
• NEON
USGS Community for Data Integration
You are invited to join CDI!
https://my.usgs.gov/confluence/display/cdi/Home
The John Wesley Powell Center (http://powellcenter.usgs.gov/)
Cross-disciplinary scientific collaboration through Working Groups, now partnering with NSF, EPA, and others.
JWP Center and Community for Data Integration
• Use the JWP Center as an incubator to test the viability of CDI tools and guidance
• Use Science Working Groups to identify new data management, modeling, and tool requirements
• Apply CDI tools to Science Working Groups, ensuring models and data are preserved and available using standards
JWP Center IT Support
• Data Extraction, Transformation, and Load (ETL)
• High performance computing
  • Internal Modeling Cluster
  • Oak Ridge National Lab and the NSF-funded XSEDE
  • In one example, model processing time decreased from 312 days to 21 hours
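To put the quoted reduction in perspective, the arithmetic can be worked as a short calculation. This is only a sketch using the two figures stated on the slide (312 days and 21 hours); the function name `speedup` is introduced here for illustration.

```python
HOURS_PER_DAY = 24


def speedup(before_days: float, after_hours: float) -> float:
    """Return how many times faster the run became, comparing both times in hours."""
    before_hours = before_days * HOURS_PER_DAY
    return before_hours / after_hours


# Figures quoted on the slide: 312 days before, 21 hours after.
before_hours = 312 * HOURS_PER_DAY
factor = speedup(312, 21)
print(f"{before_hours} h -> 21 h, about {factor:.0f}x faster")
# prints "7488 h -> 21 h, about 357x faster"
```

The slide does not say how the reduction was achieved, so the factor describes only the ratio of the two reported run times.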
JWP and Data Management
• Data Management Plans are required
• WG Data Manager (significant support from USGS staff)
• Data resulting from synthesis efforts are stored and published via the USGS ScienceBase repository
Image © University of Oregon
To join the Community for Data Integration, contact: cdi@usgs.gov
Thank You!