Lesson 1: Introduction to Data Management
Why Data Management?
U.S. Department of the Interior, U.S. Geological Survey
Tutorials on Data Management
CC image by University of Maryland Press Releases on Flickr

Lesson Topics
• The data world around us
• Importance of data management
• The data lifecycle
• The case for data management
CC image by interpunct on Flickr
Provided by DataONE

Learning Objectives
After completing this lesson, the participant will be able to:
• Give two general examples of why increasing amounts of data are a concern
• Explain, using two examples, how a lack of data management makes an impact
• Define the research data lifecycle
• Give one example of how well-managed data can result in new scientific conclusions

DATA REALITIES…

Images collected by DataONE.org

Data deluge
Data are collected from sensors, sensor networks, remote sensing, observations, and more. This calls for increased attention to data management and stewardship.
Image credits: CC image by tajai on Flickr; photo courtesy of http://modis.gsfc.nasa.gov/; photo courtesy of http://www.futurlec.com; photo courtesy of www.carboafrica.net; image collected by Viv Hutchinson; CC image by CIMMYT on Flickr

The World of Data Around Us
[Chart: petabytes worldwide, plotting Information against Available Storage, with the annotation "Transient information or unfilled demand for storage"]
Source: John Gantz, IDC Corporation: The Expanding Digital Universe

The World of Data Around Us: Data Loss
• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• External dependencies (e.g., PKI failure)
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack by human or automated agents
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations and requirements
CC images by momboleum and Sharyn Morrow on Flickr

Poor Data Management Affects Everyone

“MEDICARE PAYMENT ERRORS NEAR $20 B” (CNN), December 2004
Miscoding and billing errors from doctors and hospitals totaled $20 billion in FY 2003 (9.3% error rate). The error rate measured claims that were paid despite being medically unnecessary, inadequately documented, or improperly coded. In some instances, Medicare asked health care providers for medical records to back up their claims and got no response. The survey did not document instances of alleged fraud. This error rate was actually an improvement over the previous fiscal year (9.8% error rate).

“AUDIT: JUSTICE STATS ON ANTI-TERROR CASES FLAWED” (AP), February 2007
The Justice Department Inspector General found that only two of 26 sets of data concerning terrorism attacks were accurate. The Justice Department uses these statistics to argue for its budget. The Inspector General said the data “appear to be the result of decentralized and haphazard methods of collections … and do not appear to be intentional.”

“OOPS! TECH ERROR WIPES OUT ALASKA INFO” (AP), March 2007
A technician managed to delete the data and the backup for the $38 billion Alaska oil revenue fund, money received by residents of the State. Correcting the errors cost the State an additional $220,700 (which of course was taken off the receipts to Alaska residents).

Poor Science Data Management Example
A wildlife biologist for a small field office was the in-house GIS expert and provided support for all the staff’s GIS needs. However, the data were stored on her own workstation. When the biologist relocated to another office, no one understood how the data were stored or managed.
Solution: A state office GIS specialist retrieved the workstation and sifted through files trying to salvage relevant data.
Cost: 1 work month ($4,000) plus the value of data that were not recovered
Consider that the situation could have been worse, because the data were not being backed up as they would have been if stored on a server.
CC image by DTRave on Open Clip Art Library

Poor Data Management Federal Agency Example
In preparation for a Resource Management Plan, an office discovered 14 duplicate GPS inventories of roads. However, because none of the inventories had enough metadata, it was impossible to know which inventory was best or if any of the inventories actually met their requirements.
Solution: Re-inventory roads
Cost: Estimated 9 work months/inventory @ $4,000/wm (14 inventories = $504,000)
CC image by ruffin_ready on Flickr
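The cost figure in this example is straightforward arithmetic, and a short sketch makes the calculation explicit. The function name and parameter names below are hypothetical, chosen only for illustration; the numbers are the ones given on the slide.

```python
def rework_cost(inventories: int, work_months_each: int, rate_per_work_month: int) -> int:
    """Estimated cost of effort spent on inventories, in dollars."""
    return inventories * work_months_each * rate_per_work_month


# Figures from the slide: 14 inventories, 9 work months each, $4,000 per work month.
total = rework_cost(inventories=14, work_months_each=9, rate_per_work_month=4000)
print(f"${total:,}")  # $504,000
```

The same function gives $4,000 for the field-office example (one work month at the same rate), not counting the value of the data that were never recovered.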

Importance of Data Management
“Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it. Several times, I've seen colleagues called to court in order to testify about conditions they have observed. Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like, to back up their testimony. It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.”
- Nelson Williams, Scientist, US Geological Survey

Importance of Data Management
The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.

Why Manage Data: USGS Perspective
• USGS is mandated to perform data management functions by legislation and Executive Orders
• Well-documented and easily accessible data sources save time and money when performing data research
• Accurate data are legally and scientifically defensible. This may aid the agency by reducing litigation and appeals.
CC image by Gerald G on Open Clip Art Library
Provided by Tom Chatfield, BLM

Legislation Concerning Data
• Information Quality Act
• Clinger-Cohen Act
• Paperwork Reduction Act
• Computer Matching & Personal Privacy Act
• Government Performance & Results Act
• Government Paperwork Elimination Act
• Privacy Act
• Freedom of Information Act
• Executive Order 12906 (Geospatial Data)
Provided by Tom Chatfield, BLM

Legislation Concerning Data
• Section 515 of the Treasury & General Government Appropriations Act for FY 2001 (Information Quality Act) allows the public to examine and challenge the data disseminated by the BLM and provides review procedures for those challenges.
• The Clinger-Cohen Act (a.k.a. IT Management Reform Act) established the position of Chief Information Officer to oversee information quality and IT implementation. It mandates that agencies develop enterprise-wide information architectures to improve business performance and data portability.
• The Paperwork Reduction Act of 1995 (44 U.S.C. 3501-3520) provides the basis for managing information as a resource. It mandates that agencies take steps to improve their data quality and data sharing capabilities.
• The Computer Matching and Personal Privacy Act expands the Privacy Act guarantees to ensure that privacy violations do not occur when databases are combined or integrated.
Provided by Tom Chatfield, BLM

Implementing Instructions
• A-11 IT Capital Planning (Project Justification)
• A-16 Geospatial Data (established FGDC)
• A-119 Use and Adoption of Voluntary Systems Standards
• A-123 Management Control Reviews
• A-127 Management & Control of Financial Systems
• A-130 Information Resources Management
Provided by Tom Chatfield, BLM

Bottom Line
• USGS recognizes its data as a lasting resource
• USGS data support managers and decision makers by providing solid, accurate, reliable, useful, and timely data
• Customers (taxpayers) have paid for and are entitled to know the data USGS produces
Provided by Tom Chatfield, BLM

Why Manage Data: Researcher Perspective
Manage your data for yourself:
• Keep yourself organized: be able to find your files (data inputs, analytic scripts, outputs at various stages of the analytic process, etc.)
• Track your science processes for reproducibility: be able to match up your outputs with the exact inputs and transformations that produced them
• Better control versions of data: easily identify versions that can be periodically purged
• Quality control your data more efficiently

Why Manage Data: Researcher Perspective
• Make backups to avoid data loss
• Format your data for re-use (by yourself or others)
• Be prepared: document your data for your own recollection, accountability, and re-use (by yourself or others)
• Prepare it to share it: gain credibility and recognition for your science efforts!
CC image by UWW ResNet on Flickr
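One way to act on the backup and tracking advice is to record a checksum for every file and compare the working copy against the backup. The lesson does not prescribe this technique; the sketch below is only an illustration, assuming Python 3.9+ and SHA-256 checksums. The function names (`sha256_of`, `build_manifest`, `compare_manifests`) and the sample file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(original: dict[str, str], backup: dict[str, str]) -> dict[str, list[str]]:
    """List files missing from the backup and files whose contents differ."""
    shared = original.keys() & backup.keys()
    return {
        "missing": sorted(original.keys() - backup.keys()),
        "changed": sorted(name for name in shared if original[name] != backup[name]),
    }


if __name__ == "__main__":
    # Hypothetical demo: a working folder and an incomplete, out-of-date backup.
    with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as copy:
        (Path(work) / "counts.csv").write_text("site,count\nA,3\n")
        (Path(work) / "notes.txt").write_text("collected 2011\n")
        (Path(copy) / "counts.csv").write_text("site,count\nA,4\n")
        report = compare_manifests(build_manifest(Path(work)), build_manifest(Path(copy)))
        print(report)
```

Saving the manifest alongside the data also helps match outputs to the exact inputs that produced them, since any change to a file changes its checksum.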

Why Data Management: Foundation to Advance Science
• Data are valuable assets: they are expensive and time consuming to collect
• Data should be managed to:
  • maximize the effective use and value of data and information assets
  • continually improve quality, including data accuracy, integrity, integration, timeliness of data capture and presentation, relevance, and usefulness
  • ensure appropriate use of data and information
  • facilitate data sharing
  • ensure sustainability and accessibility in the long term for re-use in science

Why Data Management Facilitates Sharing and Re-use…

Well-Managed Data Can Result in Re-use, Integration and New Science
[Figure: eBird, Land Cover, Meteorology, and MODIS (remote sensing data) shown alongside model results for the occurrence of Indigo Bunting (2008) in Jan, Apr, Jun, Sep, and Dec]
Spatio-Temporal Exploratory Models predict the probability of occurrence of bird species across the United States at a 35 km x 35 km grid.
Potential Uses
• Examine patterns of migration
• Infer impacts of climate change
• Measure patterns of habitat usage
• Measure population trends

Data Integration
Images courtesy of Cornell Ornithology Lab

Where a majority of data end up now…

Imagine if data were more accessible…

Well-managed, publicly accessible data are important. Why?
Here are a few reasons (from the UK Data Archive):
• Increases the impact and visibility of research
• Promotes innovation and potential new data uses
• Leads to new collaborations between data users and creators
• Maximizes transparency and accountability
• Enables scrutiny of research findings
• Encourages improvement and validation of research methods
• Reduces cost of duplicating data collection
• Provides important resources for education and training

New Discoveries
“Planet hidden in Hubble archives”, Science News (Feb. 27, 2009)
Image caption: A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: a faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. (D. Lafrenière et al., Astrophysical Journal Letters)
“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.
“The second thing it tells you is having a well calibrated archive is necessary but not sufficient to make breakthroughs; it also takes a very innovative group of people to develop very smart extraction routines that can get rid of all the artifacts to reveal the planet hidden under all that telescope and detector structure.”

What is the Data Lifecycle?

What is the Data Lifecycle?
• The Data Lifecycle is designed to provide a framework for data management. That framework is intended to allow for management of the data independent of the system or application in which they reside.
• The Data Lifecycle is a step-by-step progression of elements, read from left to right.
• While most elements follow a linear pattern, some elements are present throughout the entire data lifecycle.
Provided by Tom Chatfield, BLM

For Each Stage of the Data Lifecycle…
• …there are best practices… and… tools to help!
• The following data management lessons will illustrate in detail each stage of the data lifecycle
• Your well-managed and accessible data can contribute to science in ways you may not even imagine today!

Summary
• The data deluge has created a surge of information that needs to be well-managed and made accessible.
• The cost of not doing data management can be very high.
• Be cognizant of best practices and tools associated with the data lifecycle to manage your data well.
• Many benefits are associated with the act of managing data, including the ability to find, access, understand, integrate, and re-use data.

Summary (cont'd)
If data are:
• Well-organized
• Documented
• Preserved
• Accessible
• Verified as to accuracy and validity
Result is:
• High quality data
• Easy to share and re-use in science
• Citation and credibility to the researcher
• Cost-savings to science

References
1. Bureau of Land Management. Data Management Training Workshop (2011)
2. Strasser, Carly, Ph.D. Data Management for Scientists, February 2012
3. UK Data Archive. Managing and Sharing Data: Best Practices for Researchers, May 2011
4. DAMA International. The DAMA Guide to the Data Management Body of Knowledge