Data Integrity Report Data Integrity Working Group April 2008

Background
• Management Council Action:
  – Crichton agreed to chair an MC working group on data integrity. New suggested starting with Level 4 requirements before dealing with any implementation issues. The DIWG should have recommendations before the Tech Session tackles the technical issues in data integrity at its proposed face-to-face meeting. The Tech Session can then decide where to go with the existing SCR on checksums, including sub-issues of specificity and process.
• Members
  – Dan Crichton, EN, chair
  – Mitch Gordon, Rings
  – Ed Guinness, Geosciences
  – Bill Harris, PPI
  – Steve Hughes, EN
  – Al Schultz, GSFC
  – Mark Showalter, Rings
  – Tom Stein, Geosciences

Background
• Policy
  – Each node is responsible for periodically verifying the integrity of its archival holdings based on a schedule approved by the Management Council. Verification includes confirming that all files are accounted for, are not corrupted, and can be accessed regardless of the medium on which they are stored. Each node will report on its verification to the PDS Program Manager, who will report the results to the Management Council. (Adopted by MC November 2006)
• PDS-2010
  – Data Integrity is a critical project for implementation in PDS-2010
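
The three checks named in the policy (all files accounted for, not corrupted, accessible) can be illustrated with a minimal Python sketch. It is not a PDS tool or an adopted mechanism: the manifest format (a mapping from relative path to expected digest), the function names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_holdings(root, manifest, algorithm="sha256"):
    """Compare files under `root` against a manifest (hypothetical format).

    `manifest` maps a relative POSIX-style path to its expected hex digest.
    Returns a report with four lists of relative paths:
      missing    - listed in the manifest but not found (not accounted for)
      unreadable - found but could not be read (not accessible)
      corrupted  - read, but the digest differs from the manifest
      untracked  - on disk but absent from the manifest
    """
    root = Path(root)
    report = {"missing": [], "unreadable": [], "corrupted": [], "untracked": []}

    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
            continue
        try:
            actual = file_checksum(path, algorithm)
        except OSError:
            report["unreadable"].append(rel_path)
            continue
        if actual != expected.lower():
            report["corrupted"].append(rel_path)

    # Files present on disk that the manifest does not know about.
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    report["untracked"] = sorted(on_disk - set(manifest))
    return report
```

A node could run something like `verify_holdings("/archive/volume", manifest)` on the approved schedule and forward the resulting report; the path shown is a placeholder.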

Scope
• Data Integrity
  – Protects the integrity of files, particularly in data exchange
• Tracking
  – Protects the integrity of file collections, ensuring that PDS can account for all files that it has received and is managing
• Availability
  – Ensures files are available, particularly that three copies of the data are available
• Preservation
  – PDS is actively verifying the long-term preservation and usability of its data (PDS Requirement 4.x)
  – Identified in the beginning by DIWG, but postponed
  – More on this…

Requirements Status
• Draft Requirements (Sept 2007) submitted to the MC for comments
  – Requirements covered Data Integrity, Tracking, and Availability
  – Several comments received, which are being addressed (more to come on that…)
• Gap in addressing preservation requirements
  – Original thought was to cover it somewhere else; however, it is closely related to integrity of the archive
  – It is a shared responsibility between PDS and NSSDC; however, PDS needs to be the owner of the requirements and the plan
  – Simpson pointed out that we have a PDS Requirement for QQC (Quality, Quantity, Continuity) that should be included

PDS Requirements Related to Preservation
4.1 Long-Term Preservation: PDS will determine requirements for and ensure long-term preservation of the data
  4.1.1 PDS will define and maintain a set of quality, quantity, and continuity (QQC) requirements for ensuring long-term preservation of the archive
  4.1.2 PDS will develop and implement procedures for periodically ensuring the integrity of the data <--- this has to be at both the nodes and at NSSDC
  4.1.3 PDS will develop and implement procedures for periodically refreshing the data by updating the underlying storage technology
  4.1.4 PDS will develop and implement a disaster recovery plan for the archive
  4.1.5 PDS will meet U.S. federal regulations for preservation and management of the data through its Memorandum of Understanding (MOU) with the National Space Science Data Center (NSSDC)

PDS Requirements Related to Preservation
4.2 Long-Term Usability: PDS will establish long-term usability requirements and implement procedures for meeting them
  4.2.1 PDS will define and maintain a set of usability requirements to ensure ongoing utility of the data in the archive
  4.2.2 PDS will develop and implement procedures for periodically monitoring the user community interests and practices and verifying the usability of the products in the archive
  4.2.3 PDS will monitor the evolution of technology including physical media, storage, and software in an effort to keep the archiving technology decisions relevant within the PDS
  4.2.4 PDS will provide a mechanism to upgrade products or data sets which do not meet usability requirements (e.g., data sets from old missions)

Key Comments and Items Being Addressed
• Node comments received from RS, GEO, Imaging, and PPI
• Of these comments, key Node issues include:
  – Preservation / QQC requirements not addressed
  – Some questions regarding Level 3 requirements
    • Data Integrity cuts across PDS extensively
  – Lack of reference to “method / mechanism” for ensuring data integrity
    • Should PDS call for use of checksums?
  – Recommendation to clarify terminology
  – Recommendation to better integrate the three critical areas of data integrity, tracking, and availability

Plan
• Requirements
  – Finish addressing the identified issues
  – Address preservation requirements
  – Update the document
  – Reconvene the WG (~ May 2008)
  – Report back to MC in July 2008
• Implementation Plan
  – Develop implementation plan in conjunction with PDS 2010
  – Address the areas of integrity from a multi-pronged approach including policies, standards, and software
    • Ensure that all areas of integrity are being implemented
  – Present back to MC
  – Begin implementation in FY 09
• Policy Issues
  – Given that we are an online system, does PDS need to set a resolution stating when data will be required to have a “checksum” (e.g., data delivered in PDS 4 will be required to have a checksum)?
    • Creates a milestone and gives direction to the technical group