Metadata requirements for archiving structured data
Alice Born, Statistics Canada
Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (9-11 April 2008)
IMDB business model – archive focus
[Diagram. Labels shown: Applications/Software; Survey; Frame and sample; Survey instance; Datafiles in archive repository; Questionnaire; Datasets; Products (COR); Survey design; Data elements; Value domains]
Outline
• Overview of the integrated metadatabase (IMDB) in the survey life cycle
• Archive and disposal processes
• Metadata requirements for archiving
• Registration
IMDB in the survey life cycle
[Diagram. Process labels: Design; Collect; Edit; Estimate; Tabulate; Analysis; Publish; Dissemination; Archive + disposal; Operations Management; Quality Assurance. Data labels: IMDB; Metadata; Operational Data; Operational Data Stores; Registers; Survey Data; Administrative Data; Data Warehouses]
Processes for archiving and disposal
8. Archiving and disposal
8.1 Manage archive repository
• Define archival format, format data, load data, record event dates, link to metadata registry (IMDB)
8.2 Preserve data and associated metadata
• Identify data/metadata, record archive request, transfer data/metadata, etc.
8.3 Dispose of data and associated metadata
• Identify data to dispose of, remove from archive, etc.
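The three sub-processes above can be sketched in code. This is a minimal illustration only, not the Statistics Canada implementation: the class names, the event names ("archive_requested", "loaded", "disposed") and the identifier formats are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class ArchivedFile:
    """One datafile held in a hypothetical archive repository."""
    datafile_id: str       # unique datafile identifier (assumed format)
    imdb_record_id: str    # link to the metadata registry (IMDB) record
    archival_format: str   # archival format defined under 8.1
    events: List[Tuple[str, date]] = field(default_factory=list)


class ArchiveRepository:
    """Toy repository covering 8.1 (manage), 8.2 (preserve), 8.3 (dispose)."""

    def __init__(self) -> None:
        self._files: Dict[str, ArchivedFile] = {}

    def preserve(self, datafile_id: str, imdb_record_id: str,
                 archival_format: str, on: date) -> ArchivedFile:
        # 8.2: record the archive request, then load the file (8.1),
        # keeping the event dates and the link to the IMDB record.
        entry = ArchivedFile(datafile_id, imdb_record_id, archival_format)
        entry.events.append(("archive_requested", on))
        entry.events.append(("loaded", on))
        self._files[datafile_id] = entry
        return entry

    def dispose(self, datafile_id: str, on: date) -> ArchivedFile:
        # 8.3: remove the file from the archive and record the disposal date.
        entry = self._files.pop(datafile_id)
        entry.events.append(("disposed", on))
        return entry

    def holds(self, datafile_id: str) -> bool:
        return datafile_id in self._files


repo = ArchiveRepository()
repo.preserve("DF-001", "IMDB-1234", "fixed-width text", date(2008, 4, 9))
disposed = repo.dispose("DF-001", date(2018, 4, 9))
```

Keeping the event log on the entry returned by `dispose` reflects one possible design choice: the record of what happened to a file can outlive the file itself.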
Types of metadata for archiving structured data (1)
1. Administrative metadata
• Maintaining and keeping track of the archived datafile
• Link to an IMDB record – SDDS or unique datafile identifier
2. Structural metadata
• Name of the datafile
• File format – record layout
• Software
• Storage media, location
Types of metadata for archiving structured data (2)
3. Survey and definitional metadata
• Already in the IMDB
• Survey description
• Questionnaires, questions, response choices
• Methodology
• Measures of data quality
• Variables (ISO 11179)
• Additional documentation – interviewer guide, coding tool
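The three metadata types could be modelled as a single record attached to each archived datafile. The sketch below is one possible shape, with hypothetical field names. It assumes that survey and definitional metadata are not copied into the archive record but reached through the IMDB link, since the slide notes they are already in the IMDB.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AdministrativeMetadata:
    datafile_id: str      # unique datafile identifier
    imdb_record_id: str   # link to the IMDB record (e.g. an SDDS number)


@dataclass
class StructuralMetadata:
    datafile_name: str
    file_format: str
    record_layout: List[str]   # ordered variable names in each record
    software: str
    storage_media: str
    location: str


@dataclass
class ArchiveMetadata:
    administrative: AdministrativeMetadata
    structural: StructuralMetadata
    # Survey and definitional metadata are assumed to stay in the IMDB
    # and to be looked up through administrative.imdb_record_id.


def missing_fields(record: ArchiveMetadata) -> List[str]:
    """Return the names of empty fields, as a simple completeness check."""
    missing = []
    for layer_name in ("administrative", "structural"):
        layer = getattr(record, layer_name)
        for f in fields(layer):
            if not getattr(layer, f.name):
                missing.append(f"{layer_name}.{f.name}")
    return missing


record = ArchiveMetadata(
    AdministrativeMetadata("DF-001", "IMDB-1234"),
    StructuralMetadata("example_master_file", "fixed-width text",
                       ["id", "region", "weight"], "", "disk", "archive-01"),
)
print(missing_fields(record))  # the empty software field is reported
```

A check like `missing_fields` could run before a file is accepted into the archive, so that a datafile is never stored without the information needed to read it later.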
Administrative layer
[Diagram. Labels shown, in extraction order: Statistical Activity; Organization; Survey; Stewardship; Identification; Classification; Contact; Universe; Documentation; Frame; Identification; Survey instance; Time Frame; Instrument; Keyword; Question; Theme; Data file; Methodology; Administered items; Instrument design; Sampling; Data source; Error detection; Imputation; Estimation; Quality evaluation; Disclosure control; Revisions and seasonal adjustment; Data accuracy; Data Element Concept; Object Class; Property; Formula; Conceptual Domain; Value Domain]
Registration status
[Diagram. Labels shown, in extraction order: Registration Authority (completeness, accuracy, adherence to quality and terminological description standards); Disposed; Preferred standard; Archived; Standards Division Registrar; Qualified; Regular Registrar; Recorded; Responsible Owner; Candidate; (Content) Steward; Submitter; Incomplete; Retired; Historical]
Corporate Memory: Data and metadata for archive and disposal phases
[Diagram. Labels shown: Operational data; Registers; Survey Data; Administrative Data; Metadata for archive and disposal; IMDB; Public Use Microdata Files; Clean Master Files; Archive repository + Disposal]