Module 4 Data Documentation How to Write Good Metadata

Topics
  • Steps for organizing data
  • Steps for preparing high-quality metadata
  • Tips and tools for writing good metadata
[Figure: excerpt of a metadata element tree showing Identification_Information, Citation_Information, Originator, Publication_Date, Title, Geospatial_Data_Presentation_Form, Publication_Information, Publication_Place, Publisher, Larger_Work_Citation_Information]

Learning Objectives
After completing this lesson, the participant will be able to:
    ◦ Identify the steps in preparing high-quality metadata
    ◦ Describe fundamental content required for good metadata
    ◦ Identify software tools to assist with metadata creation

The Data Life Cycle
[Figure: data life cycle diagram with the stages Plan, Acquire & Process, Analyze, Preserve, and Publish & Share, labeled "Contributes to the Metadata Record"]

Metadata
  • Information to let you find, understand, and use the data
    ◦ Descriptors
    ◦ Documentation
    Source: Bob Cook, Oak Ridge National Laboratory
  • A metadata record is a file of information, usually presented as an XML document, which captures the basic characteristics of a data or information resource.
    Source: http://www.fgdc.gov/metadata

Metadata is needed to understand data
The details of the data …
[Figure: a measurement record labeled with parameter name, measurement, date, sample ID, and location]
Slide credit: Bob Cook, Oak Ridge National Laboratory

Metadata is needed to understand data
[Figure: measurement records (parameter name, units, method, media, date, sample ID, location, QA flag, generator) surrounded by the definitions needed to interpret them: parameter, units, method, sample, and QA definitions; the record system; coordinates, elevation, and depth; GIS; and the custodian's organization, name, and address]
Slide credit: Bob Cook, Oak Ridge National Laboratory

The 20-Year Rule
  • The metadata accompanying a data set should be written for a user 20 years into the future.
    ◦ What does that investigator need to know to use the data?
  • Prepare the data and documentation for a user who is unfamiliar with your project, methods, and observations.
NRC (1991)
Slide credit: Bob Cook, Oak Ridge National Laboratory

Steps to Create Quality Metadata
  • Organize your data
  • Write your metadata
  • Review your metadata

Organize your data
  1. Define the contents of your data files
  2. Use consistent data organization
  3. Assign descriptive file names
  4. Preserve data and processing information
  5. Assign descriptive data set titles
  6. Acknowledge contributions
Best practice: Organize your data BEFORE you begin your project work, during the data management planning stage of the data life cycle.
Modified from: Bob Cook, Oak Ridge National Laboratory

1. Define the contents of your data sets
  • Content flows from a science plan (hypotheses) and is informed by the requirements of the final archive
  • Data set elements include entities (nouns) and attributes (adjectives that describe the entities)
  • Define the entities, attributes, and attribute values
    ◦ Names
    ◦ Units
    ◦ Formats
    ◦ Descriptions
[Figure: a measurement record labeled with parameter name, measurement, date, sample ID, and location]
Modified from: Bob Cook, Oak Ridge National Laboratory

1. Define the contents: Names and units
  • Use commonly accepted entity or parameter names that describe the contents (e.g., precip for precipitation)
  • Use consistent capitalization (e.g., not temp, Temp, and TEMP in the same file)
  • Explicitly state units in the data file and the metadata
    ◦ SI units are recommended
Modified from: Bob Cook, Oak Ridge National Laboratory

1. Define the contents: Formats
  • Choose a format for each parameter and use that format throughout the file
    ◦ Use yyyymmdd (Example: January 2, 1999 is 19990102)
    ◦ Use 24-hour notation (Example: 13:30 hrs instead of 1:30 p.m. and 04:30 instead of 4:30 a.m.)
    ◦ Report in both local time and Coordinated Universal Time (UTC)
  • What’s the difference between UTC and GMT? What is the time zone of Arizona?
Modified from: Bob Cook, Oak Ridge National Laboratory
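
The date and time conventions on this slide can be sketched in a few lines of Python. This is an illustration only, not part of the original module: the function name `format_timestamp` is hypothetical, and the fixed UTC-7 offset is an assumption standing in for a local zone (most of Arizona keeps Mountain Standard Time all year).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset stands in for the local time zone
# (most of Arizona observes Mountain Standard Time year-round).
LOCAL_TZ = timezone(timedelta(hours=-7), "MST")


def format_timestamp(local_dt):
    """Return yyyymmdd dates and 24-hour times for local time and UTC."""
    utc_dt = local_dt.astimezone(timezone.utc)
    return {
        "date": local_dt.strftime("%Y%m%d"),
        "local_time": local_dt.strftime("%H:%M"),
        "utc_date": utc_dt.strftime("%Y%m%d"),
        "utc_time": utc_dt.strftime("%H:%M"),
    }


# 1:30 p.m. local time on January 2, 1999
sample = datetime(1999, 1, 2, 13, 30, tzinfo=LOCAL_TZ)
stamp = format_timestamp(sample)
print(stamp)
```

Storing both the local and the UTC value, as the slide recommends, lets a later user check the offset instead of guessing it.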

1. Define the contents: Definitions
[Figure: excerpt of a table with a CODE column, from Scholes (2005)]
Scholes (2005), cited by Bob Cook, Oak Ridge National Laboratory

1. Define the contents: Data set
  • Define field formats in the metadata record (yyyy.mm.dd)
  • Define the site code attribute in the metadata record (k = Kataba forest)

Site Name           Site Code  Latitude (deg)  Longitude (deg)  Elevation (m)  Date
Kataba (Mongu)      k          -15.43892       23.25298         1195           2000.02.21
Pandamatenga        p          -18.65651       25.49955         1138           2000.03.07
Skukuza Flux Tower  skukuza    -31.49688       25.01973         365            2000.06.15

Scholes, R. J. 2005. SAFARI 2000 Woody Vegetation Characteristics of Kalahari and Skukuza Sites. Data set. Available on-line [http://daac.ornl.gov/] from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/777
Modified from: Bob Cook, Oak Ridge National Laboratory

1. Define the contents: Tips
Think about the long-term effects:
  • Do not use jargon
  • Define technical terms and acronyms
    ◦ CA, LA, GPS, GIS: what do these mean?
  • Clearly state data limitations
    ◦ E.g., data set omissions, completeness
    ◦ Considerations for appropriate re-use of the data
  • Use “none” or “unknown” meaningfully
    ◦ None usually means that you knew about data and nothing existed (for example, a “0” cubic feet per second stream discharge value)
    ◦ Unknown means that you don’t know whether that data existed or not (for example, a null value)

2. Use consistent data organization
  • Be consistent in file organization and formatting
  • Do not change or re-arrange columns
  • Use column headings to describe the content of each column
Modified from: Bob Cook, Oak Ridge National Laboratory

2. Use consistent data organization
Example of a good approach (1): Each row in a file represents a complete record, and the columns represent all the parameters that make up the record.

Station  Date      Temp  Precip
Units    YYYYMMDD  C     mm
HOGI     19961001  12    0
HOGI     19961002  14    3
HOGI     19961003  19    -9999

Note: -9999 is a missing value code for the data set
Modified from: Bob Cook, Oak Ridge National Laboratory
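
A short Python sketch shows how a reader of this file might honor the documented missing-value code. It is an assumption-laden illustration: the table is typed in as CSV text, the units line is treated as a second header row, and `read_records` is a hypothetical helper name. The point is that a measured 0 and a missing value (-9999) must not be confused.

```python
import csv
import io

MISSING_CODE = "-9999"  # missing-value code stated on the slide

# The slide's table as CSV text; the units row is kept as a second header row.
RAW = """Station,Date,Temp,Precip
Units,YYYYMMDD,C,mm
HOGI,19961001,12,0
HOGI,19961002,14,3
HOGI,19961003,19,-9999
"""


def read_records(text):
    """Parse the table, turning the missing-value code into None."""
    rows = list(csv.reader(io.StringIO(text)))
    header, units, data = rows[0], rows[1], rows[2:]
    records = []
    for row in data:
        record = dict(zip(header, row))
        for name in ("Temp", "Precip"):
            value = record[name]
            record[name] = None if value == MISSING_CODE else float(value)
        records.append(record)
    return dict(zip(header, units)), records


units, records = read_records(RAW)
print(units["Temp"], records[0]["Precip"], records[2]["Precip"])
```

Without the note in the metadata, a user could easily average -9999 into the precipitation totals.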

2. Use consistent data organization
Example of a good approach (2): Parameter name, value, and units are placed in individual columns. This approach is used in relational databases.

Station  Date      Parameter  Value  Unit
HOGI     19961001  Temp       12     C
HOGI     19961002  Temp       14     C
HOGI     19961001  Precip     0      mm
HOGI     19961002  Precip     3      mm

Modified from: Bob Cook, Oak Ridge National Laboratory
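
The two layouts carry the same information, so one can be derived from the other. The following sketch converts the row-per-record layout into the parameter/value/unit layout; `wide_to_long` is a hypothetical helper written for this illustration, not something the module prescribes.

```python
def wide_to_long(records, units, parameters):
    """Turn one row per record into one row per (record, parameter) pair."""
    long_rows = []
    for parameter in parameters:
        for record in records:
            long_rows.append({
                "Station": record["Station"],
                "Date": record["Date"],
                "Parameter": parameter,
                "Value": record[parameter],
                "Unit": units[parameter],
            })
    return long_rows


wide = [
    {"Station": "HOGI", "Date": "19961001", "Temp": 12, "Precip": 0},
    {"Station": "HOGI", "Date": "19961002", "Temp": 14, "Precip": 3},
]
units = {"Temp": "C", "Precip": "mm"}

long_rows = wide_to_long(wide, units, ["Temp", "Precip"])
for row in long_rows:
    print(row)
```

Whichever layout is chosen, the metadata should say which one the file uses.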

Example: Poor data organization
  • Multiple tables in one worksheet
Modified from: Bob Cook, Oak Ridge National Laboratory

Example: Better data organization
Allometry Data from ORNL DAAC
  ✓ Column headings describe the content of each column
  ✓ One data type per column (for example: string, integer, double precision)
Data are also offered in comma-separated values (csv) text files
Modified from: Bob Cook, Oak Ridge National Laboratory

3. Assign descriptive file names
  • File names should be unique and reflect the file contents
  • Bad file names
    ◦ Mydata
    ◦ 2001_data
  • A better file name
    ◦ bigfoot_agro_2000_gpp.tif
      BigFoot is the project name
      Agro is the field site name
      2000 is the calendar year
      GPP represents Gross Primary Productivity data
      tif is the file type – GeoTIFF
Modified from: Bob Cook, Oak Ridge National Laboratory
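
One way to keep names like this consistent is to generate and parse them with code rather than typing them by hand. The sketch below assumes the four-part pattern project_site_year_variable.extension seen in the example; the function names `build_file_name` and `parse_file_name` are hypothetical.

```python
def build_file_name(project, site, year, variable, extension):
    """Join the descriptive parts with underscores, in a fixed order."""
    parts = [project, site, str(year), variable]
    for part in parts:
        if "_" in part or " " in part:
            raise ValueError(f"part {part!r} must not contain '_' or spaces")
    return "_".join(p.lower() for p in parts) + "." + extension


def parse_file_name(file_name):
    """Recover the parts from a name produced by build_file_name."""
    stem, extension = file_name.rsplit(".", 1)
    project, site, year, variable = stem.split("_")
    return {
        "project": project,
        "site": site,
        "year": int(year),
        "variable": variable,
        "extension": extension,
    }


name = build_file_name("BigFoot", "Agro", 2000, "GPP", "tif")
print(name)
print(parse_file_name(name))
```

Rejecting underscores inside a part keeps the name unambiguous when it is split again later.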

Dilbert’s file naming convention
Modified from: Bob Cook, Oak Ridge National Laboratory

A story told in file names
Modified from: Bob Cook, Oak Ridge National Laboratory

3. Assign descriptive file names: File naming and organization go hand-in-hand
  • Make sure your file system is logical and efficient
  • Keep a set of similar measurements together in one file (e.g., same investigator, methods, time basis, and instruments)

Biodiversity
  Lake
    Experiments
      Biodiv_H20_heatExp_2005_2008.csv
      Biodiv_H20_predatorExp_2001_2003.csv
      …
    Field work
      Biodiv_H20_planktonCount_start2001_active.csv
      Biodiv_H20_chla_profiles_2003.csv
      …
  Grassland

From S. Hampton
Modified from: Bob Cook, Oak Ridge National Laboratory

4. Preserve data and processing information
  • “Keep your raw data raw”: no transformations, interpolations, etc., in raw files

Raw data file: Giles_zoopCount_Diel_2001_2003.csv

TAX  COUNT       TEMPC
C    3.97887358  12.3
F    0.97261354  12.7
M    0.53051648  12.1
F    0           11.9
C    10.8823893  12.8
F    43.5295571  13.1
M    21.7647785  14.2
N    61.6668725  12.9
…

Processing script (R):

    ### Giles_zoop_temp_regress_4jun08.r
    ### Load data
    Giles <- read.csv("Giles_zoopCount_Diel_2001_2003.csv")
    ### Look at the data
    Giles
    plot(COUNT ~ TEMPC, data=Giles)
    ### Log-transform the count variable (y+1)
    Giles$Lcount <- log(Giles$COUNT+1)
    ### Plot the log-transformed y against x
    plot(Lcount ~ TEMPC, data=Giles)

From S. Hampton
Modified from: Bob Cook, Oak Ridge National Laboratory

4. Preserve data and processing information: Document data processing steps
  • Use a scripted language to process data
    ◦ R Statistical package (free, powerful)
    ◦ SAS
    ◦ MATLAB
  • Processing scripts are records of processing
    ◦ Scripts can be revised, rerun
  • Graphical User Interface-based analyses may seem easy, but don’t leave a record
  • A tool like Kepler can help document processing steps
Modified from: Bob Cook, Oak Ridge National Laboratory
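
The same idea can be sketched in Python, which the slide does not list but which works the same way as the scripted languages it names. The sample rows are illustrative values, and the helper names `add_log_count` and `to_csv` are hypothetical. The raw text is only read; the derived column is written to a new output.

```python
import csv
import io
import math

# Stand-in for the raw file; in practice it would be read from disk and
# never overwritten. The rows are illustrative values.
RAW_CSV = """TAX,COUNT,TEMPC
C,3.97887358,12.3
F,0.97261354,12.7
M,0.53051648,12.1
F,0,11.9
"""


def add_log_count(raw_text):
    """Return derived rows with LCOUNT = log(COUNT + 1); raw text is unchanged."""
    derived = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        row = dict(row)
        row["LCOUNT"] = math.log(float(row["COUNT"]) + 1.0)
        derived.append(row)
    return derived


def to_csv(rows):
    """Serialize derived rows so they can be saved under a new file name."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


derived = add_log_count(RAW_CSV)
print(to_csv(derived))
```

Because the transformation lives in a script, it can be cited in the metadata and rerun if the raw data are corrected.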

5. Assign descriptive data set titles
  • Titles are critical in helping readers find your data
    ◦ While individuals are searching for the most appropriate data sets, they are most likely going to use the title as the first criterion to determine if a data set meets their needs.
    ◦ Treat the title as the opportunity to sell your data set.
  • A complete title includes: What, Where, When, Who, and Scale
  • An informative title includes: topic, timeliness of the data, specific information about place and geography

5. Assign descriptive data set titles
  • Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7).
  • Titles should be concise (< 85 characters)
  • Data set title should be similar to names of data files
    ◦ Bad: “Productivity Data”
    ◦ Good: “SAFARI 2000 Upper Air Meteorological Profiles, Skukuza, Dry Seasons 1999-2000”
Modified from: Bob Cook, Oak Ridge National Laboratory
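
A rough automated check can catch the most obvious title problems before a record is submitted. In the sketch below only the 85-character limit comes from the slide; the year pattern and the four-word minimum are assumptions chosen for illustration, and `check_title` is a hypothetical name.

```python
import re

MAX_TITLE_LENGTH = 85  # the slide recommends fewer than 85 characters


def check_title(title):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if len(title) >= MAX_TITLE_LENGTH:
        problems.append("title is 85 characters or longer")
    # Assumption: a four-digit year between 1600 and 2099 signals "when".
    if not re.search(r"\b(1[6-9]|20)\d{2}\b", title):
        problems.append("no four-digit year found (when?)")
    # Assumption: very short titles are too generic to be useful.
    if len(title.split()) < 4:
        problems.append("fewer than four words; probably too generic")
    return problems


good = "SAFARI 2000 Upper Air Meteorological Profiles, Skukuza, Dry Seasons 1999-2000"
bad = "Productivity Data"
print(check_title(good))
print(check_title(bad))
```

A check like this cannot judge whether a title is informative; it only flags titles that are clearly too long or too bare.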

5. Assign descriptive data set titles
Which title is better?
    ◦ Rivers
  OR
    ◦ Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)

Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when)

6. Acknowledge contributions
  • Who contributed data to your data set?
  • How should your data be located, used, and cited by others?

    All data have been collected and curated by Dr. C. Hippocrepis at All Rotifers University. Data may be freely downloaded for non-commercial uses. Please contact Dr. C. Hippocrepis ([email protected] edu) to let us know if you use these data. We report uses of the public data to our funders, and it is extremely helpful to know if others have been able to use these data in teaching or research.

From S. Hampton

Steps to Create Quality Metadata
  • Organize your data
  • Write your metadata
  • Review your metadata

Write your metadata: Content
An “interview approach” for creating metadata
  1. What does the data set describe?
  2. Who produced the data set?
  3. Why was the data set created?
  4. How reliable are the data; what problems remain in the data set?
  5. How can someone get a copy of the data set?
  6. Who wrote the metadata?
From http://geology.usgs.gov/tools/metadata/tools/doc/ctc/

Metadata sections, elements and structure

Identification Information
  • What is the name of the data set?
  • Who developed the data set?
  • What geographic area does it cover?
  • What themes of information does it include?
  • How current are the data?
  • Are there restrictions on accessing or using the data?

Data Quality Information
  • How good are the data?
  • Is information available that allows a user to decide if the data are suitable for his or her purpose?
  • What is the positional and attribute accuracy?
  • Are the data complete?
  • Was the consistency of the data verified?
  • What data were used to create the data set, and what processes were applied to these sources?

Spatial Data Organization Information
  • What spatial data model was used to encode the spatial data?
  • How many spatial objects are there?
  • Are methods other than coordinates, such as street addresses, used to encode locations?

Spatial Reference Information
  • Are coordinate locations encoded using longitude and latitude?
  • Is a map projection or grid system, such as the State Plane Coordinate System, used?
  • What horizontal and vertical datums are used?
  • What parameters should be used to convert the data to another coordinate system?

Entity and Attribute Information
  • What geographic information (roads, houses, elevation, temperature, etc.) is included?
  • How is this information encoded?
  • Were codes used?
  • What do the codes mean?

Distribution Information
  • From whom can I obtain the data?
  • What formats are available?
  • What media are available?
  • Are the data available online?
  • What is the price of the data?

Metadata Reference Information
  • When were the metadata compiled?
  • By whom?
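
To make the section structure concrete, the sketch below builds the citation part of an Identification Information section as XML with Python's standard library. The short tag names (`idinfo`, `citeinfo`, `origin`, `pubdate`, `geoform`) are what I understand to be the FGDC CSDGM short names and should be checked against the standard; `build_citation_record` is a hypothetical helper, and the presentation-form value is an assumption. In practice a tool such as Metavist writes this file for you.

```python
import xml.etree.ElementTree as ET


def build_citation_record(originator, pub_date, title, form):
    """Build a metadata/idinfo/citation/citeinfo skeleton and return the root."""
    root = ET.Element("metadata")
    idinfo = ET.SubElement(root, "idinfo")          # Identification_Information
    citation = ET.SubElement(idinfo, "citation")    # Citation
    citeinfo = ET.SubElement(citation, "citeinfo")  # Citation_Information
    ET.SubElement(citeinfo, "origin").text = originator  # Originator
    ET.SubElement(citeinfo, "pubdate").text = pub_date   # Publication_Date
    ET.SubElement(citeinfo, "title").text = title        # Title
    ET.SubElement(citeinfo, "geoform").text = form       # Geospatial_Data_Presentation_Form
    return root


record = build_citation_record(
    originator="Scholes, R. J.",
    pub_date="2005",
    title="SAFARI 2000 Woody Vegetation Characteristics of Kalahari and Skukuza Sites",
    form="tabular digital data",  # assumed value for illustration
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Each question in the lists above maps to one or more elements like these, which is why answering the questions in order produces most of a record.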

Write your metadata: Tools
  • Tools exist to make the task of writing metadata easier
    ◦ Create and re-use templates with pre-populated fields
    ◦ Write standards-compliant metadata
    ◦ Export metadata to a standard format for uploading to a clearinghouse
  • One such tool is Metavist
  • Other tools are listed here:
    ◦ http://www.fgdc.gov/metadata/geospatial-metadata-tools

Write your metadata: Metavist
  • File-based metadata creation tool
  • Creates FGDC-compliant metadata
  • Independent of GIS software
  • Public domain software
  • Covers all FGDC elements plus Biological Data Profile elements (Taxonomy, Methodology, and Analytical Tools)
  • Geospatial metadata elements are NOT automatically collected from GIS data layers
  • Metadata stored / output in XML format
  • Import an existing metadata file (must be an XML file with proper formatting)
  • Create templates from partial records and import them to start a new metadata record
  • Works with Microsoft Windows
  • Developed by USDA Forest Service – North Central Research Station
Metavist: http://ncrs.fed.us/pubs/viewpub.asp?key=2737

Getting Started with Metavist
  • Start Metavist from the desktop icon
  • Click the Paper symbol to create a new metadata record
Metavist: http://ncrs.fed.us/pubs/viewpub.asp?key=2737

Tips for Writing Good Metadata
  • Write simply but completely
  • Fill out as much information as you can (minimally compliant metadata is minimally useful metadata)
  • Document for a general audience
    ◦ Consider that people reading your record may not be experts in the field and thus may not understand certain domain-specific scientific terms
    ◦ Define uncommon or domain-specific scientific terms the first time they are used
  • Be consistent in style and terminology
    ◦ Multiple records from within an organization can be compared more quickly
  • Use existing metadata tools to facilitate correct formatting

Tips for Writing Good Metadata
  • Be specific and quantify when you can!
  • The goal of a metadata record is to give users enough information to know whether they can use the data without contacting the data set owner.
    ◦ Vague: We checked our work and it looks complete.
    ◦ Specific: We checked our work using a random sample of 5 monitoring sites reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections.

Tips for Writing Good Metadata
  • Select keywords wisely
  • Use descriptive and clear writing
  • Fully qualify geographic locations
  • Use a thesaurus for keywords whenever possible
    ◦ Example: USGS Biocomplexity Thesaurus (over 9,500 terms) http://thesaurus.nbii.gov

Tips for Writing Good Metadata
Remember: A computer will read your metadata
  • Do not use symbols that could be misinterpreted
    ◦ Examples: ! @ # % { } | / < > ~
  • Avoid using tabs, indents, or line feeds/carriage returns as they will not translate correctly
  • When copying and pasting from other sources, use a text editor (e.g., Notepad++) to eliminate hidden characters
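
The clean-up described here can also be scripted. The following Python sketch removes hidden control and format characters, collapses tabs and line breaks into single spaces, and reports any of the listed symbols for manual review. The function names are hypothetical, and treating Unicode categories Cc and Cf as "hidden characters" is an assumption made for this illustration.

```python
import re
import unicodedata

# Symbols the slide lists as open to misinterpretation.
RISKY_SYMBOLS = set("!@#%{}|/<>~")


def clean_metadata_text(text):
    """Drop hidden characters and collapse tabs/line breaks into single spaces."""
    visible = "".join(
        ch for ch in text
        if ch in "\t\n\r" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    return re.sub(r"\s+", " ", visible).strip()


def find_risky_symbols(text):
    """Return the sorted risky symbols present in the text, for manual review."""
    return sorted(RISKY_SYMBOLS.intersection(text))


pasted = "Stream\tdischarge\u200b measured\r\nat <site A> ~ weekly"
cleaned = clean_metadata_text(pasted)
print(cleaned)
print(find_risky_symbols(cleaned))
```

The symbols are reported rather than deleted because some, such as a slash in a URL, are legitimate and need a human decision.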

Steps to Create Quality Metadata
  • Organize your data
  • Write your metadata
  • Review your metadata

Steps for Metadata Review
  • Review for accuracy and completeness
  • Have someone else read your file
  • Revise it, based on comments from your reviewer
  • Review it once more before you publish it
  ✓ Does the metadata record present all the information needed to use or repurpose the data?
  ✓ Would you stand by the information in the metadata record much like a peer-reviewed journal publication?

Tips for Metadata Review
  • Use existing automated quality control tools to check for required and mandatory fields
    ◦ Example: Metadata Parser (http://geo-nsdi.er.usgs.gov/validation/)
  • Put the metadata record away and read it the next day to ensure the information makes sense
  • Have a colleague read the metadata (peer review) to get another perspective on the record
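
A much-simplified version of such a check can be sketched in Python. This is not how the Metadata Parser works internally; it only illustrates the idea of testing a record against a list of required elements. The path list is a hypothetical checklist using what I understand to be CSDGM short tag names, and `missing_fields` is an invented helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical checklist of element paths; a real validator such as mp
# checks the full standard, not just this short list.
REQUIRED_PATHS = [
    "idinfo/citation/citeinfo/origin",
    "idinfo/citation/citeinfo/pubdate",
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
]


def missing_fields(xml_text, required_paths=REQUIRED_PATHS):
    """Return required paths that are absent or contain only whitespace."""
    root = ET.fromstring(xml_text)
    missing = []
    for path in required_paths:
        value = root.findtext(path)
        if value is None or not value.strip():
            missing.append(path)
    return missing


draft = """<metadata><idinfo>
  <citation><citeinfo>
    <origin>Scholes, R. J.</origin>
    <pubdate>2005</pubdate>
    <title> </title>
  </citeinfo></citation>
</idinfo></metadata>"""

print(missing_fields(draft))
```

An automated check finds empty or missing fields, but only a human reviewer can tell whether the filled-in fields make sense.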

Summary
  • Organize your data
  • Write your metadata
    ◦ Use tools like Metavist
  • Review your metadata
    ◦ Review for accuracy and completeness
    ◦ Have someone else read your file
    ◦ Revise it, based on comments from your reviewer
    ◦ Review it with a tool like “mp”

Metadata Creation: Tools and How-To Resources
  • FGDC Metadata Quick Guide
    ◦ http://www.fgdc.gov/metadata/documents/MetadataQuickGuide.pdf
  • Content Standard for Digital Geospatial Metadata Workbook
    ◦ http://www.fgdc.gov/metadata/documents/workbook_0501_bmk.pdf
  • Information and software: http://geology.usgs.gov/tools/metadata/
  • Standards and tools: http://www.fgdc.gov/
  • Example of metadata creation using free software tools: Burley, T. E., and Peine, J. D., 2009, NBII-SAIN Data Management Toolkit, U.S. Geological Survey Open-File Report 2009–1170, 96 p. Available at http://pubs.usgs.gov/of/2009/1170/

What did you learn?

1. The metadata accompanying a data set should be written for a user __________.
    ◦ 20 years into the future
    ◦ who may not be able to contact the data set owner for information
    ◦ who is unfamiliar with the data set
    ◦ All of the above


2. While individuals are searching for the most appropriate data sets, they are most likely going to use the _____ as the first criterion to determine if a data set meets their needs.
    ◦ metadata
    ◦ author
    ◦ title
    ◦ year


3. Which date format is preferred for data sets and metadata records?
    ◦ 17760704
    ◦ July 4, 1776
    ◦ 7/4/76
    ◦ 07-04-1776


4. When selecting keywords for a metadata record, use _________.
    ◦ Discipline-specific terminology (dictionary)
    ◦ Controlled vocabularies (thesauri)
    ◦ Terminology extracted from the data set
    ◦ Jargon used by industry experts


5. Policies for access and sharing should include _____ property and _____ issues.
    ◦ Political, policy
    ◦ Intellectual, copyright
    ◦ Intellectual, policy
    ◦ Institutional, funding

