Data Curation
Brenna Wheeler, Ask-A-Librarian GRA
brennawheeler@utexas.edu
TDR Data Curation Capstone
• Objectives:
  • Assess datasets currently in UT Austin Dataverse
  • Create a Data Curation Workflow
    • Based on Data Curation Network’s CURATE(D) Workflow
Data Curation Network (DCN)
• Network to assist with data curation
• Created CURATE(D) Workflow
• Data Primers
• CURATE(D) Workflow steps:
  • Check Files & Documentation
  • Understand Data
  • Request Missing Info
  • Augment Metadata
  • Transform Files
  • Evaluate for FAIRness
  • Document Curation
FAIRness (Wilkinson et al., 2016)
• Findable (Rich Metadata)
• Accessible (Free, Open)
• Interoperable (Standard Schema)
• Reusable (Sufficient Description)
• TDR Fulfills:
  • Accessible
  • Interoperable
• Our Focus:
  • Findable
  • Reusable
Can you understand the data in front of you?
Check
• Goals:
  • Review Content
  • Verify Metadata
  • Review Documentation
• Checklist:
  • What file types are there?
  • Do they open as expected?
  • Is there a description?
  • What type of documentation?
    • Readme?
    • Codebook?
    • Data Dictionary?
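The first Check questions (what file types are there, and is there documentation?) can be partly answered with a quick inventory before opening anything by hand. The sketch below is illustrative only: the function name `inventory_dataset` and the name hints in `DOC_HINTS` are assumptions, not part of the CURATE(D) Workflow, and it presumes the dataset has been downloaded to a local folder.

```python
from collections import Counter
from pathlib import Path

# Hypothetical name hints for documentation files; adjust to local practice.
DOC_HINTS = ("readme", "codebook", "dictionary")


def inventory_dataset(root):
    """Summarize file extensions and likely documentation files under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    # Count files per extension, lowercased so .CSV and .csv are grouped.
    extensions = Counter(p.suffix.lower() or "(none)" for p in files)
    # A file counts as documentation if its name contains any hint.
    docs = sorted(
        p.name for p in files
        if any(hint in p.name.lower() for hint in DOC_HINTS)
    )
    return {"extensions": dict(extensions), "documentation": docs}
```

An inventory like this only shows what is present; whether each file opens as expected still has to be checked by the curator.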
Understand
• Goals:
  • Quality Assurance & Usability issues
  • Enough Documentation for reuse
• Checklist:
  • Can you easily understand the data presented?
  • Does the documentation:
    • Mention software used?
    • Explain data production process?
    • Describe data cleaning process?
  • Do you need a Data Primer? https://datacurationnetwork.org/resources/data-curation-primers/
Augment
• Goals:
  • Enhance Metadata for discoverability
  • Create/Apply metadata (such as keywords)
  • Structure metadata in domain-specific schemas
• Checklist:
  • Do the file names, description, or organization need development?
  • Are there any keywords?
  • Is there a controlled vocabulary they could follow?
  • Are there links to:
    • Publications?
    • Related datasets?
    • Source data?
Transform
• Goals:
  • Identify formats & restrictions
  • Transform files into non-proprietary formats
• Checklist:
  • Can you identify the file formats and the software used?
  • Will the files need to be updated in the future?
  • Can the files be transformed into an open, non-proprietary format?
  • Do you need a Data Primer? https://datacurationnetwork.org/resources/data-curation-primers/
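As a rough aid for the Transform checklist, a curator could flag files whose extensions suggest a proprietary format and note a possible open target. This is a minimal sketch under stated assumptions: the mapping in `OPEN_ALTERNATIVES` and the function name `suggest_open_formats` are hypothetical examples, not a list endorsed by DCN or TDR, and the right target format depends on the data.

```python
from pathlib import PurePath

# Illustrative mapping only (an assumption, not an official list):
# extension of a proprietary format -> one possible open alternative.
OPEN_ALTERNATIVES = {
    ".xls": ".csv",
    ".sav": ".csv",
    ".doc": ".txt",
}


def suggest_open_formats(filenames):
    """Return {filename: suggested open extension} for flagged files."""
    suggestions = {}
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext in OPEN_ALTERNATIVES:
            suggestions[name] = OPEN_ALTERNATIVES[ext]
    return suggestions
```

Files that are not flagged are simply left out of the result, so an empty dictionary means nothing in the list matched the mapping.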
FAIRness
• Goals:
  • Does the data meet FAIR requirements?
  • Are there any additional suggestions for getting it closer to being FAIR?
• Checklist:
  • Findable
    • Metadata exceeds author/title/date
  • Accessible
    • Free access
    • Open, non-proprietary formats
  • Interoperable
    • Standard Schema
  • Reusable
    • Sufficient Description
    • Creators, Owners, and Stewards listed
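The FAIRness checklist above can be sketched as a simple pass over a dataset's metadata record. Everything named here is an assumption for illustration: the field names (`keywords`, `open_access`, `schema`, and so on) do not come from Dataverse or TDR, and a non-empty description is only a stand-in for the curator's judgment of whether it is sufficient.

```python
# Hypothetical fields that go beyond author/title/date and aid discovery.
DISCOVERY_FIELDS = ("keywords", "subject", "related_publication")
# Hypothetical role fields for the Reusable check.
ROLE_FIELDS = ("creators", "owners", "stewards")


def fair_checklist(metadata):
    """Return a pass/fail flag per FAIR letter for one metadata record."""

    def has(field):
        # A field counts only if it is present and non-empty.
        return bool(metadata.get(field))

    return {
        "findable": any(has(f) for f in DISCOVERY_FIELDS),
        "accessible": has("open_access") and has("open_formats"),
        "interoperable": has("schema"),
        "reusable": has("description") and all(has(r) for r in ROLE_FIELDS),
    }
```

Any letter that comes back `False` is a candidate for the 3-4 main suggestions sent to the dataset owner.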
Request
• Goals:
  • Reach out to dataset owners to improve dataset
• Checklist:
  • 3-4 main suggestions
Documentation
• Goals:
  • Record necessary information for what happened to the dataset
• Checklist:
  • Work in Progress
Resources
• Data Curation Activities, Data Curation Network. Retrieved from https://datacurationnetwork.org/data-curation-activities/
• Data Primers, Data Curation Network. Retrieved from https://datacurationnetwork.org/resources/data-curation-primers/
• Education Modules, DataONE. Retrieved from https://www.dataone.org/educationmodules
• ICPSR, Data Management. Retrieved from https://www.icpsr.umich.edu/icpsrweb/content/datamanagement/index.html
• Our Workflow, Data Curation Network. Retrieved from https://datacurationnetwork.org/resources-2/
Works Cited
• Wilkinson, M. D. et al. (2016) “The FAIR Guiding Principles for Scientific Data Management and Stewardship” in Scientific Data (3). doi: 10.1038/sdata.2016.18