
The Life-Changing Magic of OpenRefine: The Open-Source Art of Data Decluttering and Organizing
Maristella Feustle
MOUG - Portland, Oregon
January 30, 2018

Today’s data sets
http://goo.gl/P4zKkE

Why OpenRefine?
Establishing meaningful relationships within and between datasets requires that all of the data are being read as intended.
Use OpenRefine to:
● Clean up data - standardize, correct, rearrange
● Automate tedious editing
● Match with outside controlled vocabularies
● Prep and export data for use in programs like MarcEdit, Tableau, Gephi, Raw, CartoDB, Google Fusion Tables, and more

Before and after


A brief history
Freebase Gridworks, then Google Refine, then OpenRefine
GREL - General Refine Expression Language
Many versions to choose from. Even “release candidate” versions are useful.

Creating a project
Data can be as simple as an Excel spreadsheet inventory, or even a text file.
Data “lives” on your computer: this has security and preservation pros and cons.
MarcEdit can export TSV or JSON files to OpenRefine.
Memory issues can arise when manipulating very large files.

http://www.thinkgeek.com/product/6806/

Then what?
How you proceed depends on:
1. The final form you want your data to have
2. How much intervention your data requires to get there

*Disclaimer
Maristella mainly works in archival descriptive standards. She has not been evaluated by the FDA (FRBR’s Dedicated Aficionados) and is not intended to diagnose, treat, cure, or prevent any RDA-related maladies. Consult your cataloging professional for advice.

Project 1: Tabular data
Columns are the basis of organization in OpenRefine.
If your data is already in rows and columns (Excel, CSV, TSV, etc.), there is little translation to do in importing a project: “What you see is what you get.”
Check encoding for special characters.
Specify a first row as header, if applicable.
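
The two import checks above (encoding and header row) can be illustrated outside OpenRefine with a short Python sketch. It assumes a small UTF-8 CSV inventory; the sample data and the `read_table` helper are hypothetical and not part of the talk.

```python
import csv
import io

def read_table(raw_bytes, encoding="utf-8", has_header=True):
    """Decode bytes with an explicit encoding and split rows into header + data."""
    text = raw_bytes.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if has_header and rows:
        return rows[0], rows[1:]
    # No header row: fall back to generic column names.
    width = max((len(r) for r in rows), default=0)
    return [f"Column {i + 1}" for i in range(width)], rows

# Hypothetical sample: a composer name containing special characters.
sample = "Composer,Title\nDvořák,Rusalka\n".encode("utf-8")
header, data = read_table(sample)
print(header, data)

# Decoding the same bytes with the wrong encoding garbles the special characters.
_, wrong = read_table(sample, encoding="latin-1")
print(wrong)
```

If the second print shows mangled letters in your own data, that is the cue to change the encoding setting at import time.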

Basic transformations
Moving things around
Editing data
Demonstration: column order, sorting, facets, clustering.
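
For readers curious about what clustering does, here is a rough Python sketch of the key-collision idea behind OpenRefine's fingerprint method. It is a simplified approximation, not OpenRefine's actual implementation, and the sample names are made up.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Simplified key-collision fingerprint: normalize, then sort unique tokens."""
    text = value.strip().lower()
    # Strip accents by decomposing characters and dropping combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation, then split on whitespace.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))
    return " ".join(tokens)

def cluster(values):
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]

# Hypothetical name variants of the kind found in a messy inventory.
names = ["Ellington, Duke", "Duke Ellington", "duke ellington ", "Basie, Count"]
print(cluster(names))
```

Because tokens are sorted, inverted and direct name order end up with the same key, which is why such variants are proposed as one cluster.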

Lather, rinse, repeat
The command history can be extracted and reused on other datasets.
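
An extracted history is a JSON list of operations, so it can be adjusted before it is applied to another project. The Python sketch below swaps a column name throughout such a list. The sample operations, their field names, and the `retarget` helper are illustrative assumptions; the exact fields OpenRefine writes may differ by version.

```python
import json

# Illustrative history only: the operations and column names are made up,
# and the exact fields OpenRefine writes may differ by version.
extracted = """
[
  {"op": "core/text-transform", "columnName": "Title",
   "expression": "value.trim()", "description": "Trim whitespace in Title"},
  {"op": "core/column-rename", "oldColumnName": "Composer",
   "newColumnName": "Creator", "description": "Rename Composer to Creator"}
]
"""

def retarget(operations, old_name, new_name):
    """Return a copy of the history with one column name swapped in every field
    whose value is exactly that name (free-text descriptions are left alone)."""
    updated = []
    for op in operations:
        changed = {k: (new_name if v == old_name else v) for k, v in op.items()}
        updated.append(changed)
    return updated

history = json.loads(extracted)
for op in retarget(history, "Title", "Work title"):
    print(op["op"], "->", op.get("columnName", ""))
```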

Fun with GREL
Many alternatives exist to writing code from scratch for common transformations:
https://data-lessons.github.io/library-openrefine/04-basic-functions-II/
https://github.com/OpenRefine/wiki/Recipes
http://arcadiafalcone.net/GoogleRefineCheatSheets.pdf
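
To give a feel for the kind of transformation those recipes cover, here are rough Python counterparts of a few common GREL expressions. The helper names and sample value are hypothetical, and Python's string methods may behave differently from GREL on edge cases.

```python
def trim(value):
    # Roughly what the GREL expression value.trim() does.
    return value.strip()

def to_titlecase(value):
    # Comparable to value.toTitlecase(); Python's title() may differ on edge cases.
    return value.title()

def replace(value, find, substitute):
    # Comparable to value.replace(find, substitute) with a plain-string pattern.
    return value.replace(find, substitute)

def split_first(value, separator):
    # Comparable to value.split(separator)[0].
    return value.split(separator)[0]

# Hypothetical messy cell: keep the part before the semicolon, then tidy it.
raw = "  ellington, duke ; piano "
cleaned = to_titlecase(trim(split_first(raw, ";")))
print(cleaned)
```

In GREL the same chain would be written as one expression on `value`, applied to every cell in the column at once.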

Project 2: MarcEdit Exports
Source: http://www.dlib.vt.edu/projects/OAI/marcxml.html
Use MarcEdit / MarcBreaker to parse raw MARC and export it to OpenRefine as JSON or TSV.
Use the Facet function to isolate MARC fields.
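
The idea of isolating MARC fields by facet can be sketched in Python. This assumes mnemonic-style lines of the form `=245  10$aTitle`, as MarcBreaker-style output is commonly laid out; the sample record and the `parse_mnemonic` helper are hypothetical.

```python
from collections import Counter

def parse_mnemonic(lines):
    """Split mnemonic MARC lines like '=245  10$aTitle' into (tag, rest) rows."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("="):
            continue  # skip blank lines between records
        tag, rest = line[1:4], line[4:].lstrip()
        rows.append((tag, rest))
    return rows

# Hypothetical record in the mnemonic style; not taken from the talk's data.
sample = [
    "=LDR  00000nam a2200000 a 4500",
    "=100  1\\$aEllington, Duke,$d1899-1974.",
    "=245  10$aMusic is my mistress.",
    "=650  \\0$aJazz musicians.",
    "=650  \\0$aComposers.",
    "",
]

rows = parse_mnemonic(sample)
# Counting tags mirrors what a text facet on a tag column would list.
tag_counts = Counter(tag for tag, _ in rows)
# Keeping one tag mirrors selecting a single value in that facet.
subjects = [rest for tag, rest in rows if tag == "650"]
print(tag_counts)
print(subjects)
```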

Project 3: Bibframe 1.0
Source: http://kcoyle.net/bibframe/book.rdf.xml

Export TSV and other formats.


Other resources:
Terry Reese: MarcEdit and OpenRefine: http://blog.reeset.net/archives/1873
OpenRefine.org, book and videos: http://openrefine.org/
Free Your Metadata: http://freeyourmetadata.org/
OpenRefine Reconciliation services: http://refine.codefork.com/
More reconcilable data sources: https://github.com/OpenRefine/wiki/Reconcilable-Data-Sources
Overdue Ideas: “A Worked Example of Fixing Problem MARC Data”: http://www.meanboyfriend.com/overdue_ideas/?s=a+worked+example

Thank you!
Maristella.Feustle@unt.edu
If you can read this, you don’t need glasses.