Tame Your Data with Open Refine GIL User

  • Slides: 35
Download presentation
Tame Your Data with Open. Refine GIL User Group Meeting May 14 th, 2015

Tame Your Data with Open. Refine GIL User Group Meeting May 14 th, 2015 Tricia Clayton tclayton 3@gsu. edu Collection Services Librarian Georgia State University Library

Explore Main Functions Clean & Transform Extend & Reconcile

Explore Main Functions Clean & Transform Extend & Reconcile

Getting Open. Refine • Download at http: //openrefine. org • Platform independent - based

Getting Open. Refine • Download at http: //openrefine. org • Platform independent - based on the Java environment Google Refine 2. 5 latest stable version Open. Refine 2. 6 development version

Comparison to other tools Open. Refine • Can batch edit rows and columns •

Comparison to other tools Open. Refine • Can batch edit rows and columns • Excellent for exploring & transforming data • No schema needed • Data is always visible Spreadsheets Databases • Edit one cell at a time • Excellent for data entry, functions, calculations • No schema needed • Data is always visible • Schema and scripting language needed for editing • Data is mostly out of site unless programming is used to run queries or build views

Getting help The Open. Refine wiki is housed on Git. Hub: https: //github. com/Open.

Getting help The Open. Refine wiki is housed on Git. Hub: https: //github. com/Open. Refine/wiki -includes installation instructions, documentation, tutorials, recipes, etc. Using Open. Refine by Ruben Verborgh and Max De Wilde, 2013

Getting started (on Windows) • Download the. zip file • Extract to a folder

Getting started (on Windows) • Download the. zip file • Extract to a folder of your choosing • Click the. exe file to run • The Command window opens and will run in the background [Ctrl-C in this window safely exits Open. Refine]

Runs in your default browser http: //localhost: 3333 or http: //127. 0. 0. 1:

Runs in your default browser http: //localhost: 3333 or http: //127. 0. 0. 1: 3333

Create project • Create a new project, • Open an existing one , •

Create project • Create a new project, • Open an existing one , • Or import from another Open. Refine instance. Supported file formats include: TSV, CSV, *SV, Excel (. xls and. xlsx), JSON, XML, RDF as XML, and Google docs

Create project Name the project Edit import options if necessary; options vary by file

Create project Name the project Edit import options if necessary; options vary by file type.

Basic navigation 1 3 4 2

Basic navigation 1 3 4 2

The “All” column Contains some features that let you perform operations on all columns

The “All” column Contains some features that let you perform operations on all columns at once: - reorder - remove - collapse or expand View – Collapse/Expand columns Edit columns – Re-order/remove columns

The other columns Most operations in Open. Refine act on a single column, and

The other columns Most operations in Open. Refine act on a single column, and are initiated from that column’s menu. The “Edit column” dropdown menu contains options to rename or remove the column, and provides limited options for moving the column (to the beginning, end, or one over in either direction). The “View” dropdown provides additional collapsing options

Project history: Undo / Redo • undo some (or all) of your project •

Project history: Undo / Redo • undo some (or all) of your project • extract/save parts of your project history • apply (import) steps from another project

Export options Export Menu

Export options Export Menu

Explore your data Open. Refine offers multiple ways to facet your data: – text

Explore your data Open. Refine offers multiple ways to facet your data: – text – number – timeline – blank – error – and more! Demo: http: //www. screencast. com/t/h 1 v 130 lt. Dl Image source: International Space Station Above Earth, by NASA, https: //www. flickr. com/photos/nasamarshall/9070896398/, (CC BY-NC 2. 0)

Filtering Text filtering matches cells that contain a string or regular expression.

Filtering Text filtering matches cells that contain a string or regular expression.

Sorting in Open. Refine is somewhat special… Demo: http: //www. screencast. com /t/m. EUVANx.

Sorting in Open. Refine is somewhat special… Demo: http: //www. screencast. com /t/m. EUVANx. Yz Image source: Lego Sorting, by jwhittenburg, https: //www. flickr. com/photos/jaydubya_rulez/207321782/, (CC-BY-NC-ND 2. 0).

Blank down / Fill down

Blank down / Fill down

Rows vs. records

Rows vs. records

Clean & transform General transformation tips: • Think in patterns – what are the

Clean & transform General transformation tips: • Think in patterns – what are the common characteristics of the cells/rows/columns you want to change • Use facets and filters to isolate – then use a single command to change the set

Common transforms

Common transforms

Splitting cells & transposing Problem: You used the TITLE field from the BIB_TEXT table

Splitting cells & transposing Problem: You used the TITLE field from the BIB_TEXT table in your Voyager Access query; now you want to separate the title and author information. Solution: Use some of the Edit Cells and Transpose options. original cell format after splitting multivalue cells http: //www. screencast. com/t/Ax. X 3 p. Og 6 U after transposing cells in rows into columns

Splitting columns ILLiad LDAP conversion project – deriving campus IDs from patron email addresses:

Splitting columns ILLiad LDAP conversion project – deriving campus IDs from patron email addresses: ILLiad user data Edit column menu

Splitting columns

Splitting columns

Clustering is magical Publisher data in Voyager can be messy. This video shows how

Clustering is magical Publisher data in Voyager can be messy. This video shows how clustering can be used to merge variations of the same publisher together. http: //screencast. com/t/d. MYQsus. Xj

GREL ^ is the symbol for starts with

GREL ^ is the symbol for starts with

Transforming with GREL Menu Option Result Edit cells: Transform… The regular expression transforms the

Transforming with GREL Menu Option Result Edit cells: Transform… The regular expression transforms the cells in active column Edit column: Add column based on this column The regular expression is run against the active column, but creates a new column

GREL

GREL

GREL: replacing The first set of ““ contains the string to replace; the 2

GREL: replacing The first set of ““ contains the string to replace; the 2 nd set contains what to replace it with. The preview shows that the “c” and the “. ” have been replaced with “nothing. ” This is two expressions chained together, not one. They are combined with the period that precedes the 2 nd “replace. ”

History and favorites The History tab stores expressions used previously in current AND other

History and favorites The History tab stores expressions used previously in current AND other projects. The Starred tab stores those you have marked as favorites.

A couple favorites The cell. cross function pulls data from one project into another

A couple favorites The cell. cross function pulls data from one project into another (based on a matching column – ISSN, Bib. ID, title, etc. ): syntax: cell. cross("Name of the source project", "name of the reference column"). cells["Name of the column you want to import"]. value[0] example – you’re working with the title list for your Wiley package renewal - from the column containing the ISSN info, add a new column using the following expression – it matches against the ISSN column in the Wiley COUNTER report, and pulls in the fulltext downloads: cell. cross("Wiley 2013 JR 1", “Print ISSN"). cells[“Reporting Period Total"]. value[0] Transform display call numbers into a normalized call numbers: 1) Remove periods value. replace(". ", "") 2) Separate letter groups followed by numbers (with a space) value. replace(/(p{Is. Alphabetic})(? =d)/, '$1 ') 3) Separate number groups followed by letters value. replace(/(d)(? =[A-Z])/, '$1 ')

Extend & reconcile Image source: Map of the Open. Refine Ecosystem, by Martin Magdinier,

Extend & reconcile Image source: Map of the Open. Refine Ecosystem, by Martin Magdinier, @magdmartin, http: //openrefine. org/2015/01/26/Mapping-Open. Refine-ecosystem. html

Questions? Additional image credits: Broom icon, By Alberto Guerra Quintanilla, from the Noun Project,

Questions? Additional image credits: Broom icon, By Alberto Guerra Quintanilla, from the Noun Project, https: //thenounproject. com/term/broom/30688/, (CC BY 3. 0 US). Bucket icon, By Alberto Guerra Quintanilla, from the Noun Project, https: //thenounproject. com/term/bucket/30690/, (CC BY 3. 0 US). Bullfighting icon, By Paulo Volkova, from the Noun Project, https: //thenounproject. com/term/bullfighting/3835/, public domain. Magic-Wand icon, By Mister Pixel, from the Noun Project, https: //thenounproject. com/term/magic-wand/34626/, (CC BY 3. 0 US).