Tame Your Data with Open Refine GIL User
- Slides: 35
Tame Your Data with Open. Refine GIL User Group Meeting May 14 th, 2015 Tricia Clayton tclayton 3@gsu. edu Collection Services Librarian Georgia State University Library
Explore Main Functions Clean & Transform Extend & Reconcile
Getting Open. Refine • Download at http: //openrefine. org • Platform independent - based on the Java environment Google Refine 2. 5 latest stable version Open. Refine 2. 6 development version
Comparison to other tools Open. Refine • Can batch edit rows and columns • Excellent for exploring & transforming data • No schema needed • Data is always visible Spreadsheets Databases • Edit one cell at a time • Excellent for data entry, functions, calculations • No schema needed • Data is always visible • Schema and scripting language needed for editing • Data is mostly out of site unless programming is used to run queries or build views
Getting help The Open. Refine wiki is housed on Git. Hub: https: //github. com/Open. Refine/wiki -includes installation instructions, documentation, tutorials, recipes, etc. Using Open. Refine by Ruben Verborgh and Max De Wilde, 2013
Getting started (on Windows) • Download the. zip file • Extract to a folder of your choosing • Click the. exe file to run • The Command window opens and will run in the background [Ctrl-C in this window safely exits Open. Refine]
Runs in your default browser http: //localhost: 3333 or http: //127. 0. 0. 1: 3333
Create project • Create a new project, • Open an existing one , • Or import from another Open. Refine instance. Supported file formats include: TSV, CSV, *SV, Excel (. xls and. xlsx), JSON, XML, RDF as XML, and Google docs
Create project Name the project Edit import options if necessary; options vary by file type.
Basic navigation 1 3 4 2
The “All” column Contains some features that let you perform operations on all columns at once: - reorder - remove - collapse or expand View – Collapse/Expand columns Edit columns – Re-order/remove columns
The other columns Most operations in Open. Refine act on a single column, and are initiated from that column’s menu. The “Edit column” dropdown menu contains options to rename or remove the column, and provides limited options for moving the column (to the beginning, end, or one over in either direction). The “View” dropdown provides additional collapsing options
Project history: Undo / Redo • undo some (or all) of your project • extract/save parts of your project history • apply (import) steps from another project
Export options Export Menu
Explore your data Open. Refine offers multiple ways to facet your data: – text – number – timeline – blank – error – and more! Demo: http: //www. screencast. com/t/h 1 v 130 lt. Dl Image source: International Space Station Above Earth, by NASA, https: //www. flickr. com/photos/nasamarshall/9070896398/, (CC BY-NC 2. 0)
Filtering Text filtering matches cells that contain a string or regular expression.
Sorting in Open. Refine is somewhat special… Demo: http: //www. screencast. com /t/m. EUVANx. Yz Image source: Lego Sorting, by jwhittenburg, https: //www. flickr. com/photos/jaydubya_rulez/207321782/, (CC-BY-NC-ND 2. 0).
Blank down / Fill down
Rows vs. records
Clean & transform General transformation tips: • Think in patterns – what are the common characteristics of the cells/rows/columns you want to change • Use facets and filters to isolate – then use a single command to change the set
Common transforms
Splitting cells & transposing Problem: You used the TITLE field from the BIB_TEXT table in your Voyager Access query; now you want to separate the title and author information. Solution: Use some of the Edit Cells and Transpose options. original cell format after splitting multivalue cells http: //www. screencast. com/t/Ax. X 3 p. Og 6 U after transposing cells in rows into columns
Splitting columns ILLiad LDAP conversion project – deriving campus IDs from patron email addresses: ILLiad user data Edit column menu
Splitting columns
Clustering is magical Publisher data in Voyager can be messy. This video shows how clustering can be used to merge variations of the same publisher together. http: //screencast. com/t/d. MYQsus. Xj
GREL ^ is the symbol for starts with
Transforming with GREL Menu Option Result Edit cells: Transform… The regular expression transforms the cells in active column Edit column: Add column based on this column The regular expression is run against the active column, but creates a new column
GREL
GREL: replacing The first set of ““ contains the string to replace; the 2 nd set contains what to replace it with. The preview shows that the “c” and the “. ” have been replaced with “nothing. ” This is two expressions chained together, not one. They are combined with the period that precedes the 2 nd “replace. ”
History and favorites The History tab stores expressions used previously in current AND other projects. The Starred tab stores those you have marked as favorites.
A couple favorites The cell. cross function pulls data from one project into another (based on a matching column – ISSN, Bib. ID, title, etc. ): syntax: cell. cross("Name of the source project", "name of the reference column"). cells["Name of the column you want to import"]. value[0] example – you’re working with the title list for your Wiley package renewal - from the column containing the ISSN info, add a new column using the following expression – it matches against the ISSN column in the Wiley COUNTER report, and pulls in the fulltext downloads: cell. cross("Wiley 2013 JR 1", “Print ISSN"). cells[“Reporting Period Total"]. value[0] Transform display call numbers into a normalized call numbers: 1) Remove periods value. replace(". ", "") 2) Separate letter groups followed by numbers (with a space) value. replace(/(p{Is. Alphabetic})(? =d)/, '$1 ') 3) Separate number groups followed by letters value. replace(/(d)(? =[A-Z])/, '$1 ')
Extend & reconcile Image source: Map of the Open. Refine Ecosystem, by Martin Magdinier, @magdmartin, http: //openrefine. org/2015/01/26/Mapping-Open. Refine-ecosystem. html
Questions? Additional image credits: Broom icon, By Alberto Guerra Quintanilla, from the Noun Project, https: //thenounproject. com/term/broom/30688/, (CC BY 3. 0 US). Bucket icon, By Alberto Guerra Quintanilla, from the Noun Project, https: //thenounproject. com/term/bucket/30690/, (CC BY 3. 0 US). Bullfighting icon, By Paulo Volkova, from the Noun Project, https: //thenounproject. com/term/bullfighting/3835/, public domain. Magic-Wand icon, By Mister Pixel, from the Noun Project, https: //thenounproject. com/term/magic-wand/34626/, (CC BY 3. 0 US).
- Groups google com forum
- 영국 beis
- "wicked problem"
- Refine reduce reuse recycle recovery retrieve energy
- Seamus heaney storm on the island
- To tame the perilous skies
- Refine argument
- Defining and refining the problem ppt
- Tame events
- 1 peter 4:12-19 esv
- Mark tame
- Kind hearted synonyms
- Tame
- Wind dives and strafes invisibly
- Contoh refine teknologi ramah lingkungan
- Single user and multiple user operating system
- Single user and multi user operating system
- Open your eyes free your mind
- Give us your hungry your tired your poor
- Gil kalai quantum computing
- Gil kalai quantum computing
- Gil kirkpatrick
- Robledo lima gil
- Caroline prenat hugo
- Ted bundy character
- Humanismo gil vicente
- Imagen de gil gonzalez davila
- City of god gil cuadros
- Robledo lima gil
- Gil kirkpatrick
- Rizal most profitable business
- Fiona gil
- Vida e obra de gil vicente
- Nelson javier duenas gil
- Personagens tipo da farsa de ines pereira
- Resumo da obra o velho da horta