R Intermediate Workshop
Sohee Kang, Associate Professor, Teaching Stream, Ph.D.
soheekang@utsc.utoronto.ca
Centre for Teaching and Learning
Department of Computer and Mathematical Sciences

Introductions
• Link to slides:
• Sohee Kang
  - Ph.D. in Biostatistics
  - Uses R to analyze data and teach
• Who are you?
  - What are you working on?
  - Have you used R before?
  - How do you plan to use R?

Workshop Objective & Topics
• Objective: present a way of working with data in R
  - Basic tools for data science
• Topics
  - Data manipulation: organize/transform/summarize/combine
  - Data visualization: a language for describing/creating plots

Why R?
• Many tools: STATA, SPSS, Matlab, Excel…
• Advantages
  - Powerful
  - Up to date with the latest algorithms
  - Strong community of users
  - Preferred by the statistics community
  - Easy to document/modify/reproduce/share your work
  - It's free
• Disadvantages
  - Steep learning curve, but can be useful quickly
  - Not pretty
  - Can be memory intensive

R Resources
• Learning R
  - Hands-On Programming with R, by G. Grolemund; excellent beginner introduction to R
  - R for Data Science, by H. Wickham and G. Grolemund; what we'll cover today
  - Advanced R, by H. Wickham; the gory details (for serious programmers)
  - R Cheatsheets; for various tools/libraries

R Overview
• R philosophy
  - Information is contained in objects (e.g. data, variables, models, plots)
  - Operations are represented by functions (e.g. sort data, fit model, plot results)
• R comes with standard functions, but its functionality can be significantly expanded using packages (a.k.a. libraries)
  - Packages are bundles of reusable code (functions & data)
  - Must be downloaded once w/ install.packages() and loaded at the start of each R session w/ library()
    install.packages("tidyverse")
    library(tidyverse)
    help(package = "tidyverse")

Tidy Data stored in tidy data-frame/table
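One common definition of "tidy" (the one used in R for Data Science) is that each variable is a column and each observation is a row. The sketch below illustrates that layout; the table name, column names, and values are made up for illustration and are not taken from the workshop data.

```r
library(tibble)

# Hypothetical example data (not from the DineSafe file).
# Tidy layout: one row per observation (an establishment in a year),
# one column per variable.
inspections <- tribble(
  ~establishment, ~year, ~n_inspections,
  "Cafe A",        2017,  2,
  "Cafe A",        2018,  3,
  "Diner B",       2017,  1,
  "Diner B",       2018,  2
)

inspections
```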

Workshop Data
• Toronto Dinesafe program
  - Every food-serving establishment receives 1-3+ inspections/year
  - Public Health Inspector assigns one of 3 types of notice
• Available through City of Toronto's Open Data

Complete TASKS in PART 1: First look at data

Reshaping Data
• Tidying up data w/ spread()/gather()
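A minimal sketch of gather() and spread(), assuming a small made-up table (the names wide, long, and wide_again and all values are hypothetical):

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per year, so not tidy.
wide <- tribble(
  ~establishment, ~`2017`, ~`2018`,
  "Cafe A",        2,       3,
  "Diner B",       1,       2
)

# gather(): collapse the year columns into key/value pairs (wide -> long).
long <- gather(wide, key = "year", value = "n_inspections", `2017`, `2018`)

# spread(): the inverse, one column per distinct key (long -> wide).
wide_again <- spread(long, key = year, value = n_inspections)
```

Recent tidyr releases recommend pivot_longer() and pivot_wider() for the same jobs; spread() and gather() remain available.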

Reshaping Data
• Split/combine variables w/ separate()/unite()
• Sort data w/ arrange()
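The three functions named above could be used roughly as follows. This is only a sketch on made-up data; the visits table and its columns are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data with a combined "year-month" column.
visits <- tribble(
  ~establishment, ~period,
  "Diner B",      "2018-03",
  "Cafe A",       "2017-11",
  "Cafe A",       "2018-01"
)

# separate(): split one column into several at the separator.
split_up <- separate(visits, period, into = c("year", "month"), sep = "-")

# unite(): paste several columns back into one.
rejoined <- unite(split_up, "period", year, month, sep = "-")

# arrange(): sort rows; desc() reverses the order of a variable.
sorted <- arrange(split_up, establishment, desc(year))
```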

Subsetting Data
• Pick data frame obs./variables (i.e. rows/columns)
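The slide does not list the functions here; a sketch assuming the usual dplyr verbs filter() (rows by condition), slice() (rows by position), and select() (columns), on a made-up fines table:

```r
library(dplyr)
library(tibble)

# Hypothetical data; column names and values are illustrative only.
fines <- tribble(
  ~establishment, ~severity,     ~amount_fined,
  "Cafe A",       "Minor",       0,
  "Cafe A",       "Significant", 250,
  "Diner B",      "Crucial",     1000,
  "Diner B",      "Minor",       0
)

# filter(): keep the rows (observations) that satisfy a condition.
fined <- filter(fines, amount_fined > 0)

# slice(): keep rows by position.
first_two <- slice(fines, 1:2)

# select(): keep columns (variables) by name.
slim <- select(fines, establishment, amount_fined)
```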

Transforming Data
• Create new variables and summaries
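A sketch of both operations, assuming dplyr's mutate() for new variables and summarise() for summaries (the data and names below are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical data; values are illustrative only.
fines <- tribble(
  ~establishment, ~amount_fined,
  "Cafe A",       0,
  "Cafe A",       250,
  "Diner B",      1000,
  "Diner B",      0
)

# mutate(): add a new variable computed from existing ones.
fines2 <- mutate(fines, was_fined = amount_fined > 0)

# summarise(): collapse the whole table into one row of summaries.
totals <- summarise(fines2,
                    total   = sum(amount_fined),
                    n_fined = sum(was_fined))
```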

Pipes
• Pipe operator %>% passes object as function's (first) argument
    x %>% f(y)  =  f(x, y)    or    y %>% f(x, .)  =  f(x, y)
• Apply functions sequentially
    data %>% filter( ) %>% select( ) %>% summarize( )
  Identical to, but much easier to read than,
    summarize( select( filter(data) ) )
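The equivalence between nested calls and a pipeline can be checked on a small made-up table. The fines data and the result names below are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical data; values are illustrative only.
fines <- tribble(
  ~establishment, ~amount_fined,
  "Cafe A",       0,
  "Cafe A",       250,
  "Diner B",      1000
)

# Nested calls read inside-out.
nested <- summarise(select(filter(fines, amount_fined > 0), amount_fined),
                    total = sum(amount_fined))

# The same steps with the pipe read top to bottom.
piped <- fines %>%
  filter(amount_fined > 0) %>%
  select(amount_fined) %>%
  summarise(total = sum(amount_fined))

# The dot places the piped object somewhere other than the first argument:
# this is paste("b", "a").
dotted <- "a" %>% paste("b", .)
```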

Grouping
• Apply summary functions to groups (i.e. subsets of data)
    X %>% group_by(v2) %>% summarise(M = mean(v1))
• Can group on multiple variables
  - Each summarise() call removes the last grouping level
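A sketch of grouping on two variables and watching the grouping levels peel off, one per summarise() call (dplyr's default behaviour). The data and names are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical data; values are illustrative only.
fines <- tribble(
  ~establishment, ~severity,     ~amount_fined,
  "Cafe A",       "Minor",       0,
  "Cafe A",       "Significant", 250,
  "Cafe A",       "Significant", 100,
  "Diner B",      "Crucial",     1000
)

# Group on two variables; the first summarise() drops the last one (severity).
by_est_sev <- fines %>%
  group_by(establishment, severity) %>%
  summarise(total = sum(amount_fined))

group_vars(by_est_sev)   # only "establishment" is left

# A second summarise() drops the remaining level, leaving an ungrouped table.
by_est <- by_est_sev %>% summarise(total = sum(total))
```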

Complete TASKS in PART 2: Manipulating data

Combining Data
• Joins merge data frames by common values
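A sketch of two common joins, assuming dplyr's inner_join() and left_join() and two made-up tables that share an id column (all names and values are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical tables sharing the key column `id`.
establishments <- tribble(
  ~id, ~name,
  1,   "Cafe A",
  2,   "Diner B",
  3,   "Grill C"
)
inspections <- tribble(
  ~id, ~status,
  1,   "Pass",
  1,   "Conditional Pass",
  2,   "Pass",
  4,   "Closed"
)

# inner_join(): keep only ids present in both tables.
inner <- inner_join(establishments, inspections, by = "id")

# left_join(): keep every row of the left table; unmatched rows get NA.
left <- left_join(establishments, inspections, by = "id")
```

Here "Grill C" has no inspection, so it appears in left with an NA status and is absent from inner.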

Combining Data
• More joins
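The slide does not say which joins it covers; the sketch below assumes dplyr's right_join() and full_join(), on made-up tables with hypothetical names and values.

```r
library(dplyr)
library(tibble)

# Hypothetical tables sharing the key column `id`.
establishments <- tribble(
  ~id, ~name,
  1,   "Cafe A",
  2,   "Diner B",
  3,   "Grill C"
)
inspections <- tribble(
  ~id, ~status,
  1,   "Pass",
  2,   "Pass",
  4,   "Closed"
)

# right_join(): keep every row of the right table (inspections).
right <- right_join(establishments, inspections, by = "id")

# full_join(): keep every row of both tables.
full <- full_join(establishments, inspections, by = "id")
```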

Combining Data
• Filtering joins
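Filtering joins keep or drop rows of the first table based on matches in the second, without adding columns. A sketch with dplyr's semi_join() and anti_join() on made-up tables (names and values are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical tables sharing the key column `id`.
establishments <- tribble(
  ~id, ~name,
  1,   "Cafe A",
  2,   "Diner B",
  3,   "Grill C"
)
inspections <- tribble(
  ~id, ~status,
  1,   "Pass",
  1,   "Conditional Pass",
  2,   "Pass"
)

# semi_join(): establishments that have at least one inspection.
inspected <- semi_join(establishments, inspections, by = "id")

# anti_join(): establishments that have no inspection.
not_inspected <- anti_join(establishments, inspections, by = "id")
```

Unlike inner_join(), semi_join() never duplicates rows of the first table, even when an id matches several times.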

Combining Data
• Set operations on rows (observations)
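A sketch of row-wise set operations, assuming dplyr's data-frame methods for intersect(), union(), and setdiff(). The two tables are made up and must have the same columns.

```r
library(dplyr)
library(tibble)

# Hypothetical tables with identical columns.
inspected_2017 <- tibble(name = c("Cafe A", "Diner B", "Grill C"))
inspected_2018 <- tibble(name = c("Diner B", "Grill C", "Bistro D"))

# Rows that appear in both tables.
both_years <- intersect(inspected_2017, inspected_2018)

# Rows that appear in either table, without duplicates.
either_year <- union(inspected_2017, inspected_2018)

# Rows in the first table that are not in the second.
only_2017 <- setdiff(inspected_2017, inspected_2018)
```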

Combining Data
• Attaching rows/columns
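A sketch assuming dplyr's bind_rows() and bind_cols(), which attach tables by position rather than by matching values (data and names are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical tables.
batch_1 <- tibble(name = c("Cafe A", "Diner B"))
batch_2 <- tibble(name = c("Grill C"))
counts  <- tibble(n_inspections = c(2, 1))

# bind_rows(): stack tables on top of each other, matching columns by name.
all_names <- bind_rows(batch_1, batch_2)

# bind_cols(): place tables side by side; rows are matched by position,
# so both tables must have the same number of rows in the same order.
with_counts <- bind_cols(batch_1, counts)
```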

Complete TASKS in PART 3: Combining data

Data Visualization
• Communicate information from data through graphs (plots, charts, maps, etc.)
• Need conventions for communicating graphical information, i.e. a Grammar of Graphics
• We will use the ggplot2 package in R to think about, describe, and create graphs

Graph Anatomy
• Graphs are created from the same components:
  - Data
  - Geometric objects (lines, points, text, etc.)
  - Coordinate systems
  - Other annotations (labels, legends, etc.)
• Multiple geometric objects are overlaid on a single coordinate system to create a graph

Aesthetic Mappings
• Geometric objects convey information through their aesthetics
• Variables in the data can be mapped to one or more of these aesthetics
• Most common aesthetic mappings
  - Location: x, y (coordinates)
  - Appearance: size, color, fill
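A sketch of mapping variables to location and appearance aesthetics with aes(). The data frame and its columns are made up for illustration.

```r
library(ggplot2)

# Hypothetical data; values are illustrative only.
fines <- data.frame(
  n_inspections = c(1, 2, 3, 2),
  amount_fined  = c(0, 250, 1000, 100),
  severity      = c("Minor", "Significant", "Crucial", "Minor")
)

# Location aesthetics: x and y. Appearance aesthetics: color and size.
p <- ggplot(data = fines) +
  geom_point(aes(x = n_inspections, y = amount_fined,
                 color = severity, size = amount_fined))

# print(p) draws the plot.
```

The same variable can feed more than one aesthetic, as amount_fined does here for both y and size.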

Plotting in ggplot2
• Using proper grammar w/ ggplot() + layers
    ggplot(data = dinesafe) +
      geom_bar(aes(x = MINIMUM_INSPECTIONS_PERYEAR))

Data Transformations
• Data can be transformed for plotting through stat functions
    ggplot(data = dinesafe, aes(x = SEVERITY, y = AMOUNT_FINED)) +
      stat_summary(fun = "sum", geom = "bar")  # argument was named fun.y before ggplot2 3.3.0

Faceting
• Create a grid of sub-plots, one for each level of a variable
    ggplot(data = dinesafe, aes(x = SEVERITY, y = AMOUNT_FINED)) +
      stat_summary(fun = "sum", geom = "bar") +  # argument was named fun.y before ggplot2 3.3.0
      facet_wrap(facets = ~MINIMUM_INSPECTIONS_PERYEAR)

Plot Adjustments
• Other aspects for fine-tuning plots
  - Coordinates: cartesian, polar, flipped, maps
  - Scales: control range of aesthetic values
  - Annotations: axis labels, legends
  - Positional adjustments: arranging multiple geoms
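One possible way to touch each of the four adjustment types in a single plot. This is only a sketch; the data, labels, and axis limits are hypothetical.

```r
library(ggplot2)

# Hypothetical data; values are illustrative only.
fines <- data.frame(
  establishment = c("Cafe A", "Cafe A", "Diner B", "Diner B"),
  severity      = c("Minor", "Significant", "Minor", "Significant"),
  amount_fined  = c(50, 250, 100, 1000)
)

p <- ggplot(fines, aes(x = severity, y = amount_fined, fill = establishment)) +
  geom_col(position = "dodge") +                 # positional adjustment: bars side by side
  coord_flip() +                                 # coordinates: flip x and y
  scale_y_continuous(limits = c(0, 1200)) +      # scale: set the range of the y aesthetic
  labs(x = "Severity", y = "Amount fined",       # annotations: axis and legend titles
       fill = "Establishment")

# print(p) draws the plot.
```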

Complete TASKS in PART 4: Data Visualisation

Wrap-up
• What you learned:
  - Organize, manipulate & visualise data in R
• Follow-up:
  - Use recommended resources
  - Take a course (online or in person)
  - Practice R/RStudio on your own
• Next steps:
  - Perform basic statistical analyses
  - Write reproducible reports with R Markdown

Acknowledgements
• Many thanks to the entire R community for making such an amazing tool available and accessible to everyone
• Special thanks to Hadley Wickham, for revolutionizing R
• Thank you for your attention!