R Intermediate Workshop • Sohee Kang • Associate professor, teaching stream, Ph.D. • soheekang@utsc.utoronto.ca • Centre for Teaching and Learning • Department of Computer and Mathematical Sciences
Introductions • Link to slides: • Sohee Kang - Ph.D. in Biostatistics - Uses R to analyze data and teach • Who are you? - What are you working on? - Have you used R before? - How do you plan to use R?
Workshop Objective & Topics • Objective: present a way of working with data in R - Basic tools for data science • Topics - Data manipulation: organize/transform/summarize/combine - Data visualization: a language for describing/creating plots
Why R? • Many tools: Stata, SPSS, MATLAB, Excel… • Advantages − powerful − up to date with the latest algorithms − strong community of users − preferred by the statistics community − easy to document/modify/reproduce/share your work − it's free • Disadvantages − steep learning curve, but can be useful quickly − not pretty − can be memory intensive
R Resources • Learning R − Hands-On Programming with R, by G. Grolemund; an excellent beginner introduction to R − R for Data Science, by H. Wickham and G. Grolemund; what we'll cover today − Advanced R, by H. Wickham; the gory details (for serious programmers) − R Cheatsheets; for various tools/libraries
R Overview • R philosophy − Information is contained in objects (e.g. data, variables, models, plots) − Operations are represented by functions (e.g. sort data, fit a model, plot results) • R comes with standard functions, but its functionality can be expanded significantly with packages (a.k.a. libraries) − Packages are bundles of reusable code (functions & data) − They must be downloaded once with install.packages() and loaded at the start of each R session with library(): install.packages("tidyverse"); library(tidyverse); help(package = "tidyverse")
Tidy Data • Data stored in a tidy data frame/table
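A minimal sketch of what "tidy" means, assuming the usual tidyverse convention (each variable is a column, each observation is a row, each value is a cell). The table and its values are made up for illustration; they are not taken from the DineSafe data.

```r
library(tibble)

# Hypothetical, made-up inspection counts in tidy form:
# each variable is a column, each observation (establishment-year) is a row,
# and each value has its own cell.
inspections <- tibble(
  establishment = c("Cafe A", "Cafe A", "Deli B"),  # hypothetical names
  year          = c(2018, 2019, 2018),
  inspections   = c(2, 3, 1)
)

inspections        # print the table
dim(inspections)   # rows = observations, columns = variables
```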
Workshop Data • Toronto DineSafe program − Every food-serving establishment receives 1-3+ inspections/year − A Public Health Inspector assigns one of 3 types of notice • Available through the City of Toronto's Open Data portal
Complete TASKS in PART 1: First look at data
Reshaping Data • Tidying up data with spread()/gather()
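A small sketch of gather() and spread() on a made-up table (the establishment names, years and counts are hypothetical). gather() and spread() still work in current tidyr, but since tidyr 1.0.0 they are superseded by pivot_longer() and pivot_wider(), so both forms are shown.

```r
library(tidyr)
library(tibble)

# Hypothetical "wide" table: one column per year, so the variable "year"
# is hidden in the column names (not tidy).
wide <- tibble(
  establishment = c("Cafe A", "Deli B"),
  `2018` = c(2, 1),
  `2019` = c(3, 2)
)

# gather(): move the year columns into key/value pairs (long, tidy form)
long <- gather(wide, key = "year", value = "inspections", `2018`, `2019`)

# spread(): the inverse, turning key/value pairs back into columns
wide_again <- spread(long, key = year, value = inspections)

# tidyr >= 1.0.0 equivalent of the gather() step
long2 <- pivot_longer(wide, cols = c(`2018`, `2019`),
                      names_to = "year", values_to = "inspections")
```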
Reshaping Data • Split/combine variables with separate()/unite() • Sort data with arrange()
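A sketch of separate(), unite() and arrange() on hypothetical inspection dates stored as "YYYY-MM-DD" strings. Recent tidyr releases recommend separate_wider_delim() over separate(), but separate() is still available.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical inspection records with the date stored as one string
visits <- tibble(
  establishment = c("Deli B", "Cafe A", "Cafe A"),
  date          = c("2019-03-14", "2018-11-02", "2019-01-20")
)

# separate(): split one column into several at the "-" separator
split_dates <- separate(visits, date, into = c("year", "month", "day"), sep = "-")

# unite(): the inverse, pasting the columns back together
rejoined <- unite(split_dates, "date", year, month, day, sep = "-")

# arrange(): sort rows; desc() reverses the order for one column
sorted <- arrange(split_dates, establishment, desc(year))
```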
Subsetting Data • Pick data frame observations/variables (i.e. rows/columns)
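A minimal sketch of filter() (rows) and select() (columns). The table is invented; AMOUNT_FINED and SEVERITY follow the column names used on the plotting slides, while ESTABLISHMENT_NAME is an assumed column name.

```r
library(dplyr)
library(tibble)

# Made-up infraction records (ESTABLISHMENT_NAME is an assumed column)
infractions <- tibble(
  ESTABLISHMENT_NAME = c("Cafe A", "Deli B", "Deli B", "Grill C"),
  SEVERITY           = c("Minor", "Significant", "Crucial", "Minor"),
  AMOUNT_FINED       = c(0, 55, 305, 0)
)

# filter(): keep rows (observations) that meet a condition
fined <- filter(infractions, AMOUNT_FINED > 0)

# select(): keep columns (variables) by name
names_and_fines <- select(fined, ESTABLISHMENT_NAME, AMOUNT_FINED)
```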
Transforming Data • Create new variables and summaries
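A sketch of mutate() for a new variable and summarise() for a one-row summary, using made-up fine amounts; WAS_FINED is a hypothetical derived column.

```r
library(dplyr)
library(tibble)

# Made-up data
infractions <- tibble(
  SEVERITY     = c("Minor", "Significant", "Crucial", "Minor"),
  AMOUNT_FINED = c(0, 55, 305, 0)
)

# mutate(): add a new variable computed from existing ones
infractions <- mutate(infractions, WAS_FINED = AMOUNT_FINED > 0)

# summarise(): collapse all rows into one row of summary values
totals <- summarise(infractions,
                    n_infractions = n(),
                    total_fined   = sum(AMOUNT_FINED),
                    share_fined   = mean(WAS_FINED))  # mean of TRUE/FALSE = proportion
```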
Pipes • The pipe operator %>% passes an object as a function's (first) argument: x %>% f(y) is f(x, y), or, with the dot placeholder, y %>% f(x, .) is f(x, y) • Apply functions sequentially: data %>% filter() %>% select() %>% summarize() is identical to, but much easier to read than, summarize(select(filter(data)))
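The two forms on the slide can be compared directly. This sketch, on made-up data, builds the same summary nested and piped, and also shows the dot placeholder putting the left-hand side somewhere other than the first argument. %>% comes from magrittr and is re-exported by dplyr.

```r
library(dplyr)
library(tibble)

# Made-up data
infractions <- tibble(
  SEVERITY     = c("Minor", "Significant", "Crucial", "Minor"),
  AMOUNT_FINED = c(0, 55, 305, 0)
)

# Nested form: read from the inside out
nested <- summarise(
  select(filter(infractions, AMOUNT_FINED > 0), AMOUNT_FINED),
  total = sum(AMOUNT_FINED)
)

# Piped form: the same steps, read left to right
piped <- infractions %>%
  filter(AMOUNT_FINED > 0) %>%
  select(AMOUNT_FINED) %>%
  summarise(total = sum(AMOUNT_FINED))

# Dot placeholder: 3 goes where the dot is, i.e. seq(1, 3)
dot_example <- 3 %>% seq(1, .)
```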
Grouping • Apply summary functions to groups (i.e. subsets of data): X %>% group_by(v2) %>% summarise(M = mean(v1)) • Can group on multiple variables − Each summary function removes the last group level
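A sketch of grouped summaries on invented data, with v1/v2 replaced by AMOUNT_FINED/SEVERITY (ESTABLISHMENT_NAME is an assumed column). The .groups argument (dplyr 1.0.0 or later) makes the "drop the last group level" behaviour explicit; group_vars() then shows which grouping remains.

```r
library(dplyr)
library(tibble)

# Made-up data
infractions <- tibble(
  ESTABLISHMENT_NAME = c("Cafe A", "Deli B", "Deli B", "Grill C"),
  SEVERITY           = c("Minor", "Significant", "Crucial", "Minor"),
  AMOUNT_FINED       = c(0, 55, 305, 0)
)

# One summary row per severity level
by_severity <- infractions %>%
  group_by(SEVERITY) %>%
  summarise(mean_fine = mean(AMOUNT_FINED))

# Two grouping variables: summarise() drops the last one (SEVERITY),
# so the result is still grouped by ESTABLISHMENT_NAME
by_both <- infractions %>%
  group_by(ESTABLISHMENT_NAME, SEVERITY) %>%
  summarise(total = sum(AMOUNT_FINED), .groups = "drop_last")

group_vars(by_both)
```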
Complete TASKS in PART 2: Manipulating data
Combining Data • Joins merge data frames by common values
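A sketch of the two most common mutating joins on two hypothetical tables that share a key column id (the tables and the key name are assumptions, not DineSafe columns).

```r
library(dplyr)
library(tibble)

# Hypothetical tables sharing the key column "id"
establishments <- tibble(id = c(1, 2, 3),
                         name = c("Cafe A", "Deli B", "Grill C"))
fines <- tibble(id = c(2, 3, 4),
                AMOUNT_FINED = c(55, 305, 120))

# left_join(): keep every row of x, add matching columns from y (NA if no match)
left <- left_join(establishments, fines, by = "id")

# inner_join(): keep only rows whose key appears in both tables
inner <- inner_join(establishments, fines, by = "id")
```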
Combining Data • More Joins
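A sketch of right_join() and full_join() on the same kind of hypothetical tables; full_join() keeps unmatched rows from both sides and fills the gaps with NA.

```r
library(dplyr)
library(tibble)

# Hypothetical tables sharing the key column "id"
establishments <- tibble(id = c(1, 2, 3),
                         name = c("Cafe A", "Deli B", "Grill C"))
fines <- tibble(id = c(2, 3, 4),
                AMOUNT_FINED = c(55, 305, 120))

# right_join(): keep every row of y (id 4 gets name = NA)
right <- right_join(establishments, fines, by = "id")

# full_join(): keep every row of both tables
full <- full_join(establishments, fines, by = "id")
```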
Combining Data • Filtering Joins
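Filtering joins keep or drop rows of the first table based on a match in the second, without adding any columns. A sketch with hypothetical tables:

```r
library(dplyr)
library(tibble)

# Hypothetical tables sharing the key column "id"
establishments <- tibble(id = c(1, 2, 3),
                         name = c("Cafe A", "Deli B", "Grill C"))
fines <- tibble(id = c(2, 3, 4),
                AMOUNT_FINED = c(55, 305, 120))

# semi_join(): rows of x that HAVE a match in y (no columns from y are added)
with_fines <- semi_join(establishments, fines, by = "id")

# anti_join(): rows of x with NO match in y
without_fines <- anti_join(establishments, fines, by = "id")
```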
Combining Data • Set operations on rows (observations)
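A sketch of row-wise set operations with dplyr's data frame methods for intersect(), union() and setdiff(); both tables must have the same columns. The establishment lists are made up.

```r
library(dplyr)
library(tibble)

# Two hypothetical lists of inspected establishments with identical columns
inspected_2018 <- tibble(name = c("Cafe A", "Deli B"))
inspected_2019 <- tibble(name = c("Deli B", "Grill C"))

both_years  <- intersect(inspected_2018, inspected_2019)  # rows in both
either_year <- union(inspected_2018, inspected_2019)      # rows in either, duplicates removed
only_2018   <- setdiff(inspected_2018, inspected_2019)    # rows in the first but not the second
```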
Combining Data • Attaching rows/columns
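A sketch of bind_rows() and bind_cols() on made-up pieces: bind_rows() matches columns by name, while bind_cols() simply places tables side by side, so the rows must already be in the same order.

```r
library(dplyr)
library(tibble)

# Made-up pieces of one table
part1 <- tibble(name = c("Cafe A", "Deli B"), fine = c(0, 55))
part2 <- tibble(name = "Grill C", fine = 305)

# Stack observations; columns are matched by name
all_rows <- bind_rows(part1, part2)

# Add a hypothetical column; rows are matched by position only
extra <- tibble(inspections_per_year = c(1, 2, 3))
all_cols <- bind_cols(all_rows, extra)
```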
Complete TASKS in PART 3: Combining data
Data Visualization • Communicate information from data through graphs (plots, charts, maps, etc.) • Need conventions for communicating graphical information, i.e. a Grammar of Graphics • We will use the ggplot2 package in R to think about, describe, and create graphs
Graph Anatomy • Graphs are built from the same components: − Data − Geometric objects (lines, points, text, etc.) − Coordinate systems − Other annotations (labels, legends, etc.) • Multiple geometric objects are overlaid on a single coordinate system to create a graph
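A sketch of the anatomy above: one data set, one coordinate system, two geometric objects layered on top of each other, and annotations from labs(). The yearly totals are invented. Printing p (for example by typing p at the console) draws the graph.

```r
library(ggplot2)
library(tibble)

# Hypothetical yearly totals
yearly <- tibble(year = 2015:2019, fines = c(1200, 950, 1430, 1100, 1600))

# Data + coordinate system from ggplot(), two geoms overlaid,
# and annotations added with labs()
p <- ggplot(yearly, aes(x = year, y = fines)) +
  geom_line() +    # first geometric object: a line
  geom_point() +   # second geometric object: points on the same axes
  labs(x = "Year", y = "Total fined ($)", title = "Hypothetical fines by year")
```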
Aesthetic Mappings • Geometric objects convey information through their aesthetics • Variables in the data can be mapped to one or more of these aesthetics • Most common aesthetic mappings − Location: x, y (coordinates) − Appearance: size, color, fill
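A sketch of aesthetic mappings on made-up data: x and y map variables to location, colour maps SEVERITY to appearance, and size is set to a constant outside aes(), so it is not a mapping.

```r
library(ggplot2)
library(tibble)

# Made-up data using the column names from the slides
infractions <- tibble(
  SEVERITY                    = c("Minor", "Significant", "Crucial", "Minor", "Significant"),
  AMOUNT_FINED                = c(0, 55, 305, 0, 110),
  MINIMUM_INSPECTIONS_PERYEAR = c(1, 2, 3, 3, 2)
)

# Mappings inside aes(): location (x, y) and appearance (color)
p <- ggplot(infractions,
            aes(x = MINIMUM_INSPECTIONS_PERYEAR, y = AMOUNT_FINED, color = SEVERITY)) +
  geom_point(size = 3)  # fixed size for every point, not mapped to data
```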
Plotting in ggplot2 • Use the proper grammar with ggplot() + layers: ggplot(data = dinesafe) + geom_bar(aes(x = MINIMUM_INSPECTIONS_PERYEAR))
Data Transformations • Data can be transformed for plotting through a stat function: ggplot(data = dinesafe, aes(x = SEVERITY, y = AMOUNT_FINED)) + stat_summary(fun = "sum", geom = "bar")
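The stat does the summarising inside the plot. This sketch, on invented data, checks that stat_summary() with a sum gives the same bar heights as summarising first with dplyr and plotting with geom_col(). The fun argument replaced fun.y in ggplot2 3.3.0.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Made-up data
infractions <- tibble(
  SEVERITY     = c("Minor", "Significant", "Crucial", "Significant"),
  AMOUNT_FINED = c(0, 55, 305, 110)
)

# Option 1: let ggplot2 compute the sums with a stat
p_stat <- ggplot(infractions, aes(x = SEVERITY, y = AMOUNT_FINED)) +
  stat_summary(fun = sum, geom = "bar")

# Option 2: compute the sums first, then plot the values as-is
totals <- infractions %>%
  group_by(SEVERITY) %>%
  summarise(AMOUNT_FINED = sum(AMOUNT_FINED))

p_pre <- ggplot(totals, aes(x = SEVERITY, y = AMOUNT_FINED)) +
  geom_col()
```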
Faceting • Create a grid of sub-plots, one for each level of a variable: ggplot(data = dinesafe, aes(x = SEVERITY, y = AMOUNT_FINED)) + stat_summary(fun = "sum", geom = "bar") + facet_wrap(facets = ~MINIMUM_INSPECTIONS_PERYEAR)
Plot Adjustments • Other aspects for fine-tuning plots − Coordinates: Cartesian, polar, flipped, maps − Scales: control the range of aesthetic values − Annotations: axis labels, legends − Positional adjustments: arranging multiple geoms
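A sketch that touches each kind of adjustment on made-up totals: position = "dodge" places the two years' bars side by side, coord_flip() swaps the axes, scale_y_continuous() sets the value range, and labs() sets the annotations.

```r
library(ggplot2)
library(tibble)

# Made-up totals by severity and year
fines <- tibble(
  SEVERITY = rep(c("Minor", "Significant", "Crucial"), times = 2),
  year     = rep(c("2018", "2019"), each = 3),
  total    = c(0, 110, 305, 0, 55, 610)
)

p <- ggplot(fines, aes(x = SEVERITY, y = total, fill = year)) +
  geom_col(position = "dodge") +                       # positional adjustment
  coord_flip() +                                       # coordinates: flipped
  scale_y_continuous(limits = c(0, 700)) +             # scale: range of y values
  labs(x = NULL, y = "Total fined ($)", fill = "Year") # annotations
```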
Complete TASKS in PART 4: Data Visualisation
Wrap-up • What you learned: − Organize, manipulate & visualise data in R • Follow-up: − Use the recommended resources − Take a course (online or in person) − Practice R/RStudio on your own • Next steps: − Perform basic statistical analyses − Write reproducible reports with R Markdown
Acknowledgements • Many thanks to the entire R community for making such an amazing tool available and accessible to everyone • Special thanks to Hadley Wickham, for revolutionizing R • Thank you for your attention!