Readings:
- Levenstein & Lyle, data sharing
- Wickham, tidy data
- Git and GitHub tutorial (recommended)
- PSA data management policies (recommended)

Announcement: No drill this week
UNIT 5 Data management
ORGANIZING PROJECTS
Minimum codebook standards:
1. Variable name in dataset
2. Variable description (i.e., how it was derived)
3. Permissible values of variable

Codebook package: https://psyarxiv.com/5qc6h/
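The three minimum standards can be stored as a small table next to the data. The sketch below is one possible layout in base R, not the format produced by the codebook package; the dataset, variable names, descriptions, and permissible values are all hypothetical.

```r
# Hypothetical dataset; all names and values are illustrative.
d <- data.frame(
  id        = 1:4,
  condition = c("control", "treatment", "control", "treatment"),
  rating    = c(3, 5, 1, 4),
  stringsAsFactors = FALSE
)

# One row per variable, one column per minimum codebook standard.
codebook <- data.frame(
  variable    = c("id", "condition", "rating"),
  description = c(
    "Participant identifier assigned at enrollment",
    "Experimental condition assigned to the participant",
    "Self-reported rating on a 5-point scale"
  ),
  permissible = c("positive integers", "control; treatment", "1; 2; 3; 4; 5"),
  stringsAsFactors = FALSE
)

# Every variable in the dataset should have a codebook entry.
undocumented <- setdiff(names(d), codebook$variable)

# Check one variable against its listed permissible values.
entry <- codebook$permissible[codebook$variable == "condition"]
allowed_conditions <- strsplit(entry, "; ", fixed = TRUE)[[1]]
bad_conditions <- d$condition[!d$condition %in% allowed_conditions]
```

An empty `undocumented` and an empty `bad_conditions` mean the dataset and codebook agree for the variables checked.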
SECURING PROJECTS
Functions of security
(1) Protect data from unauthorized access
(2) Recover lost/corrupted files
(3) Attain version control, or the ability to restore past versions of your writing/code/data

Crude version control
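As a rough illustration of crude version control, assuming it means keeping dated copies of a file by hand, the sketch below saves a timestamped copy before editing. The helper name `backup_file` and the `backups` folder are hypothetical.

```r
# Save a timestamped copy of a file so a past version can be restored.
# The helper name and the default "backups" folder are hypothetical.
backup_file <- function(path, backup_dir = "backups") {
  dir.create(backup_dir, showWarnings = FALSE, recursive = TRUE)
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")
  ext   <- tools::file_ext(path)
  stem  <- tools::file_path_sans_ext(basename(path))
  copy_name <- if (nzchar(ext)) {
    paste0(stem, "_", stamp, ".", ext)
  } else {
    paste0(stem, "_", stamp)
  }
  copy_path <- file.path(backup_dir, copy_name)
  # overwrite = FALSE so an existing backup is never silently replaced.
  ok <- file.copy(path, copy_path, overwrite = FALSE)
  if (!ok) stop("Could not back up ", path)
  copy_path
}
```

Calling `backup_file("analysis.R")` before each editing session leaves a trail of restorable copies. This only addresses function (3); it does nothing for access control and little for recovery if the copies live on the same disk.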
Some more comprehensive solutions:
- OSF
- Dropbox
- GitHub
USEFULLY STRUCTURING DATA
Thinking about dataset structure is most helpful with your processed datasets (but sometimes you’ll have some choice about structure for source and raw data too).

Datasets are useful when they are:
(1) Easy to graph
(2) Easy to explore
(3) Easy to clean
(4) Easy to maintain when errors need correcting
TIDY DATA
Data consist of values measured on an observational unit.

Each value belongs to:
(1) a variable, or a specific type of measurement
(2) a case (Wickham: observation), which contains all values measured on an observational unit

Data are tidy when:
(1) Each variable has its own column
(2) Each case has its own row
(3) Each observational unit has its own table
Violation 1: Values used as column labels
Pew data: Frequencies of occurrence of different income brackets by religion
The income brackets in the column labels are values of a variable (income), not variable names.
Fix: gather(d, key = "income", value = "freq", -religion)
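A minimal runnable sketch of this fix, using the same `gather()` call on a toy table shaped like the Pew data. The religions, bracket labels, and counts below are made up for illustration.

```r
library(tidyr)

# Toy table in the shape of the Pew data; the counts are made up.
d <- data.frame(
  religion  = c("Agnostic", "Atheist"),
  `<$10k`   = c(1, 2),
  `$10-20k` = c(3, 4),
  check.names = FALSE
)

# Income brackets move out of the column labels into an "income" column,
# and the cell counts move into a "freq" column.
tidy <- gather(d, key = "income", value = "freq", -religion)
```

The result has one row per religion and income bracket. In current tidyr, `pivot_longer(d, -religion, names_to = "income", values_to = "freq")` expresses the same reshape; `gather()` is kept here because the slides use it.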
Violation 2: Multiple variables in one column
Tuberculosis data: Male and female cases across different age groups
The column labels combine two variables: sex (male vs. female) and age bracket.
Fix:
gather(d, key = "column", value = "cases", -c(country, year))
mutate(d, sex = ..., age = ...)
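A sketch of the two steps on toy data. The slide elides the expressions inside `mutate()`, and they are not reconstructed here; the `substr()` calls below are an assumption that holds only if column labels look like `m014` (first character is sex, the rest is the age bracket). Countries, labels, and counts are hypothetical.

```r
library(tidyr)

# Toy table; column labels such as "m014" are assumed to encode
# sex (first character) and age bracket (remaining characters).
d <- data.frame(
  country = c("AD", "AE"),
  year    = c(2000, 2000),
  m014    = c(0, 2),
  f014    = c(1, 3),
  stringsAsFactors = FALSE
)

# Step 1: move the combined labels into a single "column" variable.
long <- gather(d, key = "column", value = "cases", -c(country, year))

# Step 2: split the combined label into its two variables
# (base R stand-in for the elided mutate() expressions).
long$sex <- substr(long$column, 1, 1)
long$age <- substr(long$column, 2, nchar(long$column))
tidy <- long[, c("country", "year", "sex", "age", "cases")]
```

Each row of `tidy` now holds one country, year, sex, and age bracket, with the case count in its own column.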
Violation 3: Variables stored in rows
Weather data: Maximum and minimum temperatures at different dates and weather stations
Two separate variables share one column and are spread across rows:
(1) Minimum temperature
(2) Maximum temperature
Fix: spread(d, key = "element", value = "value")
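A runnable sketch of the same `spread()` call on a toy weather table. The station id, dates, and temperatures are illustrative values, not the course dataset.

```r
library(tidyr)

# Toy table: each station-date pair takes two rows,
# one for the maximum and one for the minimum temperature.
d <- data.frame(
  id      = c("S1", "S1", "S1", "S1"),
  date    = c("2010-01-30", "2010-01-30", "2010-02-02", "2010-02-02"),
  element = c("tmax", "tmin", "tmax", "tmin"),
  value   = c(27.8, 14.5, 27.3, 14.4),
  stringsAsFactors = FALSE
)

# Each distinct value of "element" becomes its own column.
tidy <- spread(d, key = "element", value = "value")
```

The result has one row per station and date, with `tmax` and `tmin` as separate columns. The newer tidyr equivalent is `pivot_wider(d, names_from = element, values_from = value)`.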
Violation 4: Two units in one table
Billboard data: Weekly rankings of hit songs
The table mixes ranking variables with track variables.
Fix:
unique(d[, c("id", track_vars)])
unique(d[, c("id", rank_vars)])
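A sketch of the split in base R, with `track_vars` and `rank_vars` defined explicitly. The artists, songs, and ranks are hypothetical, and which columns count as track or ranking variables is an assumption about the real dataset.

```r
# Toy table mixing track-level and week-level information.
d <- data.frame(
  id     = c(1, 1, 2, 2),
  artist = c("Artist A", "Artist A", "Artist B", "Artist B"),
  track  = c("Song A", "Song A", "Song B", "Song B"),
  week   = c(1, 2, 1, 2),
  rank   = c(87, 82, 91, 87),
  stringsAsFactors = FALSE
)

# Assumed grouping of columns by observational unit.
track_vars <- c("artist", "track")
rank_vars  <- c("week", "rank")

# One table per observational unit, linked by "id".
tracks <- unique(d[, c("id", track_vars)])
ranks  <- unique(d[, c("id", rank_vars)])

# The original table can be rebuilt when needed.
rebuilt <- merge(tracks, ranks, by = "id")
```

Storing track facts once in `tracks` means a correction to an artist name is made in one row instead of once per week.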
Violation 5: One unit in multiple tables
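A minimal sketch of one way to repair this violation, assuming the approach described in the Wickham reading: add a column recording which table each row came from, then stack the tables into one. The per-year tables and their values are hypothetical.

```r
# Hypothetical tables holding the same observational unit, split by year.
tables <- list(
  "2019" = data.frame(id = c(1, 2), score = c(10, 12)),
  "2020" = data.frame(id = c(1, 2), score = c(11, 15))
)

# Record where each row came from so that information is not lost.
tagged <- Map(function(tbl, src) {
  tbl$source <- src
  tbl
}, tables, names(tables))

# Stack the tagged tables into a single table.
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

With real files, the list would typically come from reading each file in turn, with the file name playing the role of `source`.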
Summary

You should make codebooks for your datasets. At a minimum, these should contain:
1. Variable name in dataset
2. Variable description (i.e., how it was derived)
3. Permissible values of variable

Secure your projects using a method that allows for version control.

One useful structure for datasets is the tidy structure. A dataset is tidy when:
1. Each variable has its own column
2. Each case has its own row
3. Each observational unit has its own table