Maps 2: Spatial Analysis & Examples
EPID 799C, Fall 2018

Today: Maps in R
• Quick review
• Getting spatial data
• Read / write
• Merge
• Vis packages
• Data structures (sp/sf)
• Basic spatial operations
• Recent map projects

Review from last time
• Types: point, line, polygon, raster
• Shapefile structure
• QGIS basics
• R objects: sp, sf

Getting spatial data
• TIGER files: https://www.census.gov/geo/maps-data/tiger-line.html
• NC OneMap: http://data.nconemap.gov/geoportal/catalog/main/home.page
• Geocode your own

Remember our new friend: sf!
• Simple features (sf) spec: just a data frame!
• The spatial information is stored as another “variable” (a geometry column).
• Not quite as efficient in some ways, but much easier to work with. sf provides methods for dplyr verbs, so *_join(), group_by(), and summarize() work in slick ways.
• See the CRAN page and the many vignettes on GitHub. Seriously, the vignettes are fantastic.

Read/write
• sp (via rgdal): readOGR(), writeOGR()
• sf: st_read(), st_write()
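A minimal round-trip sketch with sf, assuming the sf package is installed. It uses the North Carolina shapefile that ships with sf as a stand-in for course data, and the output path is a temporary file chosen purely for illustration. The sp route (readOGR()/writeOGR()) lives in rgdal, which has since been retired from CRAN, so only the sf calls are shown.

```r
library(sf)

# Stand-in data: the nc.shp example bundled with the sf package
nc_path <- system.file("shape/nc.shp", package = "sf")
nc <- st_read(nc_path, quiet = TRUE)

# Write to a fresh temporary GeoPackage (illustrative path), then read it back
out_path <- tempfile(fileext = ".gpkg")
st_write(nc, out_path, quiet = TRUE)
nc_again <- st_read(out_path, quiet = TRUE)

# The object is a data frame with a geometry column
class(nc_again)
nrow(nc_again) == nrow(nc)
```

st_write() picks the driver from the file extension, so swapping ".gpkg" for ".shp" would write a shapefile instead.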

Merge
• Exactly the same as for non-spatial data frames! (Thanks, overloading)
  new_df = merge(df1, df2) # by=… and all.x/all.y= parameters
Important!
1. Do not merge the data by itself (that is, the attribute table separated from its geometry). I have done this.
2. Also: joining data can be a pain
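A small base R sketch of the by= and all.x= parameters. The two data frames are made up for illustration: the FIPS codes are real county codes, but the rate values are hypothetical. The same merge() call is meant to work when the left-hand object is an sf data frame.

```r
# Left table: one row per county (would be the spatial object in practice)
counties <- data.frame(
  fips = c("37063", "37135", "37183"),
  name = c("Durham", "Orange", "Wake"),
  stringsAsFactors = FALSE
)

# Right table: hypothetical rates, deliberately missing Orange County
rates <- data.frame(
  fips = c("37063", "37183"),
  rate = c(1.2, 0.8),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every county; unmatched rows get NA instead of vanishing
merged <- merge(counties, rates, by = "fips", all.x = TRUE)
merged

# Quick check that no rows were silently dropped
nrow(merged) == nrow(counties)
```

Without all.x = TRUE the unmatched county would be dropped, which on a map shows up as a hole.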

Viz packages, covered previously
• base::plot()
• sp::spplot()
• ggplot
• tmap

Basic spatial operations in sp
• spTransform(): http://www.epsg.org/, cheat sheet, spatial reference list
• gCentroid()
• gRelates(): DE-9IM matrix
• gTouches(): simple version of the above
• aggregate() (spatial)
• gBuffer()
• over() and %over%: the workhorse. (SE example here)
• gDistance()
• spsample()
• … and many more

Basic spatial operations in sf (reminiscent!)
• st_transform()
• st_centroid()
• st_relate(): DE-9IM matrix
• st_touches() (also intersects, disjoint, crosses, within, etc.)
• group_by() and summarise()
• st_buffer() (also boundary, union_cascaded, simplify…)
• st_distance()
• st_sample()
• … and many more
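A short sketch of several of the sf operations listed above, assuming sf is installed and using its bundled North Carolina counties. The choice of EPSG:32119 (NAD83 / North Carolina, in metres) and the 3-mile buffer distance are illustrative assumptions, not values taken from the slides.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Project to a metre-based CRS so buffer and distance units are meaningful
nc_proj <- st_transform(nc, 32119)

# Centroids of each county (geometry only, to avoid attribute warnings)
cents <- st_centroid(st_geometry(nc_proj))

# 3-mile buffers around the centroids (1 mile = 1609.344 m)
bufs <- st_buffer(cents, dist = 3 * 1609.344)

# DE-9IM relation string between the first two counties
rel <- st_relate(nc_proj[1, ], nc_proj[2, ])

# Which counties touch which: a sparse list of neighbour indices
touch <- st_touches(nc_proj)

# Pairwise distances between the first three centroids (units: metres)
d <- st_distance(cents[1:3])
```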

Let’s try!
• Spatial unioning… and group_by!
• Spatial transform, centroids, buffers, touches, etc.
• Spatial subsetting… is so slick
• Spatial distance
• Spatial over / referencing
• Spatial autocorrelation
…Not covering more complex spatial models, finding clusters, space-time, rasters…
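A sketch of three of these exercises (union via group_by, spatial subsetting, and an over-style join), assuming sf and dplyr are installed. The west/east "region" variable, the 50 km search radius, and the three query points are hypothetical, invented only to have something to group, subset, and join on.

```r
library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  st_transform(32119)
cents <- st_centroid(st_geometry(nc))

# Hypothetical grouping: split counties into west/east by centroid x coordinate
x <- st_coordinates(cents)[, "X"]
nc$region <- ifelse(x < median(x), "west", "east")

# group_by + summarise on an sf object also unions the geometries per group
regions <- nc %>%
  group_by(region) %>%
  summarise(births_1974 = sum(BIR74))

# Spatial subsetting: counties intersecting a 50 km buffer around one centroid
search_area <- st_buffer(cents[1], 50000)
nearby <- nc[search_area, ]

# Spatial join (the sf analogue of sp::over): which county holds each point?
pts <- st_sf(id = 1:3, geometry = cents[1:3])
joined <- st_join(pts, nc["NAME"])
```

The bracket subsetting uses st_intersects by default, so counties that merely overlap the search area are kept.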

How to use these tools? Spatial analysis can help in many different ways.

Examples & Patterns
Long
• Industrial Hog Operations (IHOs): Title VI complaint
• Alcohol exposure index
• Death by police (WIP)
Short
• Cumulative exposures
• Flooding
• Dissertation / traffic stops
• Tobacco scraping / machine learning

IHOs


IHOs: R
~800 lines of (very inefficient) code.
############################################
# General Structure
# 1. Prep the IHO dataset: filtering and merging to unique permits
# 2. Prep geography: Import NC counties, create point/buffer (3 mi) shapefiles for IHOs.
# 3. Exclusions: Create exclusion shapes for "no-touch" counties and top 5 cities. Map.
# 4. Prep Demographics: Import 2010 census data and shapefiles at block level. Merge.
# 5. Spatial punch: Strip exclusion blocks from shapefile/data frame to get study area.
# 6. Spatial aggregation: Merge 3 mi buffer exposure data into blocks. Export shapefile.
# 7. Regressions.
# 8. Datapoints and graphics.
############################################
Patterns:
• Duplicate an analysis where you know the answer! Rosetta Stones are great ways to learn.
• Repeated the analysis over and over. Brought 4 hours for every change down to one button click.
• For court use, *everything* is documented. Still using it: request for original data and scripts this week.
• Text to speech is excessive, but neat.
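Steps 2, 5, and 6 of the outline above can be sketched in sf roughly as follows. This is not the original project code: the point locations are random (not real IHO permits), counties stand in for census blocks, and the "exclusion" set is an arbitrary choice of five counties, all assumptions made so the example is self-contained.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)  # metre-based CRS for NC

# Step 2 (sketch): hypothetical facility points and their 3-mile buffers
set.seed(42)
facilities <- st_sample(st_union(nc), 25)          # random, NOT real permits
buffers <- st_union(st_buffer(facilities, 3 * 1609.344))

# Step 5 (sketch): "spatial punch" = drop excluded units from the study area
exclusion <- nc[1:5, ]                               # arbitrary stand-in
study <- nc[!(nc$NAME %in% exclusion$NAME), ]

# Step 6 (sketch): flag study units that intersect any buffer
study$exposed <- lengths(st_intersects(study, buffers)) > 0
table(study$exposed)
```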

Alcohol Exposure Index
Left: On-premises alcohol outlets and cluster zones in Atlanta, GA, 1997–2007.
Above: Example of spatial index construct. The alcohol retailer exposure index for a region (e.g. cluster or city-wide) is built as the sum of the inverse distances from each population (e.g. census block) cluster centroid to the nearest seven retailers. In this example, each of the nearest seven retailers is ½ mile from the centroid of the region, making the spatial index for this region 14 (14 = 7 × 1/(½)). All individuals are ascribed this exposure, and then populations within the region are summed to obtain the total or average alcohol retailer exposure by demographics or alcohol retailer type.
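The index arithmetic can be written as a tiny base R function. The function name exposure_index and its k argument are illustrative choices, not names from the original analysis; distances are assumed to be in miles and strictly positive.

```r
# Sum of inverse distances to the k nearest retailers (k = 7 on the slide)
exposure_index <- function(dists, k = 7) {
  nearest <- sort(dists)[seq_len(min(k, length(dists)))]
  sum(1 / nearest)
}

# Slide example: seven retailers, each half a mile from the centroid
exposure_index(rep(0.5, 7))   # 7 * (1 / 0.5) = 14

# Farther retailers contribute less; only the nearest k are counted
exposure_index(c(0.5, 1, 2, 10), k = 3)   # 2 + 1 + 0.5 = 3.5
```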

Alcohol Exposure Index
Left: Map of On- and Off-Premises Alcohol Retailer Locations and Clusters in Durham County, NC. Separate on- and off-premises clusters were defined as five or more alcohol retailers within 0.15 miles. There were no clusters located farther north or east of this window in Durham County.
Above: Number of Clusters vs. Buffer Size of On- and Off-Premises Retailers in Durham County, NC. For example, note how, for clusters of minimum size 10, buffer counts begin to collapse after using buffers greater than 0.25 miles to nearest alcohol retailers.

Alcohol Exposure Index
Right: Demographics of Durham County Residents Living within On- and Off-Premises Alcohol Clusters. Hispanic and Black residents were more likely to live within an off-premises alcohol retailer cluster than White non-Hispanic residents. Demographic data from the 2010 US Census.

Alcohol Exposure Index
Above: Home Owners’ Loan Corporation (Redlining) Map of Durham, NC (1933) and On- and Off-Premises Alcohol Retailer Cluster Locations (2007). Note the largest off-premises alcohol retailer cluster is over the largest “Grade D” neighborhoods.

Alcohol Exposure Index: R (WIP)
• Load libraries & read files
• Recode: match constructs (race-eth, dates, etc.)
• Get unique locations (for spatial matching)
• Function: make distance matrices, returns list of matrices
• Function: report matches
• Run tests & output
• Machine learning tests
Patterns:
• Save points: output results and read them right back in.
• Higher-level functions & process wrappers
• Retaining a color palette in maps and graphs: nice

Police Killings
• Death by “legal intervention”
• Linking project: Violent Death Reporting System (VDRS) to crowd-sourced data
• What are the differences between community ascription of death by police and formal VDRS coding?

Police Killings: Distance Matrices
Simpler but non-trivial linking methods use approximate text distance linking (Levenshtein distance or others) on a fully concatenated string.
Example:
A = “John Q Smith 2017/02/12 White Raleigh NC”
B = “John P Smith 2016/12/21 White Raleigh NC”
Number of substitutions, additions, deletions to get from A to B = 5
But this doesn’t take advantage of the content (e.g. date distance or place distance).
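The concatenated-string distance in the example can be checked with base R's adist(), which computes a generalized Levenshtein distance with unit costs by default. Whether the original project used adist() or another implementation is not stated in the slides.

```r
a <- "John Q Smith 2017/02/12 White Raleigh NC"
b <- "John P Smith 2016/12/21 White Raleigh NC"

# adist() returns a matrix of pairwise edit distances; take the single cell
d <- adist(a, b)[1, 1]
d   # 5: Q->P, 7->6, 0->1, 1->2, 2->1

# The distance treats the date as text, so 2017/02/12 vs 2016/12/21 looks
# "close" even though the dates are almost two months apart
```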

Police Killings: Distance Matrices
First build individual content-specific distance matrices for each element … (example distance range in parentheses)
• Name: text distance (0–32)
• Race-eth: binary (0–1)
• Date: days different (0–707)
• City: text distance (0–19)
• State: binary (0–1)

Police Killings: Distance Matrices
… then collapse them into a single n-dimensional distance by any of a number of methods. These aggregate link indices perform well by themselves.
Inputs: Name, Race-eth, Date, City, State
• Unweighted sum
• Scaled sum (0–1)
• Normalized sum (~−3 to ~3)
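A base R sketch of the two-stage idea: one distance matrix per field, then three ways of collapsing them. The records are invented toy data, and the helper names rescale01 and zscore are illustrative. Reading "scaled" as per-matrix min-max rescaling and "normalized" as per-matrix z-scores is an interpretation of the slide, not a confirmed description of the original code.

```r
# Toy records from two hypothetical sources (made up for illustration)
src_a <- data.frame(
  name = c("John Q Smith", "Maria Lopez"),
  date = as.Date(c("2017-02-12", "2016-05-01")),
  city = c("Raleigh", "Durham"),
  stringsAsFactors = FALSE
)
src_b <- data.frame(
  name = c("John P Smith", "Maria Lopes", "Jon Smyth"),
  date = as.Date(c("2017-02-13", "2016-05-01", "2015-01-01")),
  city = c("Raleigh", "Durham", "Cary"),
  stringsAsFactors = FALSE
)

# Content-specific distance matrices (rows = src_a, columns = src_b)
d_name <- adist(src_a$name, src_b$name)                   # text distance
d_city <- adist(src_a$city, src_b$city)                   # text distance
d_date <- abs(outer(as.numeric(src_a$date),
                    as.numeric(src_b$date), "-"))         # days apart

# Three ways to collapse them into one link index
rescale01 <- function(m) (m - min(m)) / (max(m) - min(m))
zscore    <- function(m) (m - mean(m)) / sd(m)

unweighted <- d_name + d_city + d_date
scaled     <- rescale01(d_name) + rescale01(d_city) + rescale01(d_date)
normalized <- zscore(d_name) + zscore(d_city) + zscore(d_date)

# Best candidate in src_b for each src_a record (smallest combined distance)
best <- unname(apply(scaled, 1, which.min))
best
```

In the unweighted sum the date distance dominates because its range is so much larger, which is one reason to rescale before summing.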

Police Killings: Exploratory Tree Models
Predictive trees can also be used for categorization. They benefit from:
1. Maintaining separate distance matrix information as decision nodes
2. Being able to “reuse” covariates if useful
3. Not requiring single a priori or parameterized generalized linear relationships

Police Killings: Exploratory Tree Models
With violent deaths / deaths by police being relatively rare at the local level, even without decedent name in the tree model, date distance (in days) and the approximate text distance in the city name, by themselves, correctly categorize 99% of links between the Mapping Police Violence and The Counted datasets. Name is more useful when using the entire violent death dataset. Treating dates numerically instead of as strings in a concatenated ID may have application to other death linking projects.
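A minimal classification-tree sketch using rpart (a recommended package shipped with standard R installs), with date distance and city text distance as the only predictors. The candidate pairs are synthetic and separable by construction, so the accuracy here demonstrates the mechanics only and says nothing about the real datasets; the variable names are illustrative.

```r
library(rpart)

set.seed(1)
n <- 100

# Synthetic candidate pairs (NOT real data): true links have small distances
links <- data.frame(
  date_diff = sample(0:3, n, replace = TRUE),
  city_dist = sample(0:2, n, replace = TRUE),
  is_link   = "link"
)
nonlinks <- data.frame(
  date_diff = sample(30:700, n, replace = TRUE),
  city_dist = sample(5:19, n, replace = TRUE),
  is_link   = "non-link"
)
cand <- rbind(links, nonlinks)
cand$is_link <- factor(cand$is_link)

# Fit a classification tree on the two distance features
fit <- rpart(is_link ~ date_diff + city_dist, data = cand, method = "class")

# In-sample predictions and accuracy
pred <- predict(fit, cand, type = "class")
accuracy <- mean(pred == cand$is_link)
print(fit)
```

On real linkage data the fit should be evaluated on a held-out, hand-linked subset rather than in-sample.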

Police Killings: Future Improvements
• NVDRS: Waiting for national VDRS data to test on a “supervised” (hand-linked) subset
• Stabilization: if using a single aggregate index, consider stabilizing it (e.g. date distances >30 are all set to 30, long name distances are cut off)
• Package helpers: the RecordLinkage package in R uses regression trees. Need to efficiently bin as well.
• Spatial distance matrix: Considering creating a lookup table of geocoded place distances on city-state to benefit from spatial distance, e.g.
  1) Chapel Hill, TN to Chapel Hill, NC: text distance = 2, spatial distance = 226 miles
  2) Chapel Hill to Carrboro: text distance = 9, spatial distance = 0.5 miles

Police Killings: R (WIP)
• Load libraries & read files
• Recode: match constructs (race-eth, dates, etc.)
• Get unique locations (for spatial matching)
• Function: make distance matrices, returns list of matrices
• Function: report matches
• Run tests & output
• Machine learning tests
Patterns:
• Implement what you intuit and test it on a data subset
• Packages exist that may do it better (RecordLinkage)
• Trees, again, very doable
• A little spatial / text-mining goes a long way

Cumulative Exposures (CAFOs to start)
• Maps of cumulative density
• All CAFOs (additive)
Patterns:
• Extend / make generic a previous analysis.
• Identify which structures hold results.

Flooding / Cumulative Exposure
Two counties to start
Patterns: Collapse / rbind layers.

Race-Ethnic Disparities in Law Enforcement Traffic Stops
Bigger than we have time for, but:
1. Residential-based traffic stop rates tend to underestimate disparities
2. Fayetteville decreased racial disparities by prioritizing different stop types, though could do more
• Spatial data very helpful: counting, “punching”, etc.
• Reports for NAACP, SCSJ
• Quick data processing
Patterns:
• Spatio-temporal mapping
• Simulation
• Spatial work
• QGIS

Tobacco
R consulting grew into a job
• Cleaning scripts
• 100s of files in various formats. R can iteratively loop through them with list.files() and use a header file
• Reports galore
• Machine learning categorization prototypes
• Nuanced data cleaning (messy spatial data)
• Modeling & map-building
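A base R sketch of the list.files() pattern with a separate header file. The file names, column names, and values are all hypothetical, written to a temporary directory so the example is self-contained; the real project's formats are not described in the slides.

```r
# Set up a throwaway folder with a header file and two headerless CSVs
demo_dir <- tempfile("tobacco_demo")
dir.create(demo_dir)
writeLines("store_id,county,price", file.path(demo_dir, "header.txt"))
write.table(data.frame(1:2, c("Durham", "Orange"), c(5.50, 6.00)),
            file.path(demo_dir, "batch1.csv"), sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(3:4, c("Wake", "Durham"), c(5.75, 6.25)),
            file.path(demo_dir, "batch2.csv"), sep = ",",
            row.names = FALSE, col.names = FALSE)

# Column names come from the shared header file
col_names <- strsplit(readLines(file.path(demo_dir, "header.txt"))[1], ",")[[1]]

# Find every CSV, read each with the shared header, and tag its source
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
read_one <- function(f) {
  d <- read.csv(f, header = FALSE, col.names = col_names)
  d$source_file <- basename(f)
  d
}
all_data <- do.call(rbind, lapply(files, read_one))
all_data
```

Keeping the source file name on each row makes it much easier to trace a bad value back to the file it came from.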

Tobacco – Data+
• Web-scraping
• Text-mining
• Machine learning
Patterns:
• Basic tree-ML isn’t so hard! Just try.
• A little text-mining goes a long way / learn dplyr!

Tobacco – Data+
Patterns:
• Epi gives concepts applicable / extensible elsewhere: confusion matrix (Sn/Sp)
• Trees are cool, worth learning, and easy to implement in R.

Questions / Thoughts?