Chapter 5 Part A: Spatial data exploration

































- Slides: 34
Chapter 5 Part A: Spatial data exploration (www.spatialanalysisonline.com)

Spatial analysis and data models (Anselin, 2002)

| | Object | Field |
|---|---|---|
| GIS | vector | raster |
| Spatial data | points, lines, polygons | surfaces |
| Location | discrete | continuous |
| Observations | process realisation | sample |
| Spatial arrangement | spatial weights | distance function |
| Statistical analysis | lattice | geostatistics |
| Prediction | extrapolation | interpolation |
| Models | lag and error | |
| Asymptotics | expanding domain | infill |

3rd edition, www.spatialanalysisonline.com

Sampling frameworks
- Pure random sampling
- Stratified random, by class/strata (proportionate or disproportionate)
- Randomised within defined grids
- Uniform with randomised offsets
- Sampling and declustering

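Three of the frameworks listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the method of any particular GIS package. The function names, the rectangular study area and the reading of "uniform with randomised offsets" as a regular grid shifted by one shared random offset are all assumptions made for the example.

```python
import random

def pure_random(n, bounds, rng):
    """n points drawn uniformly over the rectangle bounds = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bounds
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def random_within_grid(nx, ny, bounds, rng):
    """One point placed uniformly at random inside each cell of an nx-by-ny grid."""
    xmin, ymin, xmax, ymax = bounds
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    return [(xmin + (i + rng.random()) * dx, ymin + (j + rng.random()) * dy)
            for i in range(nx) for j in range(ny)]

def uniform_with_offset(nx, ny, bounds, rng):
    """Cell centres of a regular grid, all shifted by one shared random offset.

    Assumption: the offset is common to every point; a per-point offset
    would instead behave like random_within_grid.
    """
    xmin, ymin, xmax, ymax = bounds
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    ox, oy = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)  # fractions of a cell
    return [(xmin + (i + 0.5 + ox) * dx, ymin + (j + 0.5 + oy) * dy)
            for i in range(nx) for j in range(ny)]

rng = random.Random(42)            # fixed seed so the sketch is repeatable
bounds = (0.0, 0.0, 8.0, 3.0)      # hypothetical study rectangle
sample_a = pure_random(12, bounds, rng)
sample_b = random_within_grid(4, 3, bounds, rng)
sample_c = uniform_with_offset(4, 3, bounds, rng)
```

Pure random sampling can leave parts of the area unsampled, whereas the two grid-based variants guarantee one point per cell at the cost of a more regular pattern.
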
Sampling frameworks: point sampling [figure]

Sampling frameworks: within zones [figure panels]
- Grid generation: square grid within field boundaries
- Grid generation (hexagonal): selection of 1 point per cell, random offset from centre
- Selection of 5 random points per zone

Sampling from an existing point set [figure panels]
- A. 10% random sample from existing point set: 800 radio-activity monitoring sites in Germany; random sample of 80 (red/large dots)
- B. Stratified random selection, 30% of each stratum: 200 radio-activity monitoring sites in Germany; random sample of 30 (red/large dots) with <100 units of radiation and 30 (crosses) with >=100 units of radiation

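The two selections described in panels A and B can be sketched as follows. The readings are synthetic values generated for the example, not the German monitoring data, and the function names are hypothetical. The threshold of 100 units and the 10% and 30% fractions come from the slide.

```python
import random

def simple_random_sample(ids, fraction, rng):
    """Select a fixed fraction of an existing set of site ids, without replacement."""
    return rng.sample(ids, round(fraction * len(ids)))

def stratified_sample(values, threshold, fraction, rng):
    """Split sites into two strata at `threshold`, then sample each stratum separately."""
    low = [i for i, v in enumerate(values) if v < threshold]
    high = [i for i, v in enumerate(values) if v >= threshold]
    return (rng.sample(low, round(fraction * len(low))),
            rng.sample(high, round(fraction * len(high))))

rng = random.Random(1)
# Synthetic readings (assumption): 100 sites below and 100 sites at or above 100 units
readings = ([rng.uniform(0, 100) for _ in range(100)]
            + [rng.uniform(100, 300) for _ in range(100)])

sample_80 = simple_random_sample(list(range(800)), 0.10, rng)        # panel A
low_pick, high_pick = stratified_sample(readings, 100.0, 0.30, rng)  # panel B
```

Sampling each stratum separately guarantees that the high-radiation sites are represented in proportion, which a single random draw over all sites does not.
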
Random points on a network [figure]

EDA, ESDA and ESTDA
- EDA: basic aims (after NIST)
  - maximize insight into a data set
  - uncover underlying structure
  - extract important variables
  - detect outliers and anomalies
  - test underlying assumptions
  - develop parsimonious models
  - determine optimal factor settings

ESDA (see GeoDa and STARS)
- Extending EDA ideas to the spatial domain (lattice/zone models)
  - Brushing
  - Linking
  - Mapped histograms
  - Outlier mapping
  - Box plots
  - Conditional choropleth plots
  - Rate mapping

ESDA: Brushing & linking [figure]

ESDA: Histogram linkage [figure]

ESDA: Parallel coordinate plot & star plot [figure]

ESDA: Mapped box plots [figure]

ESDA: Conditional choropleth mapping [figure]

ESDA: Mapped point data [figure panels]
- A. Variable point size
- B. Variable colour
- C. Semivariogram pairs
- D. Voronoi analysis

ESDA: Trend analysis (continuous spatial data) [figure]

ESDA: Cluster hunting, GAM/K (steps)
1. Read data for the population at risk
2. Identify the minimum bounding rectangle (MBR) containing the data, the starting circle radius, and the degree of overlap
3. Generate a grid covering the MBR
4. For each grid intersection, generate a circle of radius r
5. Retrieve two counts for each circle: the population at risk and the variable of interest
6. Apply some "significance" test procedure
7. Keep the result if significant
8. Repeat Steps 5 to 7 until all circles have been processed
9. If larger radii remain to be tested, increase the circle radius by dr and return to Step 3; else go to Step 10
10. Create a smoothed density surface of excess incidence for the significant circles
11. Map this surface and inspect the results

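Steps 1 to 9 above can be sketched in pure Python as shown below. This is a simplified illustration, not the published GAM/K implementation. The Poisson upper-tail test, the 0.01 significance level, the way grid spacing is derived from the overlap, and the synthetic data are all assumptions made for the example. The smoothing and mapping of Steps 10 and 11 are omitted.

```python
import math

def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu); the lower tail is summed in log space."""
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    lower = sum(math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - lower)

def gam_circles(points, radii, overlap=0.5, alpha=0.01):
    """points: iterable of (x, y, population_at_risk, cases). Returns significant circles."""
    points = list(points)                                          # Step 1
    rate = sum(p[3] for p in points) / sum(p[2] for p in points)   # overall incidence rate
    xmin, xmax = min(p[0] for p in points), max(p[0] for p in points)  # Step 2: MBR
    ymin, ymax = min(p[1] for p in points), max(p[1] for p in points)
    kept = []
    for r in radii:                                   # Step 9: one pass per radius
        step = 2 * r * (1 - overlap)                  # Step 3: grid spacing (assumed rule)
        nx = int((xmax - xmin) / step) + 2
        ny = int((ymax - ymin) / step) + 2
        for i in range(nx):
            for j in range(ny):
                cx, cy = xmin + i * step, ymin + j * step   # Step 4: circle centre
                pop = cases = 0
                for x, y, p, c in points:                   # Step 5: the two counts
                    if math.hypot(x - cx, y - cy) <= r:
                        pop += p
                        cases += c
                if pop == 0:
                    continue
                expected = rate * pop
                # Steps 6-7: keep circles with significantly more cases than expected
                if poisson_upper_tail(int(cases), expected) < alpha:
                    kept.append((cx, cy, r, cases, expected))
    return kept

# Synthetic data (assumption): 20 x 20 sites, 100 people each, 1 case each,
# except for a small cluster around (5, 5) with 10 cases per site.
pts = [(x, y, 100, 10 if math.hypot(x - 5, y - 5) <= 1.5 else 1)
       for x in range(20) for y in range(20)]
hits = gam_circles(pts, radii=[2.0, 3.0])
```

Because every circle is tested, the procedure involves a very large number of overlapping, non-independent tests, which is why the slide puts "significance" in quotation marks and treats the output as exploratory.
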
Grid-based statistics
- Univariate analysis of attribute data (non-spatial metrics)
- Cross-classification and cross-tab analyses
- Spatial pattern analysis for grid data (including landscape metrics)
  - Patch metrics; class-level metrics; landscape-level metrics
- Quadrat analysis
- Multi-grid regression analysis

Grid-based statistics: landscape metrics
- Non-spatial
  - Proportional abundance; Richness; Evenness; Diversity
- Spatial
  - Patch size distribution and density; Patch shape complexity; Core Area; Isolation/Proximity; Contrast; Dispersion; Contagion and Interspersion; Subdivision; Connectivity

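The four non-spatial metrics can be computed directly from the class counts of a grid. The sketch below assumes the Shannon index for diversity and Pielou's ratio (diversity divided by the log of richness) for evenness. The slide does not say which definitions are intended, so these are illustrative choices, and the land-cover grid is made up.

```python
import math
from collections import Counter

def composition_metrics(grid):
    """Return (proportional abundance, richness, Shannon diversity, evenness) for a class grid."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    abundance = {cls: c / total for cls, c in counts.items()}   # share of cells per class
    richness = len(counts)                                      # number of classes present
    diversity = -sum(p * math.log(p) for p in abundance.values())
    # Evenness is undefined for a single class; 0.0 is returned here by convention.
    evenness = diversity / math.log(richness) if richness > 1 else 0.0
    return abundance, richness, diversity, evenness

# Hypothetical land-cover grid: F = forest, W = water, U = urban
landcover = [
    ["F", "F", "F", "W"],
    ["F", "F", "U", "W"],
    ["F", "U", "U", "W"],
]
abundance, richness, diversity, evenness = composition_metrics(landcover)
```

None of these values depends on where the cells are located, which is what separates them from the spatial metrics in the second group.
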
Point (event) based statistics
- Typically analysis of point-pair distances
- Points vs events
- Distance metrics: Euclidean, spherical, Lp or network
- Weighted or unweighted events
- Events, NOT computed points (e.g. centroids)
- Classical statistical models vs Monte Carlo and other computational methods

Point (event) based statistics
- Basic nearest neighbour (NN) model
  - Input coordinates of all points
  - Compute the (symmetric) distance matrix D
  - Sort the distances to identify the 1st, 2nd, ..., kth nearest values
  - Compute the mean of the observed 1st, 2nd, ..., kth nearest values
  - Compare this mean with the expected mean under the Complete Spatial Randomness (CSR, or Poisson) model

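The steps of the basic NN model translate almost line for line into code. The sketch below uses Euclidean distances and a brute-force distance matrix, which is adequate for small point sets. The function names are hypothetical, and the CSR expectation shown is the standard first-order result only (k = 1), with the density m estimated as n divided by the study area.

```python
import math

def distance_matrix(points):
    """Symmetric matrix D of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def mean_kth_nn(points, k_max):
    """Mean observed distance to the 1st, 2nd, ..., k_max-th nearest neighbour."""
    D = distance_matrix(points)
    sums = [0.0] * k_max
    for i, row in enumerate(D):
        others = sorted(d for j, d in enumerate(row) if j != i)  # exclude self-distance
        for k in range(k_max):
            sums[k] += others[k]
    return [s / len(points) for s in sums]

def csr_expected_mean_nn(n, area):
    """Expected mean 1st-order NN distance under CSR: 1 / (2 * sqrt(m)), with m = n / area."""
    m = n / area
    return 1.0 / (2.0 * math.sqrt(m))

# Nine points on a regular 3 x 3 lattice with unit spacing (a dispersed pattern)
lattice = [(x, y) for x in range(3) for y in range(3)]
observed = mean_kth_nn(lattice, k_max=3)
expected_first = csr_expected_mean_nn(n=len(lattice), area=9.0)  # area is an assumption
```

For this lattice the observed first-order mean is 1.0 against a CSR expectation of 0.5, so the ratio of observed to expected exceeds 1, as it should for a dispersed pattern.
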
Point (event) based statistics: NN model [figure]

Point (event) based statistics: NN model
- Mean NN distance
- Variance
- NN Index (Ratio)
- Z-transform

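For reference, the standard Clark and Evans expressions for these four quantities under CSR are given below. They are offered as an assumption about what the slide presents, and its notation may differ. Here m is the point density (n divided by the study area), n is the number of points, and the observed mean NN distance is written with subscript o and the expected mean with subscript e.

```latex
\[
\bar{r}_e = \frac{1}{2\sqrt{m}}, \qquad
\sigma^2 = \frac{4-\pi}{4\pi m}, \qquad
R = \frac{\bar{r}_o}{\bar{r}_e}, \qquad
z = \frac{\bar{r}_o - \bar{r}_e}{\sigma_e},
\quad \text{where } \sigma_e = \sqrt{\frac{4-\pi}{4\pi m n}} \approx \frac{0.26136}{\sqrt{n m}}
\]
```

Values of R below 1 suggest clustering and values above 1 suggest dispersion. As a worked case, n = 100 points in an area of 100 square units gives m = 1, an expected mean NN distance of 0.5 and a standard error of about 0.026.
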
Point (event) based statistics
- Issues
  - Are observations n discrete points?
  - Sample size (esp. for kth order NN, k>1)
  - Model requires density estimation, m
  - Boundary definition problems (density and edge effects), which affect all methods
  - NN reflexivity of point sets
  - Limited use of frequency distribution
  - Validity of Poisson model vs alternative models

Frequency distribution of nearest neighbour distances
- The frequency of NN distances in distance bands, say 0-1 km, 1-2 km, etc.
- The cumulative frequency distribution is usually denoted
  - G(d) = #(d_i < d)/n
  - F(d) = #(d_i < d)/m
- Here d_i are the NN distances and n is the number of measurements (for G), or m is the number of random points used in sampling (for F)

Computing G(d) [computing F(d) is similar]
- Find all the NN distances
- Rank them and form the cumulative frequency distribution
- Compare to the expected cumulative frequency distribution
- Similar in concept to the K-S test with the quadrat model, but compute the critical values by simulation rather than table lookup

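The procedure can be sketched as follows. The empirical G(d) is the proportion of NN distances below d. The expected curve used here is the standard CSR result G(d) = 1 - exp(-m * pi * d^2), taken as an assumption about the expression the slide refers to. The simulation envelope uses a rectangular study area with no edge correction, and all function names are hypothetical.

```python
import bisect
import math
import random

def nn_distances(points):
    """Distance from each point to its nearest other point (brute force)."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def g_hat(nn, d_values):
    """Empirical G(d): proportion of NN distances strictly less than d."""
    ranked = sorted(nn)
    return [bisect.bisect_left(ranked, d) / len(ranked) for d in d_values]

def g_csr(d_values, m):
    """Expected G(d) under CSR with density m (no edge correction)."""
    return [1.0 - math.exp(-m * math.pi * d * d) for d in d_values]

def csr_envelope(n, width, height, d_values, n_sim=99, seed=0):
    """Pointwise min and max of G(d) over n_sim simulated CSR patterns of n points."""
    rng = random.Random(seed)
    lo, hi = [1.0] * len(d_values), [0.0] * len(d_values)
    for _ in range(n_sim):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        g = g_hat(nn_distances(pts), d_values)
        lo = [min(a, b) for a, b in zip(lo, g)]
        hi = [max(a, b) for a, b in zip(hi, g)]
    return lo, hi

events = [(0, 0), (1, 0), (0, 3), (5, 5)]       # toy pattern for illustration
d_values = [1, 2, 4, 6]
observed = g_hat(nn_distances(events), d_values)
lower, upper = csr_envelope(n=20, width=10, height=10,
                            d_values=[0.5, 1.0, 2.0], n_sim=19)
```

An observed curve that rises above the upper envelope at short distances suggests clustering, and one that falls below the lower envelope suggests regularity. F(d) follows the same pattern, with distances measured from random sample points to the nearest event.
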
Point (event) based statistics: clustering (ESDA)
- Is the observed clustering due to natural background variation in the population from which the events arise?
- Over what spatial scales does clustering occur?
- Are clusters a reflection of regional variations in underlying variables?
- Are clusters associated with some feature of interest, such as a refinery, waste disposal site or nuclear plant?
- Are clusters simply spatial or are they spatio-temporal?

Point (event) based statistics: clustering
- kth order NN analysis
- Cumulative distance frequency distribution, G(r)
- Ripley K (or L) function, single or dual pattern
- PCP
- Hot spot and cluster analysis methods

Point (event) based statistics: Ripley K or L
- Construct a circle, radius d, around each point (event), i
- Count the number of other events, labelled j, that fall inside this circle
- Repeat these first two stages for all points i, and then sum the results
- Increment d by a small fixed amount
- Repeat the computation, giving values of K(d) for a set of distances, d
- Adjust to provide a 'normalised measure', L

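A naive version of this computation is sketched below. It assumes the common scaling K(d) = A/n^2 times the number of ordered pairs (i, j) within distance d, where A is the study area, and the usual transform L(d) = sqrt(K(d)/pi). Other normalisations, such as A/(n(n-1)), are also in use. No edge correction is applied, so the estimates are biased downwards near the boundary.

```python
import math

def ripley_k(points, area, d_values):
    """Naive Ripley K: area / n^2 times the count of ordered pairs within distance d."""
    n = len(points)
    pair_d = [math.dist(p, q)
              for i, p in enumerate(points)
              for j, q in enumerate(points) if i != j]   # circle around each i, other events j
    return [area * sum(1 for x in pair_d if x <= d) / n ** 2 for d in d_values]

def ripley_l(k_values):
    """Normalised measure L(d) = sqrt(K(d) / pi); roughly equal to d under CSR."""
    return [math.sqrt(k / math.pi) for k in k_values]

# Toy pattern: corners of a unit square inside a study area of 4 square units
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
d_values = [0.5, 1.0, 1.5]
k_values = ripley_k(corners, area=4.0, d_values=d_values)
l_values = ripley_l(k_values)
```

Plotting L(d) - d against d is a common way to read the result: values above zero point to clustering at that scale and values below zero to dispersion, judged against a simulation envelope.
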
Point (event) based statistics: Ripley K [figure]

Point (event) based statistics: comments
- CSR vs PCP vs other models
- Data: location, time, attributes, error, duplicates
  - Duplicates: deliberate rounding, data resolution, genuine duplicate locations, agreed surrogate locations, deliberate data modification
- Multi-approach analysis is beneficial
- Methods: choice of methods and parameters
- Other factors: borders, areas, metrics, background variation, temporal variation, non-spatial factors
- Rare events and small samples
- Process-pattern vs cause-effect
- ESDA in most instances

Hot spot and cluster analysis: questions
- Where are the main (most intensive) clusters located?
- Are clusters distinct or do they merge into one another?
- Are clusters associated with some known background variable?
- Is there a common size to clusters or are they variable in size?
- Do clusters themselves cluster into higher order groupings?
- If comparable data are mapped over time, do the clusters remain stable or do they move and/or disappear?

Hot spot (and cool-spot) analysis
- Visual inspection of mapped patterns
- Scale issues
- Proximal and duplicate points
- Point representation (size)
- Background variation/controls (risk adjustment)
- Weighted or unweighted
- Hierarchical or non-hierarchical
- Kernel & K-means methods

Hot spot analysis: Hierarchical NN [figure: cancer incidence data, 1st and 2nd order clusters]