Overview of Mining Spatial Data (2/15/2022)

  • Slides: 26

Mining Spatial Data
  • Mining spatial databases and data warehouses
  • Spatial DBMS
  • Spatial Data Warehousing
  • Spatial Data Mining
  • Spatiotemporal Data Mining

Generalizing Spatial Data
  • Spatial data:
      • Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage
      • Requires merging a set of geographic areas by spatial operations

What Is a Spatial Database System?
  • Geometric, geographic, or spatial data: space-related data
      • Examples: geographic space (2-D abstraction of the earth's surface), VLSI design, a model of the human brain, 3-D space representing the arrangement of chains of protein molecules
  • Spatial database systems vs. image database systems
      • Image database system: handles digital raster images (e.g., satellite sensing, computer tomography); may also contain techniques for object analysis and extraction from images, and some spatial database functionality
      • Spatial (geometric, geographic) database system: handles objects in space that have identity and well-defined extents, locations, and relationships

GIS (Geographic Information System)
  • Analysis and visualization of geographic data
  • Common analysis functions of GIS
      • Search (thematic search, search by region)
      • Location analysis (buffer, corridor, overlay)
      • Terrain analysis (slope/aspect, drainage network)
      • Flow analysis (connectivity, shortest path)
      • Distribution (nearest neighbor, proximity, change detection)
      • Spatial analysis/statistics (pattern, centrality, similarity, topology)
      • Measurements (distance, perimeter, shape, adjacency, direction)

Spatial DBMS (SDBMS)
  • An SDBMS is a software system that
      • supports spatial data models, spatial abstract data types (ADTs), and a query language supporting them
      • supports spatial indexing, efficient spatial operations, and query optimization
      • can work with an underlying DBMS
  • Examples
      • Oracle Spatial Data Cartridge
      • ESRI Spatial Data Engine

Modeling Spatial Objects
  • What needs to be represented?
  • Two important alternative views
      • Single objects: distinct entities arranged in space, each of which has its own geometric description
          • e.g., modeling cities, forests, rivers
      • Spatially related collections of objects: describe space itself (about every point in space)
          • e.g., modeling land use, partition of a country into districts

Modeling Single Objects: Point, Line and Region
  • Point: location only, no extent
  • Line (or a curve, usually represented by a polyline, i.e., a sequence of line segments):
      • Moving through space, or connections in space (roads, rivers, cables, etc.)
  • Region:
      • Something having extent in 2-D space (country, lake, park). It may have a hole or consist of several disjoint pieces.
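
The three single-object types can be sketched as small Python classes. This is only an illustration under simplifying assumptions: the class names `Point`, `Polyline`, and `Region` are hypothetical, and `Region` here is a single outer ring, so the holes and disjoint pieces mentioned above are left out.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Coord = Tuple[float, float]


@dataclass(frozen=True)
class Point:
    """Location only, no extent."""
    x: float
    y: float


@dataclass(frozen=True)
class Polyline:
    """A line approximated by a sequence of straight segments."""
    vertices: List[Coord]

    def length(self) -> float:
        # Sum the Euclidean length of each consecutive segment.
        pairs = zip(self.vertices, self.vertices[1:])
        return sum(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in pairs)


@dataclass(frozen=True)
class Region:
    """A simple polygon given by its outer ring (no holes, one piece)."""
    ring: List[Coord]

    def area(self) -> float:
        # Shoelace formula over the closed ring.
        n = len(self.ring)
        twice_area = 0.0
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            twice_area += x1 * y2 - x2 * y1
        return abs(twice_area) / 2.0


city = Point(2.0, 3.0)
road = Polyline([(0, 0), (3, 4), (3, 10)])
park = Region([(0, 0), (4, 0), (4, 3), (0, 3)])
```

With these made-up coordinates, `road.length()` adds a segment of length 5 and one of length 6, and `park.area()` is the area of a 4 by 3 rectangle.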

Modeling Spatially Related Collection of Objects
  • Modeling spatially related collections of objects: plane partitions and networks
  • A partition: a set of region objects that are required to be disjoint (e.g., a thematic map). Pairs of objects often share a common boundary (adjacency relationship).
  • A network: a graph embedded into the plane, consisting of a set of point objects, forming its nodes, and a set of line objects describing the geometry of the edges, e.g., highways, rivers, power supply lines
  • Other interesting spatially related collections of objects: nested partitions, or a digital terrain (elevation) model

Spatial Data Types and Models
  • Field-based model: raster data
      • Framework: partitioning of space
  • Object-based model: vector model
      • Point, line, polygon; objects, attributes

Spatial Query Language
  • Spatial query language
      • Spatial data types, e.g., point, line segment, polygon, …
      • Spatial operations, e.g., overlap, distance, nearest neighbor, …
      • Callable from a query language (e.g., SQL3) of the underlying DBMS

            SELECT S.name
            FROM Senator S
            WHERE S.district.Area() > 300

  • Standards
      • SQL3 (a.k.a. SQL:1999) is a standard for query languages
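
To make the idea of a spatial data type with callable operations concrete, here is a minimal Python sketch of the same query. It is not SQL3 and not any DBMS API: the `Rect` and `Senator` classes, the sample rows, and the choice of axis-aligned rectangles as districts are all assumptions made to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Hypothetical spatial type: an axis-aligned rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def overlaps(self, other: "Rect") -> bool:
        # True when the interiors of the two rectangles intersect.
        return (self.xmin < other.xmax and other.xmin < self.xmax
                and self.ymin < other.ymax and other.ymin < self.ymax)


@dataclass(frozen=True)
class Senator:
    name: str
    district: Rect


# Made-up rows standing in for the Senator table.
senators: List[Senator] = [
    Senator("Senator A", Rect(0, 0, 30, 20)),         # area 600
    Senator("Senator B", Rect(25, 0, 35, 10)),        # area 100
    Senator("Senator C", Rect(100, 100, 120, 115)),   # area 300
]

# Equivalent of: SELECT S.name FROM Senator S WHERE S.district.Area() > 300
large_districts = [s.name for s in senators if s.district.area() > 300]
```

The point of the sketch is that `area()` and `overlaps()` belong to the spatial type itself, so the query only has to call them, much as the SQL version calls `Area()` on the district column.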

File Organization and Indices
  • SDBMS: the dataset is in secondary storage, e.g., disk
  • Space-filling curves: an ordering on the locations in a multi-dimensional space
      • Linearize a multi-dimensional space
      • Help search efficiently
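
One common space-filling curve is the Z-order (Morton) curve, which linearizes 2-D grid cells by interleaving the bits of their coordinates. The slide does not name a specific curve, so the choice of Z-order is an assumption, and the function name `interleave_bits` is hypothetical.

```python
from typing import List, Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) code of a non-negative grid cell (x, y).

    Bit i of x goes to position 2*i, bit i of y goes to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# Order the cells of a 4x4 grid along the curve.
cells: List[Tuple[int, int]] = [(x, y) for y in range(4) for x in range(4)]
z_sorted = sorted(cells, key=lambda c: interleave_bits(c[0], c[1]))
```

Sorting records by this code tends to place cells that are close in 2-D near each other in the 1-D order, which is what lets a one-dimensional index on disk serve many spatial lookups.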

Spatial Query Optimization
  • A spatial operation can be processed using different strategies
  • The computation cost of each strategy depends on many parameters
  • Query optimization is the process of
      • ordering the operations in a query, and
      • selecting an efficient strategy for each operation
      • based on the details of a given dataset

Spatial Data Warehousing
  • Spatial data warehouse: an integrated, subject-oriented, time-variant, and nonvolatile spatial data repository
  • Spatial data integration: a big issue
      • Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)
      • Vendor-specific formats (ESRI, MapInfo, Intergraph, IDRISI, etc.)
      • Geo-specific formats (geographic vs. equal area projection, etc.)
  • Spatial data cube: multidimensional spatial database
      • Both dimensions and measures may contain spatial components

Dimensions and Measures in Spatial Data Warehouse
  • Dimensions
      • Non-spatial
          • e.g., “25-30 degrees” generalizes to “hot” (both are strings)
      • Spatial-to-nonspatial
          • e.g., Seattle generalizes to the description “Pacific Northwest” (as a string)
      • Spatial-to-spatial
          • e.g., Seattle generalizes to Pacific Northwest (as a spatial region)
  • Measures
      • Numerical (e.g., monthly revenue of a region)
          • Distributive (e.g., count, sum)
          • Algebraic (e.g., average)
          • Holistic (e.g., median, rank)
      • Spatial
          • Collection of spatial pointers (e.g., pointers to all regions with temperature of 25-30 degrees in July)
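
The difference between distributive, algebraic, and holistic measures shows up when a cube cell is computed from pre-aggregated partitions. The sketch below uses made-up revenue values split into three hypothetical partitions.

```python
from statistics import median
from typing import List

# Made-up monthly revenue values, stored in three partitions (e.g., sub-regions).
partitions: List[List[float]] = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]

# Distributive: the total is obtained by combining per-partition results.
partial_counts = [len(p) for p in partitions]
partial_sums = [sum(p) for p in partitions]
total_count = sum(partial_counts)
total_sum = sum(partial_sums)

# Algebraic: average is a fixed-size function of distributive measures.
average = total_sum / total_count

# Holistic: the median needs the full set of values; combining
# per-partition medians does not in general give the right answer.
all_values = [v for p in partitions for v in p]
true_median = median(all_values)
median_of_medians = median([median(p) for p in partitions])
```

Here the per-partition medians are 15, 30, and 50, so their median is 30, while the median of all six values is 35. This is why holistic measures are expensive to maintain in a data cube.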

Spatial-to-Spatial Generalization
  • Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage
  • Requires the merging of a set of geographic areas by spatial operations
  • [Figure labels: Dissolve, Merge, Clip, Intersect, Union]
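
A dissolve-style merge can be sketched on a raster: adjacent cells with the same land-use label are merged into one region. This is a simplified stand-in for the polygon operations named on the slide, assuming a small grid of labels and 4-neighbor adjacency; the function name `dissolve` and the sample grid are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def dissolve(grid: List[List[str]]) -> List[Tuple[str, Set[Cell]]]:
    """Merge 4-adjacent cells that share a label into regions (flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen: Set[Cell] = set()
    regions: List[Tuple[str, Set[Cell]]] = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            label = grid[r][c]
            region: Set[Cell] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                region.add((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and (nr, nc) not in seen and grid[nr][nc] == label:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            regions.append((label, region))
    return regions


# Made-up land-use grid: residential, industrial, agricultural.
land_use = [
    ["res", "res", "ind"],
    ["res", "agr", "ind"],
    ["agr", "agr", "ind"],
]
regions = dissolve(land_use)
```

Each of the nine detailed cells ends up in one of three generalized regions, which is the raster analogue of merging primitive polygons by land usage.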

Example: British Columbia Weather Pattern Analysis
  • Input
      • A map with about 3,000 weather probes scattered in B.C.
      • Daily data for temperature, precipitation, wind velocity, etc.
      • Data warehouse using star schema
  • Output
      • A map that reveals patterns: merged (similar) regions
  • Goals
      • Interactive analysis (drill-down, slice, dice, pivot, roll-up)
      • Fast response time
      • Minimizing storage space used
  • Challenge
      • A merged region may contain hundreds of “primitive” regions (polygons)

Star Schema of the BC Weather Warehouse
  • Spatial data warehouse
      • Dimensions
          • region_name
          • time
          • temperature
          • precipitation
      • Measurements
          • region_map
          • area
          • count
  • [Figure: star schema with dimension tables and a fact table]

Spatial Association Analysis
  • Spatial association rule: A → B [s%, c%]
      • A and B are sets of spatial or non-spatial predicates
          • Topological relations: intersects, overlaps, disjoint, etc.
          • Spatial orientations: left_of, west_of, under, etc.
          • Distance information: close_to, within_distance, etc.
      • s% is the support and c% is the confidence of the rule
  • Examples
      1) is_a(x, large_town) ^ intersect(x, highway) → adjacent_to(x, water) [7%, 85%]
      2) What kinds of objects are typically located close to golf courses?
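
Support and confidence of such a rule can be computed once the spatial predicates have been evaluated for each object. The sketch below assumes the predicates are already available as boolean fields; the ten sample objects are made up and are not the data behind the [7%, 85%] figures above.

```python
from typing import Callable, Dict, List, Tuple

Obj = Dict[str, bool]


def rule_stats(objects: List[Obj],
               antecedent: Callable[[Obj], bool],
               consequent: Callable[[Obj], bool]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_antecedent = sum(1 for o in objects if antecedent(o))
    n_both = sum(1 for o in objects if antecedent(o) and consequent(o))
    support = n_both / len(objects)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def make(large_town: bool, highway: bool, water: bool) -> Obj:
    return {"large_town": large_town, "intersects_highway": highway,
            "adjacent_to_water": water}


# Made-up objects with pre-computed predicate values.
objects = (
    [make(True, True, True)] * 3       # antecedent and consequent hold
    + [make(True, True, False)]        # antecedent only
    + [make(False, True, True)] * 2    # not a large town
    + [make(True, False, False)] * 4   # no highway intersection
)

support, confidence = rule_stats(
    objects,
    antecedent=lambda o: o["large_town"] and o["intersects_highway"],
    consequent=lambda o: o["adjacent_to_water"],
)
```

In this toy data, 3 of the 10 objects satisfy both sides of the rule and 4 satisfy the antecedent, giving a support of 30% and a confidence of 75%.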

Spatial Autocorrelation
  • Spatial data tends to be highly self-correlated
      • Example: neighborhood, temperature
  • Items in traditional data are assumed to be independent of each other, whereas properties of locations in a map are often “auto-correlated”
  • First law of geography: “Everything is related to everything else, but near things are more related than distant things.”
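
One standard way to quantify spatial autocorrelation is Moran's I, which the slide does not mention by name; it is brought in here only as an illustration. With values $x_i$, mean $\bar{x}$, and spatial weights $w_{ij}$, it is $I = \frac{n}{\sum_{i,j} w_{ij}} \cdot \frac{\sum_{i,j} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$. The sketch assumes four locations along a line with binary neighbor weights; both the layout and the values are made up.

```python
from typing import List


def morans_i(values: List[float], weights: List[List[float]]) -> float:
    """Moran's I for values at n locations with an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_sum)


# Four locations on a line; weight 1 between immediate neighbors.
neighbor_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = [1.0, 2.0, 3.0, 4.0]        # neighbors have similar values
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbors have dissimilar values

i_smooth = morans_i(smooth, neighbor_weights)
i_alternating = morans_i(alternating, neighbor_weights)
```

The smoothly varying values give a positive index and the alternating values give a negative one, matching the intuition that nearby locations tend to be alike when data is positively autocorrelated.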

Spatial Trend Analysis n Function n Detect changes and trends along a spatial dimension

Spatial Trend Analysis n Function n Detect changes and trends along a spatial dimension n Study the trend of non-spatial or spatial data changing with space n Application examples n Observe the trend of changes of the climate or vegetation with increasing distance from an ocean n Crime rate or unemployment rate change with regard to city geo-distribution 2/15/2022 21
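
A simple way to detect a trend along a spatial dimension is to fit a straight line to an attribute as a function of distance. The sketch below uses ordinary least squares on made-up rainfall readings at increasing distance from an ocean; the slide does not prescribe a particular fitting method, so linear regression is an assumption.

```python
from typing import List, Tuple


def linear_trend(distances: List[float], values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of values = slope * distance + intercept."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations: distance from the coast (km), annual rainfall (mm).
distance_km = [0.0, 50.0, 100.0, 150.0, 200.0]
rainfall_mm = [1200.0, 1100.0, 1000.0, 900.0, 800.0]

slope, intercept = linear_trend(distance_km, rainfall_mm)
```

For this toy data the fitted slope is negative, i.e., rainfall decreases by 2 mm for each kilometer inland, starting from 1200 mm at the coast.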

Spatial Cluster Analysis
  • Mining clusters: k-means, k-medoids, hierarchical, density-based, etc.
  • Analysis of distinct features of the clusters

Constraint-Based Clustering
  • Constraints on individual objects
      • Simple selection of relevant objects before clustering
  • Clustering parameters as constraints
      • K-means, density-based: radius, minimum number of points
  • Constraints specified on clusters using SQL aggregates
      • Sum of the profits in each cluster > $1 million
  • Constraints imposed by physical obstacles
      • Clustering with obstructed distance

Constrained Clustering: Planning ATM Locations
[Figure: two panels. "Spatial data with obstacles" shows clusters C1 to C4 separated by a river (with a bridge) and a mountain; "Clustering without taking obstacles into consideration" shows the result when obstacles are ignored.]
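
The effect of obstacles on distance can be illustrated on a small grid. In this sketch, which is an assumption-laden toy and not the method behind the figure, a river blocks movement except at one bridge cell, and the obstructed distance is the length of the shortest path found by breadth-first search. The grid, the cell symbols, and the function name `obstructed_distance` are all hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def obstructed_distance(grid: List[str], start: Cell, goal: Cell) -> Optional[int]:
    """Shortest 4-neighbor path length avoiding '#' cells, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None


# '#' marks river cells; the gap in the last row is the bridge.
grid = [
    "..#..",
    "..#..",
    "..#..",
    ".....",
]
customer = (0, 1)   # west bank
atm = (0, 3)        # east bank

straight = abs(customer[0] - atm[0]) + abs(customer[1] - atm[1])
detour = obstructed_distance(grid, customer, atm)
```

The two points are 2 steps apart if the river is ignored but 8 steps apart when the path has to go through the bridge, so a clustering that uses obstructed distance would be much less likely to group them together.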

Mining Spatiotemporal Data
  • Spatiotemporal data
      • Data has spatial extensions and changes with time
      • Ex: forest fires, moving objects, hurricanes and earthquakes
  • Automatic anomaly detection in massive moving objects
      • Moving objects are ubiquitous: GPS, radar, etc.
      • Ex: maritime vessel surveillance
      • Problem: automatic anomaly detection
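
The slide states the anomaly detection problem without giving a method. As one very simple illustration, the sketch below flags trajectory segments whose speed is far above the track's median speed. The track, the factor of 3, and the function names are assumptions; real systems would use much richer models.

```python
from math import hypot
from statistics import median
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (time, x, y)


def segment_speeds(track: List[Fix]) -> List[float]:
    """Speed on each segment between consecutive position fixes."""
    return [hypot(x2 - x1, y2 - y1) / (t2 - t1)
            for (t1, x1, y1), (t2, x2, y2) in zip(track, track[1:])]


def flag_speed_anomalies(track: List[Fix], factor: float = 3.0) -> List[int]:
    """Indices of segments faster than `factor` times the median speed."""
    speeds = segment_speeds(track)
    typical = median(speeds)
    return [i for i, s in enumerate(speeds) if s > factor * typical]


# Made-up vessel track: steady movement with one sudden jump.
track: List[Fix] = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (2.0, 2.0, 0.0),
    (3.0, 10.0, 0.0),
    (4.0, 11.0, 0.0),
]
anomalous_segments = flag_speed_anomalies(track)
```

Only the third segment, where the vessel appears to cover 8 units in one time step, is flagged.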
