6 Spatial Mining Spatial Data and Structures Images
















- Slides: 16

6. Spatial Mining Spatial Data and Structures Images Spatial Mining Algorithms Spring 2003 Data Mining by H. Liu, ASU 1

Definitions • Spatial data is about instances located in a physical space • Spatial data has location or geo-referenced features • Some of these features are: – Address, latitude/longitude (explicit) – Location-based partitions in databases (implicit) Spring 2003 Data Mining by H. Liu, ASU 2

Applications and Problems • Geographic information systems (GIS) store information related to geographic locations on Earth – Weather, climate monitoring, community infrastructure needs, disaster management, and hazardous waste • Homeland security issues such as prediction of unexpected events and planning of evacuation • Remote sensing and image classification • Biomedical applications include medical imaging and illness diagnosis Spring 2003 Data Mining by H. Liu, ASU 3

Use of Spatial Data • Map overlay – merging disparate data – Different views of the same area: (Level 1) streets, power lines, phone lines, sewer lines, (Level 2) actual elevations, building locations, and rivers • Spatial selection – find all houses near ASU • Spatial join – nearest for points, intersection for areas • Other basic spatial operations – Region/range query for objects intersecting a region – Nearest neighbor query for objects closest to a given place – Distance scan asking for objects within a certain radius Spring 2003 Data Mining by H. Liu, ASU 4

Spatial Data Structures • Minimum bounding rectangles (MBR) • Different tree structures – Quad tree – R-Tree – kd-Tree • Image databases Spring 2003 Data Mining by H. Liu, ASU 5

MBR • Representing a spatial object by the smallest rectangle [(x 1, y 1), (x 2, y 2)] or rectangles (x 2, y 2) (x 1, y 1) Spring 2003 Data Mining by H. Liu, ASU 6

Tree Structures • Quad Tree: every four quadrants in one layer forms a parent quadrant in an upper layer – An example Spring 2003 Data Mining by H. Liu, ASU 7

R-Tree • Indexing MBRs in a tree – An R-tree order of m has at most m entries in one node – An example (order of 3) R 6 R 8 R 1 R 6 R 7 R 2 R 3 Spring 2003 R 4 R 5 R 1 R 2 Data Mining by H. Liu, ASU R 3 R 4 R 5 8

kd-Tree • Indexing multi-dimensional data, one dimension for a level in a tree – An example Spring 2003 Data Mining by H. Liu, ASU 9

Common Tasks dealing with Spatial Data • Data focusing – Spatial queries – Identifying interesting parts in spatial data – Progress refinement can be applied in a tree structure • Feature extraction – Extracting important/relevant features for an application • Classification or others – Using training data to create classifiers – Many mining algorithms can be used • Classification, clustering, associations Spring 2003 Data Mining by H. Liu, ASU 10

Spatial Mining Tasks • Spatial classification • Spatial clustering • Spatial association rules Spring 2003 Data Mining by H. Liu, ASU 11

Spatial Classification • Use spatial information at different (coarse/fine) levels (in different indexing trees) for data focusing • Determine relevant spatial or non-spatial features • Perform normal supervised learning algorithms – e. g. , Decision trees, NBC, etc. Spring 2003 Data Mining by H. Liu, ASU 12

Spatial Clustering • Use tree structures to index spatial data • Cluster locally – DBSCAN: R-tree – CLIQUE: Grid or Quad tree Spring 2003 Data Mining by H. Liu, ASU 13

Spatial Association Rules • Spatial objects are of major interest, not transactions • A B – A, B can be either spatial or non-spatial (3 combinations) – What is the fourth combination? • Association rules can be found w. r. t. the 3 types Spring 2003 Data Mining by H. Liu, ASU 14

Summary • Spatial data can contain both spatial and nonspatial features. • When spatial information becomes dominant interest, spatial data mining should be applied. • Spatial data structures can facilitate spatial mining. • Standard data mining algorithms can be modified for spatial data mining, with a substantial part of preprocessing to take into account of spatial information. Spring 2003 Data Mining by H. Liu, ASU 15

Bibliography • M. H. Dunham. Data Mining – Introductory and Advanced Topics. Prentice Hall. 2003. • R. O. Duda, P. E. Hart, D. G. Stork. Pattern Classification, 2 nd edition. Wiley-Interscience. • J. Han and M. Kamber. Data Mining – Concepts and Techniques. 2001. Morgan Kaufmann. Spring 2003 Data Mining by H. Liu, ASU 16