Data Mining: Principles and Algorithms, Chapter 10.1

Data Mining: Principles and Algorithms
Chapter 10.1: Mining Object, Spatial, and Multimedia Data

©Jiawei Han
Department of Computer Science, University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
2021/6/19

Mining Object, Spatial and Multi-Media Data
  • Mining object data sets
  • Mining spatial databases and data warehouses
      - Spatial DBMS
      - Spatial Data Warehousing
      - Spatial Data Mining
      - Spatiotemporal Data Mining
  • Mining multimedia data
  • Summary

Mining Complex Data Objects: Generalization of Structured Data
  • Set-valued attribute
      - Generalization of each value in the set into its corresponding higher-level concepts
      - Derivation of the general behavior of the set, such as the number of elements in the set, the types or value ranges in the set, or the weighted average for numerical data
      - E.g., hobby = {tennis, hockey, chess, violin, PC_games} generalizes to {sports, music, e_games}
  • List-valued or sequence-valued attribute
      - Same as set-valued attributes, except that the order of the elements in the sequence should be observed in the generalization
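The hobby example can be sketched in a few lines of Python. This is only an illustration: the slide gives the input set and the generalized result but not the concept hierarchy itself, so the `HOBBY_HIERARCHY` mapping (including where chess belongs) and the function names `generalize_set` and `generalize_sequence` are assumptions.

```python
# Hypothetical concept hierarchy: the slide gives only the input set and the
# generalized result, so the individual parent assignments are assumptions.
HOBBY_HIERARCHY = {
    "tennis": "sports",
    "hockey": "sports",
    "chess": "sports",      # assumed; the slide does not say where chess belongs
    "violin": "music",
    "PC_games": "e_games",
}


def generalize_set(values, hierarchy):
    """Replace each element by its parent concept; duplicates collapse."""
    return {hierarchy.get(v, v) for v in values}


def generalize_sequence(values, hierarchy):
    """Same mapping for a list-valued attribute, keeping element order."""
    return [hierarchy.get(v, v) for v in values]


hobby = {"tennis", "hockey", "chess", "violin", "PC_games"}
generalized = generalize_set(hobby, HOBBY_HIERARCHY)
set_size = len(hobby)  # one example of the "general behavior" of the set
print(sorted(generalized), set_size)
```

Using a set for the result mirrors the set-valued case, where order is irrelevant; the list version keeps positions so that the order of the sequence survives the generalization.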

Generalizing Spatial and Multimedia Data
  • Spatial data:
      - Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage
      - Requires merging a set of geographic areas by spatial operations
  • Image data:
      - Extracted by aggregation and/or approximation
      - Size, color, shape, texture, orientation, and relative positions and structures of the contained objects or regions in the image
  • Music data:
      - Summarize its melody: based on the approximate patterns that repeatedly occur in the segment
      - Summarize its style: based on its tone, tempo, or the major musical instruments played

Generalizing Object Data
  • Object identifier
      - Generalize to the lowest level of class in the class/subclass hierarchies
  • Class composition hierarchies
      - Generalize only those closely related in semantics to the current one
  • Construction and mining of object cubes
      - Extend the attribute-oriented induction method
          - Apply a sequence of class-based generalization operators on different attributes
          - Continue until a small number of generalized objects remain that can be summarized concisely in high-level terms
      - Implementation
          - Examine each attribute, generalize it to simple-valued data
          - Construct a multidimensional data cube (object cube)
          - Problem: it is not always desirable to generalize a set of values to single-valued data

Ex.: Plan Mining by Divide and Conquer
  • Plan: a sequence of actions
      - E.g., Travel (flight): <traveler, departure, arrival, d-time, airline, price, seat>
  • Plan mining: extraction of important or significant generalized (sequential) patterns from a planbase (a large collection of plans)
      - E.g., discover travel patterns in an air flight database, or find significant patterns from the sequences of actions in the repair of automobiles
  • Method
      - Attribute-oriented induction on sequence data
          - A generalized travel plan: <small-big*-small>
      - Divide & conquer: mine characteristics for each subsequence
          - E.g., big*: same airline, small-big: nearby region

A Travel Database for Plan Mining
  • Example: Mining a travel planbase
[Figure: travel plan table and airport info table]

Multidimensional Analysis
  • A multi-D model for the planbase
  • Strategy
      - Generalize the planbase in different directions
      - Look for sequential patterns in the generalized plans
      - Derive high-level plans

Multidimensional Generalization
  • Multi-dimensional generalization of the planbase
  • Merging consecutive, identical actions in plans
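Merging consecutive, identical actions can be illustrated with a short Python sketch. The function name `merge_consecutive` and the convention that only runs longer than one symbol are written as "X+" are assumptions made for illustration; the slides do not spell out how a single occurrence is written, and the operator there may be restricted to particular symbols.

```python
from itertools import groupby


def merge_consecutive(plan):
    """Collapse runs of identical generalized actions.

    Assumption: a run longer than one symbol is written "X+",
    while a single occurrence is left unchanged.
    """
    merged = []
    for symbol, run in groupby(plan):
        run_length = sum(1 for _ in run)
        merged.append(symbol + "+" if run_length > 1 else symbol)
    return "-".join(merged)


# A plan already generalized to airport sizes (S = small, L = large).
print(merge_consecutive(["S", "L", "L", "S"]))
```

`itertools.groupby` groups only adjacent equal items, which is what a merge of consecutive actions needs; non-adjacent repeats stay separate.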

Generalization-Based Sequence Mining
  • Generalize the planbase in a multidimensional way using dimension tables
  • Use the number of distinct values (cardinality) at each level to determine the right level of generalization (level “planning”)
  • Use the operators merge “+” and option “[]” to further generalize patterns
  • Retain patterns with significant support
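One way to read the cardinality step is sketched below in Python. Everything concrete here is an assumption for illustration: the toy dimension table `AIRPORT_DIM`, the level order in `LEVELS`, the two sample plans, and the rule "pick the most specific level with at most `max_distinct` values". The slides only state that cardinality guides the choice of level.

```python
# Hypothetical airport dimension table (toy values, not from the slides).
AIRPORT_DIM = {
    "ALB": {"city": "Albany", "state": "NY", "size": "S"},
    "JFK": {"city": "New York", "state": "NY", "size": "L"},
    "ORD": {"city": "Chicago", "state": "IL", "size": "L"},
    "LAX": {"city": "Los Angeles", "state": "CA", "size": "L"},
    "SBA": {"city": "Santa Barbara", "state": "CA", "size": "S"},
}
LEVELS = ["city", "state", "size"]  # assumed order: most to least specific

# Hypothetical planbase: each plan is a sequence of airports visited.
PLANS = [
    ["ALB", "JFK", "ORD", "LAX"],
    ["SBA", "LAX", "ORD"],
]


def level_cardinalities(plans, dim, levels):
    """Count distinct values per level among airports used in the plans."""
    airports = {airport for plan in plans for airport in plan}
    return {lvl: len({dim[a][lvl] for a in airports}) for lvl in levels}


def pick_level(cards, levels, max_distinct):
    """Return the most specific level whose cardinality fits the threshold."""
    for lvl in levels:
        if cards[lvl] <= max_distinct:
            return lvl
    return None


cards = level_cardinalities(PLANS, AIRPORT_DIM, LEVELS)
chosen = pick_level(cards, LEVELS, max_distinct=2)
print(cards, chosen)
```

With a threshold of two distinct values, only the airport-size level qualifies in this toy data, which is the level at which the S/L sequences are then mined.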

Generalized Sequence Patterns
  • AirportSize-sequence survives the min threshold (after applying merge operator): S-L+-S [35%], L+-S [30%], S-L+ [24.5%], L+ [9%]
  • After applying option operator: [S]-L+-[S] [98.5%]
      - Most of the time, people fly via large airports to get to their final destination
  • Other plans: in the remaining 1.5% of cases, other patterns occur: S-S, L-S-L
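The 98.5% figure is the sum of the four merged patterns, each of which is an instance of [S]-L+-[S]. The Python sketch below checks this arithmetic; the support values come from the slide, while the regular-expression encoding of the option operator and the names `OPTION_PATTERN` and `covered` are assumptions made for illustration.

```python
import re
from decimal import Decimal

# Support (in percent) of each merged pattern, as listed on the slide.
MERGED_SUPPORT = {
    "S-L+-S": Decimal("35"),
    "L+-S": Decimal("30"),
    "S-L+": Decimal("24.5"),
    "L+": Decimal("9"),
}

# [S]-L+-[S]: optional leading S, one L+ block, optional trailing S.
OPTION_PATTERN = re.compile(r"(S-)?L\+(-S)?")


def covered(pattern):
    """True if a merged pattern is an instance of [S]-L+-[S]."""
    return OPTION_PATTERN.fullmatch(pattern) is not None


option_support = sum(s for p, s in MERGED_SUPPORT.items() if covered(p))
other_support = Decimal("100") - option_support
print(option_support, other_support)
```

`Decimal` is used instead of floats so that 24.5 and the resulting totals are represented exactly.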