Chapter 4: Data Mining Primitives, Languages, and System Architectures

Chapter 4: Data Mining Primitives, Languages, and System Architectures
  • Data mining primitives: What defines a data mining task?
  • A data mining query language
  • Design graphical user interfaces based on a data mining query language
  • Architecture of data mining systems
  • Summary
1/20/2022 Data Mining: Concepts and Techniques

Why Data Mining Primitives and Languages?
  • Finding all the patterns in a database autonomously is unrealistic: there may be too many patterns, and most of them are uninteresting
  • Data mining should be an interactive process
      • The user directs what kind of mining is to be performed
  • Users must be provided with a set of primitives for communicating with the data mining system
  • Incorporating these primitives in a data mining query language gives:
      • More flexible user interaction with the system
      • A foundation for the design of graphical user interfaces
      • Standardization of data mining industry and practice

What Defines a Data Mining Task?
  • Task-relevant data
  • Type of knowledge to be mined
  • Background knowledge
  • Pattern interestingness measurements
  • Visualization of discovered patterns

Task-Relevant Data
  • Database or data warehouse name
  • Database tables or data warehouse cubes
  • Condition for data selection
  • Relevant attributes or dimensions
  • Data grouping criteria

Types of knowledge to be mined
  • Characterization
  • Discrimination
  • Association
  • Classification/prediction
  • Clustering
  • Outlier analysis
  • Other data mining tasks

Background knowledge
  • Concept hierarchies
      • Schema hierarchy, e.g. street < city < province_or_state < country
      • Set-grouping hierarchy, e.g. {20-39} = young, {40-59} = middle_aged
      • Operation-derived hierarchy, e.g. email address: login-name < department < university < country
      • Rule-based hierarchy, e.g. low_profit(X) <= price(X, P1) and cost(X, P2) and (P1 - P2) < $50
  • User's existing knowledge of the data
      • E.g. structural zero
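A minimal Python sketch of how the schema and set-grouping hierarchies might be applied to generalize raw values. The function names and the sample address are hypothetical, and ages outside the two listed groups are assumed to generalize straight to the top level "all".

```python
# Schema hierarchy from the slide, ordered from most to least specific.
SCHEMA_LEVELS = ["street", "city", "province_or_state", "country"]

# Set-grouping hierarchy from the slide: {20-39} = young, {40-59} = middle_aged.
AGE_GROUPS = {"young": range(20, 40), "middle_aged": range(40, 60)}


def generalize_age(age):
    """Map a raw age to its group; ages outside every group map to 'all' (assumption)."""
    for label, ages in AGE_GROUPS.items():
        if age in ages:
            return label
    return "all"


def roll_up_address(address, level):
    """Keep only the attributes at `level` and above in the schema hierarchy."""
    start = SCHEMA_LEVELS.index(level)
    return {name: address[name] for name in SCHEMA_LEVELS[start:]}


# Hypothetical sample record, used only for illustration.
address = {
    "street": "100 Main St",
    "city": "Vancouver",
    "province_or_state": "British Columbia",
    "country": "Canada",
}
city_level = roll_up_address(address, "city")
```

Calling `roll_up_address(address, "country")` leaves only the country, which is the coarsest view this hierarchy allows.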

Pattern interestingness measurements
  • Simplicity: e.g. rule length
  • Certainty: e.g. confidence, P(A|B) = n(A and B) / n(B)
  • Utility: potential usefulness, e.g. support
  • Novelty: not previously known, surprising
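The certainty and utility measures can be sketched in Python as follows. This is an illustrative sketch, not part of the original slides: the transactions are made up, and support is taken here as the fraction of transactions that contain the itemset. In the slide's notation, B is the rule's antecedent and A its consequent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(A|B) = n(A and B) / n(B), with B = antecedent and A = consequent."""
    b = set(antecedent)
    a_and_b = b | set(consequent)
    n_b = sum(1 for t in transactions if b <= set(t))
    n_ab = sum(1 for t in transactions if a_and_b <= set(t))
    return n_ab / n_b if n_b else 0.0


# Hypothetical transactions, used only for illustration.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
]
rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
```

Here two of the four transactions contain both items, and two of the three transactions containing "computer" also contain "software".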

Visualization of Discovered Patterns
  • Different backgrounds and purposes may require different forms of representation
      • E.g., rules, tables, crosstabs, pie/bar charts, etc.
  • Concept hierarchies are also important
      • Discovered knowledge may be more understandable when represented at a high concept level
      • Interactive drill up/down, pivoting, slicing and dicing provide different perspectives on the data
  • Different kinds of knowledge require different representations

A Data Mining Query Language (DMQL)
  • Motivation
      • A DMQL can provide the ability to support ad-hoc and interactive data mining
      • By providing a standardized language like SQL, we hope to achieve the same effect that SQL has had on relational databases
  • Design
      • DMQL is designed with the primitives described earlier in mind

Syntax for DMQL
  • Syntax for specification of
      • task-relevant data
      • the kind of knowledge to be mined
      • concept hierarchy specification
      • interestingness measure
      • pattern presentation and visualization
  • Putting it all together: a DMQL query

Syntax for task-relevant data specification
  • use database database_name, or use data warehouse data_warehouse_name
  • from relation(s)/cube(s) [where condition]
  • in relevance to att_or_dim_list
  • order by order_list
  • group by grouping_list
  • having condition

Specification of task-relevant data

Syntax for specifying the kind of knowledge to be mined
  • Characterization
        Mine_Knowledge_Specification ::= mine characteristics [as pattern_name] analyze measure(s)
  • Discrimination
        Mine_Knowledge_Specification ::= mine comparison [as pattern_name] for target_class where target_condition {versus contrast_class_i where contrast_condition_i} analyze measure(s)
  • Association
        Mine_Knowledge_Specification ::= mine associations [as pattern_name]

Syntax for specifying the kind of knowledge to be mined (cont.)
  • Classification
        Mine_Knowledge_Specification ::= mine classification [as pattern_name] analyze classifying_attribute_or_dimension
  • Prediction
        Mine_Knowledge_Specification ::= mine prediction [as pattern_name] analyze prediction_attribute_or_dimension {set {attribute_or_dimension_i = value_i}}

Syntax for concept hierarchy specification
  • To specify which concept hierarchies to use:
        use hierarchy <hierarchy> for <attribute_or_dimension>
  • Different syntax is used to define different types of hierarchies
      • Schema hierarchies
            define hierarchy time_hierarchy on date as [date, month, quarter, year]
      • Set-grouping hierarchies
            define hierarchy age_hierarchy for age on customer as
                level1: {young, middle_aged, senior} < level0: all
                level2: {20, ..., 39} < level1: young
                level2: {40, ..., 59} < level1: middle_aged
                level2: {60, ..., 89} < level1: senior

Syntax for concept hierarchy specification (cont.)
  • Operation-derived hierarchies
        define hierarchy age_hierarchy for age on customer as
            {age_category(1), ..., age_category(5)} := cluster(default, age, 5) < all(age)
  • Rule-based hierarchies
        define hierarchy profit_margin_hierarchy on item as
            level_1: low_profit_margin < level_0: all if (price - cost) < $50
            level_1: medium_profit_margin < level_0: all if ((price - cost) >= $50) and ((price - cost) <= $250)
            level_1: high_profit_margin < level_0: all if (price - cost) > $250
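The rule-based hierarchy can be read as an ordinary classification function. A minimal Python sketch, assuming prices and costs are plain numbers in dollars and that a margin of exactly $50 falls in the medium category; the function name is hypothetical.

```python
def profit_margin_level(price, cost):
    """Assign an item to a level_1 concept of profit_margin_hierarchy."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if margin <= 250:  # assumption: a margin of exactly 50 counts as medium
        return "medium_profit_margin"
    return "high_profit_margin"


# Hypothetical (price, cost) pairs, used only for illustration.
levels = [profit_margin_level(p, c) for p, c in [(100, 60), (300, 50), (400, 100)]]
```

Every item maps to exactly one level_1 concept, and all three concepts generalize to level_0: all.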

Syntax for interestingness measure specification
  • Interestingness measures and thresholds can be specified by the user with the statement:
        with <interest_measure_name> threshold = threshold_value
  • Example:
        with support threshold = 0.05
        with confidence threshold = 0.7
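One way to picture what these thresholds do is as a filter over candidate rules. The Python sketch below is illustrative only: the rules and their measure values are invented, and a rule is assumed to pass when each measure meets or exceeds its threshold.

```python
# Thresholds taken from the example statements above.
THRESHOLDS = {"support": 0.05, "confidence": 0.7}


def is_interesting(rule, thresholds=THRESHOLDS):
    """True when every measure meets or exceeds its threshold (assumption)."""
    return all(rule[measure] >= minimum for measure, minimum in thresholds.items())


# Hypothetical mined rules with made-up measure values.
rules = [
    {"rule": "computer => software", "support": 0.10, "confidence": 0.80},
    {"rule": "printer => paper", "support": 0.02, "confidence": 0.90},
    {"rule": "computer => printer", "support": 0.20, "confidence": 0.40},
]
interesting = [r["rule"] for r in rules if is_interesting(r)]
```

Only the first rule survives: the second fails the support threshold and the third fails the confidence threshold.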

Syntax for pattern presentation and visualization specification
  • Users can specify the display of discovered patterns in one or more forms:
        display as <result_form>
  • To facilitate interactive viewing at different concept levels, the following syntax is defined:
        Multilevel_Manipulation ::= roll up on attribute_or_dimension
                                  | drill down on attribute_or_dimension
                                  | add attribute_or_dimension
                                  | drop attribute_or_dimension
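The effect of "roll up" and "drop" on a table of counts can be sketched in Python. This is a hypothetical illustration rather than DMQL's implementation: counts are keyed by (city, item_type) tuples, and the city-to-country mapping and the numbers are invented.

```python
from collections import Counter

# Hypothetical one-level hierarchy: city < country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up(counts, position, parent_of):
    """Replace the dimension at `position` by its parent concept and re-aggregate."""
    rolled = Counter()
    for key, n in counts.items():
        new_key = key[:position] + (parent_of[key[position]],) + key[position + 1:]
        rolled[new_key] += n
    return rolled


def drop(counts, position):
    """Remove the dimension at `position` and re-aggregate."""
    reduced = Counter()
    for key, n in counts.items():
        reduced[key[:position] + key[position + 1:]] += n
    return reduced


# Hypothetical counts keyed by (city, item_type).
counts = {
    ("Vancouver", "TV"): 3,
    ("Toronto", "TV"): 2,
    ("Chicago", "TV"): 5,
    ("Toronto", "computer"): 4,
}
by_country = roll_up(counts, 0, CITY_TO_COUNTRY)
by_item = drop(counts, 0)
```

"Drill down" is the inverse of `roll_up` and needs the finer-grained counts to be kept, which is why the sketch never modifies `counts` in place.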

Putting it all together: the full specification of a DMQL query
    use database AllElectronics_db
    use hierarchy location_hierarchy for B.address
    mine characteristics as customerPurchasing
    analyze count%
    in relevance to C.age, I.type, I.place_made
    from customer C, item I, purchases P, items_sold S, works_at W, branch B
    where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID
        and P.cust_ID = C.cust_ID and P.method_paid = "AmEx"
        and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID
        and B.address = "Canada" and I.price >= 100
    with noise threshold = 0.05
    display as table
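To show how the five primitives map onto the clauses of such a query, here is a small Python sketch that assembles DMQL text from a task description. The `MiningTask` class and its field names are hypothetical; nothing here parses or executes DMQL.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container for the primitives of one mining task."""
    database: str                 # task-relevant data
    relations: list
    relevant_attributes: list
    knowledge: str                # kind of knowledge to be mined
    analyze: str = ""
    condition: str = ""
    hierarchies: dict = field(default_factory=dict)  # background knowledge
    thresholds: dict = field(default_factory=dict)   # interestingness measures
    display: str = "table"        # presentation of discovered patterns

    def to_dmql(self):
        """Render the task as DMQL text, one clause per line."""
        lines = [f"use database {self.database}"]
        lines += [f"use hierarchy {h} for {a}" for h, a in self.hierarchies.items()]
        lines.append(f"mine {self.knowledge}")
        if self.analyze:
            lines.append(f"analyze {self.analyze}")
        lines.append("in relevance to " + ", ".join(self.relevant_attributes))
        lines.append("from " + ", ".join(self.relations))
        if self.condition:
            lines.append(f"where {self.condition}")
        lines += [f"with {m} threshold = {v}" for m, v in self.thresholds.items()]
        lines.append(f"display as {self.display}")
        return "\n".join(lines)


# A shortened version of the query above, for illustration only.
task = MiningTask(
    database="AllElectronics_db",
    relations=["customer C", "item I"],
    relevant_attributes=["C.age", "I.type"],
    knowledge="characteristics as customerPurchasing",
    analyze="count%",
    hierarchies={"location_hierarchy": "B.address"},
    thresholds={"noise": 0.05},
)
query = task.to_dmql()
```

Keeping the primitives as separate fields is also what a form-based graphical interface would need: each field can be edited on its own and the query text regenerated.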

Designing graphical user interfaces based on a data mining query language
  • Data collection and data mining query composition
  • Presentation of discovered patterns
  • Hierarchy specification and manipulation
  • Manipulation of data mining primitives
  • Interactive multilevel mining
  • Other miscellaneous information

Data Mining System Architectures
  • Coupling a data mining system with a DB/DW system
      • No coupling: flat file processing, not recommended
      • Loose coupling
          • Fetching data from DB/DW
      • Semi-tight coupling: enhanced DM performance
          • Provides efficient implementations of a few data mining primitives in a DB/DW system, e.g., sorting, indexing, aggregation, histogram analysis, multiway join, precomputation of some statistical functions
      • Tight coupling: a uniform information processing environment
          • DM is smoothly integrated into a DB/DW system; mining queries are optimized based on the mining query, indexing, query processing methods, etc.
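The difference between loose and semi-tight coupling comes down to where the work runs. The Python sketch below uses the standard-library `sqlite3` module as a stand-in DB system; the `purchases` table and its rows are hypothetical. It computes the same counts twice: once by fetching raw rows and aggregating in the mining program, and once by pushing the aggregation primitive into the database.

```python
import sqlite3
from collections import Counter

# Hypothetical in-memory database standing in for a DB/DW system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (cust_id INTEGER, item_type TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "TV"), (2, "TV"), (3, "computer"), (1, "computer"), (4, "TV")],
)

# Loose coupling: the DB only serves rows; counting happens in the mining program.
rows = conn.execute("SELECT item_type FROM purchases").fetchall()
loose_counts = dict(Counter(item_type for (item_type,) in rows))

# Semi-tight coupling: the aggregation primitive is executed by the DB itself.
semi_tight_counts = dict(
    conn.execute(
        "SELECT item_type, COUNT(*) FROM purchases GROUP BY item_type"
    ).fetchall()
)
conn.close()
```

Both paths return the same result, but in the second one only the aggregated rows leave the database, which is the performance argument the slide makes for semi-tight coupling.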

Summary
  • Five primitives for specification of a data mining task
      • task-relevant data
      • kind of knowledge to be mined
      • background knowledge
      • interestingness measures
      • knowledge presentation and visualization techniques to be used for displaying the discovered patterns
  • Data mining query languages
      • DMQL, MS/OLEDB for DM, etc.
  • Data mining system architecture
      • No coupling, loose coupling, semi-tight coupling, tight coupling

http://www.cs.sfu.ca/~han
Thank you!