Mining Dependent Patterns
Hansheng Lei, Univ. of Texas Rio Grande Valley
Yamin Hu, Wenjian Luo, Univ. of Science and Tech. of China
Cheng Chang Pan, Nova Southeastern University

Outline
• Motivation
• Association Rule Mining
• Dependent Patterns
• Experimental results
• Conclusion
ICDIS (Int. Conf. on Data Intelligence and Security)

Motivation
• Mining Survey Data

Association Rule Mining
• Proposed by Agrawal et al. in 1993.
• Applied in market basket analysis to find how items purchased by customers are related.
• Beer → Diaper [sup = 5%, conf = 100%]

AR Model
• I = {i_1, i_2, …, i_m}: a set of items.
• Transaction t: a set of items, with t ⊆ I.
• Transaction database T: a set of transactions, T = {t_1, t_2, …, t_n}.

Association rules
• An association rule is an implication of the form X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅.
• An itemset is a set of items.
• E.g., X = {milk, bread, cereal} is an itemset.

Support and Confidence
• The support count of an itemset X in a data set T, written X.count, is the number of transactions in T that contain X.
• Assume T has n transactions. Then, for a rule X → Y:
  support = (X ∪ Y).count / n
  confidence = (X ∪ Y).count / X.count
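As a minimal sketch of the definitions above, the following Python computes the support count of an itemset and the standard support and confidence of a rule X → Y, i.e. (X ∪ Y).count / n and (X ∪ Y).count / X.count. The function names and the toy transaction database are assumptions made for illustration; they do not come from the slides.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support_count(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(x: Iterable[str], y: Iterable[str], transactions: List[Transaction]) -> float:
    """Support of the rule X -> Y: fraction of transactions containing X and Y together."""
    both = frozenset(x) | frozenset(y)
    return support_count(both, transactions) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str], transactions: List[Transaction]) -> float:
    """Confidence of the rule X -> Y: fraction of transactions containing X that also contain Y."""
    x_count = support_count(x, transactions)
    if x_count == 0:
        raise ValueError("X does not occur in any transaction")
    both = frozenset(x) | frozenset(y)
    return support_count(both, transactions) / x_count


# Hypothetical toy database (illustrative only, not data from the talk).
T = [
    frozenset({"beer", "diaper", "milk"}),
    frozenset({"beer", "diaper"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "cereal"}),
]

print(support({"beer"}, {"diaper"}, T))     # share of all transactions with beer and diaper
print(confidence({"beer"}, {"diaper"}, T))  # share of beer transactions that also have diaper
```

Storing each transaction as a `frozenset` keeps the containment test `items <= t` a direct translation of "t contains X".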

Problems with AR mining
(a) Generates a huge number of rules
(b) Does not support other relations, such as negative implication, correlation, and dependence
(c) Uses universal support and confidence thresholds

Dependent Patterns

DP Properties
• Downward closure
• Individual support thresholds for each item
• An appropriate dependence measure
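Downward closure means that if an itemset satisfies a measure, every subset of it does too, which lets a level-wise miner discard any candidate that has a failing subset. The slides do not preserve the dependence measure used by DP, so the sketch below substitutes a plain support count with a single threshold (Apriori-style) purely to illustrate the pruning step. The function name `frequent_itemsets`, the threshold, and the toy data are assumptions, not the authors' algorithm.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Itemset], min_count: int) -> Set[Itemset]:
    """Level-wise mining with downward-closure pruning.

    Assumption: plain support count stands in for the (unknown) DP measure.
    """
    def count(itemset: Itemset) -> int:
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = {frozenset({i}) for i in items if count(frozenset({i})) >= min_count}
    result = set(level)
    k = 2
    while level:
        # Join step: merge pairs of (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must already qualify.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) >= min_count}
        result |= level
        k += 1
    return result


# Hypothetical toy database (illustrative only).
T = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
]

found = frequent_itemsets(T, min_count=3)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```

With per-item thresholds, as listed among the DP properties, the qualifying test inside the loop would change; that variant is left out here because the slides do not specify it.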

Related Work
• m-Pattern (not a good measure of dependence)

Experiments
Three algorithms to compare: DP, Apriori, and m-Pattern

Pattern Distribution

Support vs. mining level

Overlapping

Scalability

Conclusion
• Proof of concept
• Obvious advantages
• More scalability testing
