Summarizing Parameter Space for Interactive Exploration of Association Rules


Abhishek Mukherji, Xika Lin, Professor Elke A. Rundensteiner, Professor Matthew O. Ward
XMDVTool, Department of Computer Science
This project is supported by NSF under grants IIS-080812027 and CCF-0811510.

INTERACTIVE AND EXPLORATORY DATA MINING
GOAL: Retrieve the right number of interesting rules.
[Figure: min. Supp and min. Conf are passed to the Data Miner, which returns {ARs}.]
§ No prior knowledge of how new rules will be generated with change in parameter values.
§ Analysts proceed by trial-and-error.
§ Long response time.
§ No reuse of results.

SUMMARIZING PARAMETER SPACE
Can we store-n-reuse?
Limitations
§ Cumbersome to store rules for a potentially infinite number of threshold pairs.
§ Redundant and repeated rules may clutter users' understanding.
§ Patterns are non-uniformly distributed over the data set.
Large continuous parameter space.
Repeated rules: Once valid, rule X=>Y will remain valid for the entire subspace.
Redundant rules: AB=>C | A=>BC
Stable region: NO new ARs are produced despite change in parameters.

PARAMETER SPACE CONSTRUCTION
1. Determine all cut-points in the parameter space.
2. Populate each block with association rules.
[Figure: parameter space with Support on the horizontal axis (3, 4, 5) and Confidence on the vertical axis (0, 3/6, 3/5, 4/6, 3/4, 4/5, 5/6, 1). Each block lists the rules valid in it, e.g. AC->BD, CD->AB, C->B, A->BD, B->D, D->B, D->AB, A->BCD, C->ABD, B->C, B->AD, B->ACD, D->ABC.]

RULE GRAPH SEARCH
§ Redundancy eliminating search over a directed acyclic graph.

OVERALL TECHNIQUE
[Figure: full lattice representing a dataset and the reduced lattice*; itemset nodes carry a support list, e.g. (12345) D. Example rule: A=>BCD, S = 3, C = 3/4.]

INDEX STRUCTURE
1. Eliminate repeated rules.
§ Each rule is only stored once.
2. Create 2-level search tree.
[Figure: index over the parameter space, Support (3, 4, 5) against Confidence (0, 3/6, 3/5, 4/6, 3/4, 4/5, 5/6), with rules such as B->D, D->AB, B->C, B->AD stored at their cut-points.]

*Mohammed J. Zaki. Mining non-redundant association rules. Data Mining and Knowledge Discovery, 9(3):223-248, 2004.

CONTRIBUTIONS
§ Explored the parameter space for ARs.
§ Defined stable regions in the parameter space.
§ Developed efficient index and search mechanisms.
§ Achieved store-n-reuse for quick response to interactive user queries.
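
The store-n-reuse idea on this poster (determine cut-points, populate each block, store each rule once) can be illustrated with a small Python sketch. This is not the authors' implementation: it mines rules by brute force instead of using the reduced lattice, uses a flat dictionary in place of the 2-level search tree, and does no redundancy elimination. The transactions and the names `TRANSACTIONS`, `mine_rules`, `build_index`, and `query` are all hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset; not taken from the poster.
TRANSACTIONS = [
    {"A", "B", "C", "D"},
    {"B", "C", "D"},
    {"A", "B", "D"},
    {"B", "D"},
    {"B", "C"},
    {"A"},
]


def support(itemset, transactions):
    """Absolute support: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def mine_rules(transactions):
    """Brute force: map every rule (lhs, rhs) to its (support, confidence)."""
    items = sorted(set().union(*transactions))
    rules = {}
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(set(itemset), transactions)
            if s == 0:
                continue
            for k in range(1, size):
                for lhs in combinations(itemset, k):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    conf = Fraction(s, support(set(lhs), transactions))
                    rules[(lhs, rhs)] = (s, conf)
    return rules


def build_index(rules):
    """Store each rule once, at its exact (support, confidence) location.

    The distinct support and confidence values are the cut-points: the
    answer to a query can only change when a threshold crosses one of them.
    """
    index = {}
    for rule, location in rules.items():
        index.setdefault(location, []).append(rule)
    supp_cuts = sorted({s for s, _ in index})
    conf_cuts = sorted({c for _, c in index})
    return index, supp_cuts, conf_cuts


def query(index, min_supp, min_conf):
    """Collect all stored rules whose location dominates the thresholds."""
    return sorted(
        rule
        for (s, c), stored in index.items()
        if s >= min_supp and c >= min_conf
        for rule in stored
    )


RULES = mine_rules(TRANSACTIONS)
INDEX, SUPP_CUTS, CONF_CUTS = build_index(RULES)
```

Under these assumptions, any two threshold pairs that fall between the same consecutive cut-points return the same rule set from `query`, which is the behaviour the poster calls a stable region. Mining happens once in `mine_rules`; later queries only read `INDEX`.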