Mining Negative Rules in Large Databases using GRD
Dhananjay R Thiruvady
Supervisor: Professor Geoffrey Webb
Overview
- Aims
- Association Rule Discovery
- Generalized Rule Discovery
- Tidsets and Diffsets
- Conclusion
Aims
1. To mine negative rules in a large database using GRD
2. To assess whether the negative rules are of potential interest to a user
Association Rule Discovery
- Rule: A => B (e.g. tea => coffee)
  - A is the antecedent
  - B is the consequent
- Aim: search the database for strong associations between itemsets
- An itemset is a set of items that can occur together in a transaction, e.g. {tea} in a supermarket database
Association Rule Discovery (contd.)
- Support of tea => coffee: (transactions containing both tea and coffee) / |data space|, where |data space| is the total number of transactions
- Confidence of tea => coffee: (transactions containing both tea and coffee) / (transactions containing tea)
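As a minimal sketch, the two measures can be computed directly from a list of transactions. The three-transaction toy database and the function names `support` and `confidence` are assumptions made for illustration; they are not taken from the talk.

```python
from typing import FrozenSet, List

# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"tea", "coffee"}),
    frozenset({"milk"}),
    frozenset({"tea", "milk"}),
]


def support(antecedent, consequent, transactions=TRANSACTIONS) -> float:
    """Fraction of all transactions containing antecedent AND consequent."""
    both = antecedent | consequent
    covered = sum(1 for t in transactions if both <= t)
    return covered / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS) -> float:
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0  # rule never applies, so report zero confidence
    both = antecedent | consequent
    return sum(1 for t in with_antecedent if both <= t) / len(with_antecedent)


tea, coffee = frozenset({"tea"}), frozenset({"coffee"})
print(support(tea, coffee))     # 1 of 3 transactions holds both items
print(confidence(tea, coffee))  # 1 of the 2 tea transactions holds coffee
```

Both measures count the transactions that contain the antecedent and the consequent together; they differ only in the denominator.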
Association Rule Discovery (contd.)
- Generates rules from itemsets that satisfy a minimum support threshold (frequent itemsets)
- Further constraints can then be applied, e.g. confidence (interest)
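The two-stage filtering described here can be sketched as follows. This is a brute-force illustration restricted to rules with a single item on each side; it is not the algorithm used by any particular association rule miner, and the database, thresholds and the names `count` and `mine_rules` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: transaction id -> items bought.
DB = {
    1: {"tea", "coffee"},
    2: {"milk"},
    3: {"tea", "milk"},
}


def count(itemset, db=DB):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def mine_rules(db=DB, min_support=0.3, min_confidence=0.5):
    """Enumerate item pairs and keep the rules that pass both thresholds."""
    n = len(db)
    items = sorted(set().union(*db.values()))
    rules = []
    for a, b in combinations(items, 2):
        pair_count = count({a, b}, db)
        if pair_count / n < min_support:
            continue  # pair is not frequent, so no rule is built from it
        for lhs, rhs in ((a, b), (b, a)):
            conf = pair_count / count({lhs}, db)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_count / n, conf))
    return rules


for lhs, rhs, sup, conf in mine_rules():
    print(f"{lhs} => {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

The support test runs first so that infrequent pairs are discarded before any confidence is computed, mirroring the order of the two bullets above.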
Generalized Rule Discovery
- An alternative to Association Rule Discovery
- Uses the OPUS algorithm for unordered search [Webb, 95]
- Generates a large number of rules based on user-specified constraints
- Constraints include minimum support, confidence, etc.
Tidsets and Diffsets [Zaki, Gouda, 01]
- Every itemset is stored with its corresponding set of transactions (its tidset)
- Vertical mining has proved to be more efficient than horizontal mining

Item    Tidset
Tea     1, 3
Coffee  1
Milk    2, 3
Tidsets and Diffsets (contd.)
- The diffset of an itemset is the set of transactions in which the itemset does not appear
- The diffset of an itemset is therefore the tidset of its negation

Item    Tidset  Diffset
Tea     1, 3    2
Coffee  1       2, 3
Milk    2, 3    1
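A minimal sketch of the vertical layout, assuming a hypothetical three-transaction database. It follows the definition given on this slide, where a diffset is the complement of a tidset with respect to all transactions. The helper names `build_tidsets` and `build_diffsets` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical horizontal database: transaction id -> items.
HORIZONTAL = {
    1: {"tea", "coffee"},
    2: {"milk"},
    3: {"tea", "milk"},
}


def build_tidsets(db):
    """Vertical layout: item -> set of ids of transactions containing it."""
    tidsets = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def build_diffsets(tidsets, all_tids):
    """Complement of each tidset: ids of transactions NOT containing the item."""
    return {item: all_tids - tids for item, tids in tidsets.items()}


tidsets = build_tidsets(HORIZONTAL)
diffsets = build_diffsets(tidsets, set(HORIZONTAL))
for item in sorted(tidsets):
    print(item, sorted(tidsets[item]), sorted(diffsets[item]))
```

Once the tidsets exist, each diffset costs a single set difference, which is why the negative side comes almost for free.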
Tidsets and Diffsets (contd.)
- GRD already calculates the tidset of each itemset
- The diffset of an itemset can therefore be computed at very little extra cost
[Figure: tidsets of A, B and C and the corresponding diffsets ~A, ~B and ~C for transactions 1 to 5]
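To illustrate why the extra cost is small, the sketch below evaluates rules containing negated items using only set operations on tidsets that are assumed to be available already. It is not GRD itself: the tidsets, the `~` prefix convention and the functions `cover` and `rule_stats` are hypothetical choices for this example.

```python
ALL_TIDS = {1, 2, 3}  # hypothetical database of three transactions

# Hypothetical precomputed tidsets: item -> ids of transactions containing it.
TIDSETS = {"tea": {1, 3}, "coffee": {1}, "milk": {2, 3}}


def cover(literal):
    """Transactions satisfying a literal; '~x' is answered by the diffset of x."""
    if literal.startswith("~"):
        return ALL_TIDS - TIDSETS[literal[1:]]  # diffset(x) = tidset(~x)
    return TIDSETS[literal]


def rule_stats(antecedent, consequent):
    """Support and confidence of antecedent => consequent via intersection."""
    lhs = cover(antecedent)
    both = lhs & cover(consequent)
    support = len(both) / len(ALL_TIDS)
    confidence = len(both) / len(lhs) if lhs else 0.0
    return support, confidence


for rule in (("tea", "~coffee"), ("~tea", "~coffee")):
    print(rule, rule_stats(*rule))
```

A negated item needs one set difference against the full list of transaction ids, after which positive and negative rules are scored by the same intersection.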
Conclusion
- Find negative correlations between itemsets in a database
- Example rules: tea => ~coffee, ~tea => ~coffee
- This will be achieved by extending the GRD technique
Conclusion (contd.)
- Using diffsets: tidset(A) = diffset(~A)
- Negative associations can be calculated with very little additional computational overhead
- Assess whether the resulting negative rules are potentially interesting
Any questions?