ECLAT Jelena Stančić 3212/2015

ECLAT

Equivalence Class Transformation

A Frequent Itemset Mining (FIM) algorithm scans the database and finds the itemsets whose number of occurrences in transactions reaches a given threshold (the minimum support). Scientific and industrial applications, including machine learning, computational biology, intrusion detection, web log mining, and e-business, benefit from frequent itemset mining.

ECLAT

Transactions, originally stored in horizontal format, are read from disk and converted to vertical format.

Horizontal format:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Vertical format (one TID list per item):

Bread   Milk    Diaper  Beer    Coke    Eggs
1       1       2       2       3       2
2       3       3       3       5
4       4       4       4
5       5       5
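
The conversion above can be sketched in a few lines of Java. This is only an illustration under assumptions: the class name VerticalFormat, the method names, and the use of String item names are hypothetical and do not come from the presentation, which works with integer item ids.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper (not from the slides): horizontal -> vertical conversion.
final class VerticalFormat {

    /** Maps every item to the sorted set of transaction ids (TIDs) that contain it. */
    static Map<String, SortedSet<Integer>> toVertical(Map<Integer, List<String>> horizontal) {
        Map<String, SortedSet<Integer>> vertical = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> transaction : horizontal.entrySet()) {
            for (String item : transaction.getValue()) {
                // Create the TID list on first sight of the item, then append the TID.
                vertical.computeIfAbsent(item, k -> new TreeSet<>()).add(transaction.getKey());
            }
        }
        return vertical;
    }

    /** The five example transactions from the slide. */
    static Map<Integer, List<String>> exampleTransactions() {
        Map<Integer, List<String>> db = new LinkedHashMap<>();
        db.put(1, List.of("Bread", "Milk"));
        db.put(2, List.of("Bread", "Diaper", "Beer", "Eggs"));
        db.put(3, List.of("Milk", "Diaper", "Beer", "Coke"));
        db.put(4, List.of("Bread", "Milk", "Diaper", "Beer"));
        db.put(5, List.of("Bread", "Milk", "Diaper", "Coke"));
        return db;
    }

    public static void main(String[] args) {
        toVertical(exampleTransactions())
                .forEach((item, tids) -> System.out.println(item + " " + tids));
    }
}
```

The database is read once; after that, every support count can be derived from the TID lists alone.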

ECLAT

The frequency (support) of each item is counted, and the infrequent items are deleted from the vertical database together with their TID lists.

Minimum support = 3

Coke (support 2) and Eggs (support 1) are infrequent and are removed. The remaining vertical lists are:

Bread   Milk    Diaper  Beer
1       1       2       2
2       3       3       3
4       4       4       4
5       5       5
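
A minimal sketch of this pruning step, assuming the vertical database is held as a map from item to TID set. The names SupportFilter, keepFrequent and exampleVertical are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not from the slides): drops items below the minimum support.
final class SupportFilter {

    /** Keeps only the items whose TID list has at least minSupport entries. */
    static Map<String, Set<Integer>> keepFrequent(Map<String, Set<Integer>> vertical, int minSupport) {
        Map<String, Set<Integer>> frequent = new LinkedHashMap<>();
        for (Map.Entry<String, Set<Integer>> entry : vertical.entrySet()) {
            // The support of a single item is simply the length of its TID list.
            if (entry.getValue().size() >= minSupport) {
                frequent.put(entry.getKey(), entry.getValue());
            }
        }
        return frequent;
    }

    /** The vertical database of the running example. */
    static Map<String, Set<Integer>> exampleVertical() {
        Map<String, Set<Integer>> vertical = new LinkedHashMap<>();
        vertical.put("Bread", Set.of(1, 2, 4, 5));
        vertical.put("Milk", Set.of(1, 3, 4, 5));
        vertical.put("Diaper", Set.of(2, 3, 4, 5));
        vertical.put("Beer", Set.of(2, 3, 4));
        vertical.put("Coke", Set.of(3, 5));
        vertical.put("Eggs", Set.of(2));
        return vertical;
    }
}
```

With minimum support 3, keepFrequent(exampleVertical(), 3) retains Bread, Milk, Diaper and Beer.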

ECLAT

The Eclat algorithm is defined recursively. The initial call uses all the single items with their tidsets. In each recursive call, the function combines each itemset-tidset pair with all the other pairs, intersecting their tidsets, to generate new candidates. If a new candidate is frequent, it is added to the set of frequent itemsets. Then, recursively, the function finds all the frequent itemsets in that branch.
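
The recursion described above can be sketched as follows. This is a simplified illustration, not the implementation shown later in the presentation: the class EclatSketch, its method names, and the String item names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (not from the slides) of the recursive Eclat search.
final class EclatSketch {

    /**
     * Records every itemset prefix + items[i] with its support, then joins it with
     * the later members of the same class and recurses on the frequent joins.
     */
    static void eclat(List<String> prefix, List<String> items, List<Set<Integer>> tidsets,
                      int minSupport, Map<List<String>, Integer> out) {
        for (int i = 0; i < items.size(); i++) {
            List<String> itemset = new ArrayList<>(prefix);
            itemset.add(items.get(i));
            out.put(itemset, tidsets.get(i).size());

            List<String> nextItems = new ArrayList<>();
            List<Set<Integer>> nextTidsets = new ArrayList<>();
            for (int j = i + 1; j < items.size(); j++) {
                // The tidset of the candidate is the intersection of the two tidsets.
                Set<Integer> common = new TreeSet<>(tidsets.get(i));
                common.retainAll(tidsets.get(j));
                if (common.size() >= minSupport) {
                    nextItems.add(items.get(j));
                    nextTidsets.add(common);
                }
            }
            if (!nextItems.isEmpty()) {
                eclat(itemset, nextItems, nextTidsets, minSupport, out);
            }
        }
    }

    /** Runs the sketch on the four frequent items of the example, minimum support 3. */
    static Map<List<String>, Integer> mineExample() {
        List<String> items = List.of("Bread", "Milk", "Diaper", "Beer");
        List<Set<Integer>> tidsets = List.of(
                Set.of(1, 2, 4, 5), Set.of(1, 3, 4, 5), Set.of(2, 3, 4, 5), Set.of(2, 3, 4));
        Map<List<String>, Integer> out = new LinkedHashMap<>();
        eclat(new ArrayList<>(), items, tidsets, 3, out);
        return out;
    }
}
```

On the example data, mineExample() returns the four single items plus {Bread, Milk}, {Bread, Diaper}, {Milk, Diaper} and {Diaper, Beer}, each pair with support 3.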

ECLAT

Frequent items and their tidsets (minimum support = 3):

Bread   Milk    Diaper  Beer
1       1       2       2
2       3       3       3
4       4       4       4
5       5       5

Prefix Bread: the tidset of Bread is intersected with the tidsets of Milk, Diaper and Beer.

{Bread, Milk}   {Bread, Diaper}   {Bread, Beer}
1               2                 2
4               4                 4
5               5

{Bread, Beer} has support 2 and is dropped. Intersecting {Bread, Milk} with {Bread, Diaper} gives:

{Bread, Milk, Diaper}
4
5

Its support is 2, so {Bread, Milk, Diaper} is infrequent.

Prefix Milk:

{Milk, Diaper}   {Milk, Beer}
3                3
4                4
5

{Milk, Beer} has support 2 and is dropped.

Prefix Diaper:

{Diaper, Beer}
2
3
4

Frequent 2-itemsets found:

{Bread, Milk}   {Bread, Diaper}   {Milk, Diaper}   {Diaper, Beer}
1               2                 3                2
4               4                 4                3
5               5                 5                4

ECLAT

Uses a vertical database: supports are computed by tidset (bitset) intersections.
Scans the database only once.
Depth-first search algorithm.
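
When tidsets are stored as bitsets, an intersection is a bitwise AND and the support is the number of set bits. A small sketch using java.util.BitSet follows; the wrapper class BitsetTidsets and its method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical helper (not from the slides): tidsets stored as bitsets.
final class BitsetTidsets {

    /** Builds a bitset with one bit set per transaction id. */
    static BitSet tidset(int... tids) {
        BitSet bits = new BitSet();
        for (int tid : tids) {
            bits.set(tid);
        }
        return bits;
    }

    /** Intersection of two tidsets: a bitwise AND on a copy of the first one. */
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    /** Support of an itemset: the number of bits set in its tidset. */
    static int support(BitSet tidset) {
        return tidset.cardinality();
    }
}
```

For example, intersect(tidset(2, 3, 4, 5), tidset(2, 3, 4)) is the tidset of {Diaper, Beer}, and its support is 3.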

ECLAT - PARALLEL

An "equivalence class" is a set of candidates of the same size k that share a common prefix of length k−1.

Example candidates: {Bread, Milk}, {Bread, Diaper}, {Bread, Beer}, {Milk, Beer}, {Diaper, Beer}. The three candidates that start with Bread belong to the same equivalence class.
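
Grouping candidates into equivalence classes amounts to grouping them by their first k−1 items. The sketch below assumes each candidate is an ordered list of items; PrefixClasses and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from the slides): groups k-itemsets by their (k-1)-prefix.
final class PrefixClasses {

    /** Returns prefix -> all candidates that share this prefix. */
    static Map<List<String>, List<List<String>>> groupByPrefix(List<List<String>> candidates) {
        Map<List<String>, List<List<String>>> classes = new LinkedHashMap<>();
        for (List<String> candidate : candidates) {
            // The prefix is everything except the last item.
            List<String> prefix = new ArrayList<>(candidate.subList(0, candidate.size() - 1));
            classes.computeIfAbsent(prefix, k -> new ArrayList<>()).add(candidate);
        }
        return classes;
    }

    /** The five candidates listed on the slide. */
    static List<List<String>> slideCandidates() {
        return List.of(
                List.of("Bread", "Milk"), List.of("Bread", "Diaper"), List.of("Milk", "Beer"),
                List.of("Bread", "Beer"), List.of("Diaper", "Beer"));
    }
}
```

The candidates of one class can be joined using only the tidsets of that class, so separate classes can be handed to separate workers.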

ECLAT – CPU IMPLEMENTATION I

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Map.Entry;
    import java.util.Set;

    public class AlgoEclat {
        // ...
        public Itemsets runAlgorithm(/* parameters omitted on the slide */) {
            // (1) build the tidset of every single item
            final Map<Integer, Set<Integer>> mapItemCount = new HashMap<Integer, Set<Integer>>();
            int maxItemId = calculateSupportSingleItems(database, mapItemCount);

            // (2) create the list of single items
            List<Integer> frequentItems = new ArrayList<Integer>();
            // for each item
            for (Entry<Integer, Set<Integer>> entry : mapItemCount.entrySet()) {
                Set<Integer> tidset = entry.getValue();
                int support = tidset.size();
                int item = entry.getKey();
                // keep the item only if it is frequent
                if (support >= minsupRelative) {
                    frequentItems.add(item);
                    saveSingleItem(item, tidset.size());
                }
            }

            // sort the frequent items by increasing support (the total order)
            Collections.sort(frequentItems, new Comparator<Integer>() {
                public int compare(Integer arg0, Integer arg1) {
                    return mapItemCount.get(arg0).size() - mapItemCount.get(arg1).size();
                }
            });

            // For each frequent item I according to the total order
            for (int i = 0; i < frequentItems.size(); i++) {
                Integer itemI = frequentItems.get(i);
                // we obtain the tidset and support of that item
                Set<Integer> tidsetI = mapItemCount.get(itemI);
                int supportI = tidsetI.size();

ECLAT – CPU IMPLEMENTATION II

                // We create an empty equivalence class for storing all 2-itemsets starting with item I
                List<Integer> equivalenceClassIitems = new ArrayList<Integer>();
                List<Set<Integer>> equivalenceClassItidsets = new ArrayList<Set<Integer>>();

                for (int j = i + 1; j < frequentItems.size(); j++) {
                    int itemJ = frequentItems.get(j);
                    Set<Integer> tidsetJ = mapItemCount.get(itemJ);
                    int supportJ = tidsetJ.size();
                    // intersect the tidsets of I and J
                    Set<Integer> tidsetIJ = performANDFirstTime(tidsetI, supportI, tidsetJ, supportJ);
                    if (calculateSupport(2, supportI, tidsetIJ) >= minsupRelative) {
                        equivalenceClassIitems.add(itemJ);
                        // We also keep the tidset of "ij".
                        equivalenceClassItidsets.add(tidsetIJ);
                    }
                }

                if (equivalenceClassIitems.size() > 0) {
                    // This is done by a recursive call. Note that we pass
                    // item I to that method as the prefix of that equivalence class.
                    itemsetBuffer[0] = itemI;
                    processEquivalenceClass(itemsetBuffer, 1, supportI,
                            equivalenceClassIitems, equivalenceClassItidsets);
                }
            }
        }
        // ...
    }

ECLAT – CPU IMPLEMENTATION III

        private void processEquivalenceClass(int[] prefix, int prefixLength, int supportPrefix,
                List<Integer> equivalenceClassItems,
                List<Set<Integer>> equivalenceClassTidsets) throws IOException {
            int length = prefixLength + 1;

            // Case 1: the class holds a single itemset, so it is saved and the recursion stops
            if (equivalenceClassItems.size() == 1) {
                int itemI = equivalenceClassItems.get(0);
                Set<Integer> tidsetItemset = equivalenceClassTidsets.get(0);
                save(prefix, prefixLength, itemI, tidsetItemset,
                        calculateSupport(length, supportPrefix, tidsetItemset));
                return;
            }

            // Case 2: the class holds two itemsets: save both, then try to join them
            if (equivalenceClassItems.size() == 2) {
                int itemI = equivalenceClassItems.get(0);
                Set<Integer> tidsetI = equivalenceClassTidsets.get(0);
                int supportI = calculateSupport(length, supportPrefix, tidsetI);
                save(prefix, prefixLength, itemI, tidsetI, supportI);

                int itemJ = equivalenceClassItems.get(1);
                Set<Integer> tidsetJ = equivalenceClassTidsets.get(1);
                int supportJ = calculateSupport(length, supportPrefix, tidsetJ);
                save(prefix, prefixLength, itemJ, tidsetJ, supportJ);

                // tidsetJ was missing from the argument list on the slide
                Set<Integer> tidsetIJ = this.performAND(tidsetI, tidsetI.size(), tidsetJ, tidsetJ.size());
                int supportIJ = calculateSupport(length, supportI, tidsetIJ);
                if (supportIJ >= minsupRelative) {
                    int newPrefixLength = prefixLength + 1;
                    prefix[prefixLength] = itemI;
                    save(prefix, newPrefixLength, itemJ, tidsetIJ, supportIJ);
                }
                return;
            }

ECLAT – CPU IMPLEMENTATION IV

            // General case: the class holds more than two itemsets
            for (int i = 0; i < equivalenceClassItems.size(); i++) {
                int suffixI = equivalenceClassItems.get(i);
                Set<Integer> tidsetI = equivalenceClassTidsets.get(i);
                int supportI = calculateSupport(length, supportPrefix, tidsetI);
                save(prefix, prefixLength, suffixI, tidsetI, supportI);

                // equivalence class of the itemsets that extend prefix + suffixI
                List<Integer> equivalenceClassISuffixItems = new ArrayList<Integer>();
                List<Set<Integer>> equivalenceITidsets = new ArrayList<Set<Integer>>();

                for (int j = i + 1; j < equivalenceClassItems.size(); j++) {
                    int suffixJ = equivalenceClassItems.get(j);
                    Set<Integer> tidsetJ = equivalenceClassTidsets.get(j);
                    int supportJ = calculateSupport(length, supportPrefix, tidsetJ);
                    Set<Integer> tidsetIJ = performAND(tidsetI, supportI, tidsetJ, supportJ);
                    int supportIJ = calculateSupport(length, supportI, tidsetIJ);
                    if (supportIJ >= minsupRelative) {
                        equivalenceClassISuffixItems.add(suffixJ);
                        equivalenceITidsets.add(tidsetIJ);
                    }
                }

                // recurse on the new class if it is not empty
                if (equivalenceClassISuffixItems.size() > 0) {
                    prefix[prefixLength] = suffixI;
                    int newPrefixLength = prefixLength + 1;
                    processEquivalenceClass(prefix, newPrefixLength, supportI,
                            equivalenceClassISuffixItems, equivalenceITidsets);
                }
            }
        }

ECLAT – GPU IMPLEMENTATION I

ECLAT – GPU IMPLEMENTATION II

    // MAX_THREAD and current_block_pos are used but not defined on the slide.
    __global__ void kernel_calc(unsigned int** src_list_1, unsigned int** src_list_2,
                                int** dst_list, int* result,
                                int list_len, unsigned int vlist_len)
    {
        // one partial support counter per thread of the block
        __shared__ unsigned int sup[MAX_THREAD];
        unsigned int* psrc1;
        unsigned int* psrc2;
        unsigned int* pdst;
        unsigned int iter, i, tmp;
        unsigned int bound;

        // each block handles one pair of bitset lists
        if (blockIdx.x >= list_len) {
            return;
        }

        sup[threadIdx.x] = 0;
        __syncthreads();

        // number of strides needed to cover vlist_len words with blockDim.x threads
        iter = (vlist_len - 1) / blockDim.x + 1;
        psrc1 = src_list_1[blockIdx.x];
        psrc2 = src_list_2[blockIdx.x];
        // cast added: dst_list is declared as int**
        pdst = (unsigned int*) dst_list[blockIdx.x];

        // bitwise AND of the two lists, counting the set bits of every result word
        for (i = 0; i < iter; i++) {
            int thread_pos = i * blockDim.x + threadIdx.x;
            if (thread_pos >= vlist_len) break;
            tmp = psrc1[thread_pos] & psrc2[thread_pos];
            sup[threadIdx.x] += __popc(tmp);
            pdst[thread_pos] = tmp;
        }
        __syncthreads();

        // parallel reduction of the partial counters into sup[0]
        for (bound = blockDim.x / 2; bound > 0; bound >>= 1) {
            if (threadIdx.x < bound)
                sup[threadIdx.x] += sup[threadIdx.x + bound];
            __syncthreads();
        }

        // thread 0 writes the support of the intersected list
        if (threadIdx.x == 0) {
            *(result + current_block_pos) = sup[0];
        }
    }
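
To make clear what one block of the kernel computes, here is a sequential Java analogue for a single pair of lists: it ANDs two word arrays, stores the result, and sums the set bits. It is a reference sketch only, not part of the presented implementation; the class KernelReference and the method andAndCount are hypothetical names, and 32-bit int words stand in for the kernel's unsigned int words.

```java
// Hypothetical sequential reference (not from the slides) for one block of the kernel.
final class KernelReference {

    /**
     * Writes src1[w] & src2[w] into dst[w] for every word and returns the total
     * number of set bits, which is the support of the intersected tidset.
     * All three arrays are assumed to have the same length.
     */
    static int andAndCount(int[] src1, int[] src2, int[] dst) {
        int support = 0;
        for (int w = 0; w < dst.length; w++) {
            dst[w] = src1[w] & src2[w];
            // Integer.bitCount plays the role of __popc in the kernel.
            support += Integer.bitCount(dst[w]);
        }
        return support;
    }
}
```

On the GPU the loop iterations are spread over the threads of a block and the per-thread sums are combined by the reduction at the end of the kernel; the result is the same total.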

ECLAT – CPU VS GPU

System characteristics:

Processor:               Intel(R) Core(TM) i5-5200 CPU @ 2.20 GHz
Installed memory (RAM):  8.00 GB
System type:             64-bit operating system, x64-based processor
GPU:                     NVIDIA GeForce 940M
OS:                      Windows 10

[Charts: CPU vs GPU on the Chess and Retail datasets; horizontal axis in percent.]

ECLAT – CPU VS GPU

[Charts: CPU vs GPU on the Pumsb, Accidents, Mushroom and Connect datasets; horizontal axis in percent.]

THANK YOU FOR YOUR ATTENTION!

Questions?