SLIQ (SUPERVISED LEARNING IN QUEST)
STUDENT: NIKOLA TERZIĆ
PROFESSOR: VELJKO MILUTINOVIĆ
SLIQ (SUPERVISED LEARNING IN QUEST)
• Decision-tree classifier for data mining
• Design goals:
  • Handle large, disk-resident training sets
  • Place no restriction on the size of the training set
BUILDING THE TREE
MakeTree(Training Data T)
    Partition(T);
END_MakeTree

Partition(Data S)
    if (all points in S are in the same class) then
        return;
    Evaluate splits for each attribute A;
    Use the best split to partition S into S1 and S2;
    Partition(S1);
    Partition(S2);
END_Partition
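The recursive schema above can be made concrete with a small in-memory C++ sketch. It is not SLIQ's own algorithm: it re-scans and copies the records at every node, which is exactly the cost SLIQ's pre-sorting and class list are meant to avoid. It assumes numeric attributes only, split tests of the form A < v, and hypothetical names (Record, Node, partition, makeTree, classify); a node that no attribute can separate is kept as a leaf labeled with its majority class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical in-memory record: numeric attribute values plus a class label.
struct Record {
    std::vector<double> attrs;
    int label;
};

struct Node {
    bool isLeaf = true;
    int label = -1;          // majority class of the records that reached this node
    int attr = -1;           // split attribute (internal nodes only)
    double threshold = 0.0;  // split test: attrs[attr] < threshold goes to the left child
    std::unique_ptr<Node> left, right;
};

// Class counts of a set of records.
std::map<int, std::size_t> classCounts(const std::vector<Record>& s) {
    std::map<int, std::size_t> counts;
    for (const auto& r : s) counts[r.label]++;
    return counts;
}

// Gini index: 1 - sum_j p_j^2.
double gini(const std::vector<Record>& s) {
    double g = 1.0;
    for (const auto& kv : classCounts(s)) {
        double p = static_cast<double>(kv.second) / s.size();
        g -= p * p;
    }
    return g;
}

std::unique_ptr<Node> partition(const std::vector<Record>& s) {
    assert(!s.empty());
    auto node = std::make_unique<Node>();
    auto counts = classCounts(s);
    std::size_t bestCount = 0;
    for (const auto& kv : counts)
        if (kv.second > bestCount) { bestCount = kv.second; node->label = kv.first; }
    if (counts.size() == 1) return node;  // all points in S are in the same class

    // Evaluate the split "A < v" for every attribute A and every value v seen in S.
    double bestScore = 2.0;  // any real gini value is below 1
    int bestAttr = -1;
    double bestV = 0.0;
    for (std::size_t a = 0; a < s[0].attrs.size(); ++a) {
        for (const auto& cand : s) {
            std::vector<Record> l, r;
            for (const auto& rec : s) (rec.attrs[a] < cand.attrs[a] ? l : r).push_back(rec);
            if (l.empty() || r.empty()) continue;  // not a real split
            double score = (l.size() * gini(l) + r.size() * gini(r)) / s.size();
            if (score < bestScore) { bestScore = score; bestAttr = static_cast<int>(a); bestV = cand.attrs[a]; }
        }
    }
    if (bestAttr < 0) return node;  // no attribute separates S, so it stays a leaf

    // Use the best split to partition S into S1 and S2, then recurse.
    std::vector<Record> s1, s2;
    for (const auto& rec : s) (rec.attrs[bestAttr] < bestV ? s1 : s2).push_back(rec);
    node->isLeaf = false;
    node->attr = bestAttr;
    node->threshold = bestV;
    node->left = partition(s1);
    node->right = partition(s2);
    return node;
}

std::unique_ptr<Node> makeTree(const std::vector<Record>& t) { return partition(t); }

// Follow the split tests from the root down to a leaf.
int classify(const Node& n, const std::vector<double>& x) {
    if (n.isLeaf) return n.label;
    return x[n.attr] < n.threshold ? classify(*n.left, x) : classify(*n.right, x);
}
```

For example, four one-attribute records with values 1 and 2 in class 0 and values 3 and 4 in class 1 yield a root that splits on attrs[0] < 3, with two pure leaves below it.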
EVALUATING SPLIT POINTS
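SLIQ scores a candidate split with the gini index. As a sketch of the standard definitions, let S hold n records, p_j be the fraction of records of S in class j, and let a binary split send n_1 records to S_1 and n_2 records to S_2; lower is better, and 0 means both children are pure. The small example uses hypothetical class labels only.

```latex
\begin{aligned}
\mathrm{gini}(S) &= 1 - \sum_{j} p_j^2 \\
\mathrm{gini}_{\mathrm{split}}(S) &= \tfrac{n_1}{n}\,\mathrm{gini}(S_1) + \tfrac{n_2}{n}\,\mathrm{gini}(S_2) \\[6pt]
S = \{0,0,1,1\}:\quad \mathrm{gini}(S) &= 1 - \left(\tfrac12\right)^2 - \left(\tfrac12\right)^2 = \tfrac12 \\
\{0\} \mid \{0,1,1\}:\quad \mathrm{gini}_{\mathrm{split}} &= \tfrac14 \cdot 0 + \tfrac34\left(1 - \tfrac19 - \tfrac49\right) = \tfrac34 \cdot \tfrac49 = \tfrac13 \\
\{0,0\} \mid \{1,1\}:\quad \mathrm{gini}_{\mathrm{split}} &= \tfrac24 \cdot 0 + \tfrac24 \cdot 0 = 0
\end{aligned}
```

The second split has the lower index, so it is the one that would be kept.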
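PRE-SORTING
• Before the tree is built, the data is sorted once
• Each numeric attribute gets its own attribute list of (value, record id) pairs, sorted by value
• A class list stores, for every record id, its class label and the leaf the record currently belongs to
• Because the attribute lists stay sorted, no re-sorting is needed at any tree node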
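A minimal CPU sketch of the pre-sorting step, assuming all attributes are numeric and the training data fits in memory column by column (SLIQ itself can keep the attribute lists on disk and needs only the class list in memory). The names AttrEntry, ClassEntry, Presorted and presort, and the integer leaf ids, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Attribute-list entry: an attribute value and the id of the record it belongs to.
struct AttrEntry {
    double value;
    std::size_t rid;
};

// Class-list entry, indexed by record id: class label and current leaf.
struct ClassEntry {
    int label;
    std::size_t leaf;  // hypothetical leaf numbering; 0 is the root
};

struct Presorted {
    std::vector<std::vector<AttrEntry>> attrLists;  // one list per attribute, sorted by value
    std::vector<ClassEntry> classList;              // one entry per record
};

// columns[a][i] is the value of attribute a for record i.
Presorted presort(const std::vector<std::vector<double>>& columns, const std::vector<int>& labels) {
    Presorted out;
    for (const auto& col : columns) {
        assert(col.size() == labels.size());
        std::vector<AttrEntry> list;
        list.reserve(col.size());
        for (std::size_t i = 0; i < col.size(); ++i) list.push_back({col[i], i});
        // Sorted once; stable_sort keeps equal values in record-id order.
        std::stable_sort(list.begin(), list.end(),
                         [](const AttrEntry& x, const AttrEntry& y) { return x.value < y.value; });
        out.attrLists.push_back(std::move(list));
    }
    // Every record starts in the root leaf.
    for (int label : labels) out.classList.push_back({label, 0});
    return out;
}
```

On the GPU, a sort-by-key over the (value, record id) pairs, such as Thrust's thrust::sort_by_key, is one common way to do the same step; which routine the CUDA implementation uses is not stated here.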
FINDING SPLIT POINTS
• For each attribute A:
  • Evaluate the candidate splits on A using its attribute list
• Keep the split with the lowest gini index
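As a minimal sketch, assuming class labels 0..k-1 and class histograms stored as vectors of counts, the gini index of a histogram and of a binary split could be computed as follows (giniOf and giniSplit are hypothetical names):

```cpp
#include <cassert>
#include <vector>

// Gini index of a class histogram: 1 - sum_j (c_j / n)^2; an empty histogram scores 0.
double giniOf(const std::vector<long>& hist) {
    long n = 0;
    for (long c : hist) n += c;
    if (n == 0) return 0.0;
    double g = 1.0;
    for (long c : hist) {
        double p = static_cast<double>(c) / n;
        g -= p * p;
    }
    return g;
}

// Weighted gini index of a binary split, from the left and right class histograms.
double giniSplit(const std::vector<long>& left, const std::vector<long>& right) {
    assert(left.size() == right.size());
    long n1 = 0, n2 = 0;
    for (long c : left) n1 += c;
    for (long c : right) n2 += c;
    long n = n1 + n2;
    assert(n > 0);
    return (n1 * giniOf(left) + n2 * giniOf(right)) / n;
}
```

For example, giniSplit({2, 0}, {0, 2}) is 0, the best possible score, while giniSplit({1, 0}, {1, 2}) is 1/3.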
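FINDING SPLIT POINTS
Initialize the class histograms of the left and right children of each leaf (left: empty; right: all records in the leaf);
for each record in the attribute list do
    find the corresponding entry in the class list, and from it the record's class and leaf node;
    if record.value differs from the previous value seen in that leaf then
        evaluate the splitting index for value(A) < record.value;
    update the class histograms in the leaf;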
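The loop above can be sketched in C++ for one numeric attribute, assuming the attribute list is already sorted by value, leaves are numbered 0..numLeaves-1 and classes 0..numClasses-1 (all names here are hypothetical). As in SLIQ, a single scan of the attribute list scores every current leaf at once. The sketch evaluates a split only at the first occurrence of each distinct value within a leaf, so runs of equal values are never cut in the middle.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct AttrEntry { double value; std::size_t rid; };  // attribute-list entry
struct ClassEntry { int label; std::size_t leaf; };   // class-list entry (hypothetical leaf ids)

struct BestSplit {
    bool found = false;
    double gini = std::numeric_limits<double>::infinity();
    double threshold = 0.0;  // value(A) < threshold goes to the left child
};

// Gini index of a class histogram that holds n records.
double giniOf(const std::vector<long>& hist, long n) {
    if (n == 0) return 0.0;
    double g = 1.0;
    for (long c : hist) {
        double p = static_cast<double>(c) / n;
        g -= p * p;
    }
    return g;
}

// One pass over the pre-sorted attribute list of a numeric attribute A
// finds the best split "value(A) < v" for every current leaf.
std::vector<BestSplit> evaluateNumericAttribute(const std::vector<AttrEntry>& attrList,
                                                const std::vector<ClassEntry>& classList,
                                                std::size_t numLeaves, std::size_t numClasses) {
    // Per-leaf class histograms: left child starts empty, right child holds the whole leaf.
    std::vector<std::vector<long>> left(numLeaves, std::vector<long>(numClasses, 0));
    std::vector<std::vector<long>> right = left;
    std::vector<long> nLeft(numLeaves, 0), nRight(numLeaves, 0);
    for (const ClassEntry& ce : classList) {
        right[ce.leaf][ce.label]++;
        nRight[ce.leaf]++;
    }

    std::vector<BestSplit> best(numLeaves);
    std::vector<double> lastValue(numLeaves, 0.0);
    for (const AttrEntry& e : attrList) {
        assert(e.rid < classList.size());
        const ClassEntry& ce = classList[e.rid];  // class and leaf of this record
        const std::size_t l = ce.leaf;
        // Evaluate only when the value changes within leaf l, so the left
        // histogram holds exactly the records of l with a smaller value.
        if (nLeft[l] > 0 && e.value > lastValue[l]) {
            double g = (nLeft[l] * giniOf(left[l], nLeft[l]) +
                        nRight[l] * giniOf(right[l], nRight[l])) /
                       static_cast<double>(nLeft[l] + nRight[l]);
            if (g < best[l].gini) {
                best[l].found = true;
                best[l].gini = g;
                best[l].threshold = e.value;
            }
        }
        // Update the class histograms of leaf l: the record moves from right to left.
        left[l][ce.label]++;
        right[l][ce.label]--;
        nLeft[l]++;
        nRight[l]--;
        lastValue[l] = e.value;
    }
    return best;
}
```

Keeping the histograms per leaf is what lets one sequential scan serve the whole level of the tree; the class list is the only structure accessed by record id.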
IMPLEMENTATION
• C++
• Pre-sorting is done on the GPU (CUDA)
RESULTS
[Chart: Time (y-axis, 0 to 7000) for 1 M, 5 M and 10 M (x-axis)]