SLIQ (SUPERVISED LEARNING IN QUEST)
STUDENT: NIKOLA TERZIĆ
PROFESSOR: VELJKO MILUTINOVIĆ

SLIQ (SUPERVISED LEARNING IN QUEST)
• Decision-tree classifier for data mining
• Design goals:
  • Able to handle large disk-resident training sets
  • No restrictions on training-set size

BUILDING TREE
MakeTree(Training Data T)
    Partition(T);
END_MakeTree

Partition(Data S)
    if (all points in S are in the same class)
        return;
    Evaluate splits for each attribute A;
    Use best split to partition S into S1 and S2;
    Partition(S1);
    Partition(S2);
END_Partition
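
The recursion in MakeTree/Partition can be sketched in C++ as follows. This is a naive in-memory illustration, not the deck's implementation: it re-scans the data for every candidate threshold instead of using SLIQ's pre-sorted attribute lists, and the names `Record`, `Node`, `partition`, `make_tree` and `classify` are hypothetical. It assumes numeric attributes only and splits of the form "attribute < threshold".

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct Record {
    std::vector<double> attrs;
    int label;
};

struct Node {
    bool is_leaf = true;
    int label = -1;          // class predicted at a leaf (-1 for an empty set)
    std::size_t attr = 0;    // attribute tested at an internal node
    double threshold = 0.0;  // records with attrs[attr] < threshold go left
    std::unique_ptr<Node> left, right;
};

// Gini index of a set of records: 1 - sum_j p_j^2.
double gini(const std::vector<Record>& s) {
    if (s.empty()) return 0.0;
    std::map<int, double> counts;
    for (const Record& r : s) counts[r.label] += 1.0;
    double g = 1.0;
    for (const auto& kv : counts) {
        double p = kv.second / s.size();
        g -= p * p;
    }
    return g;
}

bool all_same_class(const std::vector<Record>& s) {
    for (const Record& r : s)
        if (r.label != s.front().label) return false;
    return true;  // also true for an empty set
}

int majority_label(const std::vector<Record>& s) {
    std::map<int, long> counts;
    int best = -1;
    long best_n = 0;
    for (const Record& r : s)
        if (++counts[r.label] > best_n) { best_n = counts[r.label]; best = r.label; }
    return best;
}

std::unique_ptr<Node> partition(const std::vector<Record>& s) {
    std::unique_ptr<Node> node(new Node());
    node->label = majority_label(s);
    if (all_same_class(s)) return node;  // stop: S is pure (or empty)

    // Evaluate splits for each attribute; keep the one with the lowest Gini index.
    double best = std::numeric_limits<double>::infinity();
    std::vector<Record> best_l, best_r;
    for (std::size_t a = 0; a < s.front().attrs.size(); ++a) {
        for (const Record& cand : s) {
            std::vector<Record> l, r;
            for (const Record& rec : s)
                (rec.attrs[a] < cand.attrs[a] ? l : r).push_back(rec);
            if (l.empty()) continue;  // r is never empty: cand itself goes right
            double g = (l.size() * gini(l) + r.size() * gini(r)) / s.size();
            if (g < best) {
                best = g;
                node->attr = a;
                node->threshold = cand.attrs[a];
                best_l = std::move(l);
                best_r = std::move(r);
            }
        }
    }
    if (best_l.empty()) return node;  // no usable split: stay a majority-class leaf

    node->is_leaf = false;
    node->left = partition(best_l);   // Partition(S1)
    node->right = partition(best_r);  // Partition(S2)
    return node;
}

std::unique_ptr<Node> make_tree(const std::vector<Record>& t) { return partition(t); }

int classify(const Node& n, const std::vector<double>& attrs) {
    const Node* cur = &n;
    while (!cur->is_leaf)
        cur = (attrs[cur->attr] < cur->threshold ? cur->left : cur->right).get();
    return cur->label;
}
```

Calling `make_tree` on a training set and then `classify` on the returned root walks a new record down to a leaf. The extra "no usable split" check is an addition to the slide's pseudocode so that records with identical attribute values but different classes do not recurse forever.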

EVALUATING SPLIT POINTS
• Candidate splits are compared by their GINI index
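
The deck selects the split with the lowest GINI index, so this slide presumably showed that measure; its content did not survive, and the formulas below are the standard definition rather than a copy of the slide. The symbols are assumptions introduced here: T is a set of n records, p_j is the relative frequency of class j in T, and T_1, T_2 (with n_1, n_2 records) are the two subsets produced by a binary split.

```latex
\[
  gini(T) = 1 - \sum_{j} p_j^{2}
\]
\[
  gini_{split}(T) = \frac{n_1}{n}\, gini(T_1) + \frac{n_2}{n}\, gini(T_2)
\]
```

For example, a set with two records of each of two classes has gini(T) = 1 - (0.25 + 0.25) = 0.5, and a split that separates the classes perfectly gives gini_split(T) = 0.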

PRE-SORTING
• Before tree building starts, the training data must be sorted
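
A minimal sketch of the pre-sorting step, assuming the attribute-list and Class List structures that the later slides refer to: one list per attribute, sorted by value, whose entries point back into a shared Class List. The names `AttrEntry`, `ClassEntry`, `presort` and `make_class_list` and the row-major input layout are hypothetical. The sort here runs on the CPU with `std::stable_sort`, whereas the deck states that its implementation sorts on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One attribute-list entry: a value plus the index of the record
// (and of its Class List entry) that the value came from.
struct AttrEntry {
    double value;
    std::size_t rid;
};

// One Class List entry: the record's class label and the leaf the
// record currently belongs to.
struct ClassEntry {
    int label;
    int leaf;
};

// Build one attribute list per column and sort each list by value.
// rows[i][a] is the value of attribute a in record i (assumed layout).
std::vector<std::vector<AttrEntry>> presort(
        const std::vector<std::vector<double>>& rows) {
    const std::size_t num_attrs = rows.empty() ? 0 : rows.front().size();
    std::vector<std::vector<AttrEntry>> lists(num_attrs);
    for (std::size_t a = 0; a < num_attrs; ++a) {
        lists[a].reserve(rows.size());
        for (std::size_t i = 0; i < rows.size(); ++i) {
            assert(rows[i].size() == num_attrs);  // every record has all attributes
            lists[a].push_back({rows[i][a], i});
        }
        std::stable_sort(lists[a].begin(), lists[a].end(),
                         [](const AttrEntry& x, const AttrEntry& y) {
                             return x.value < y.value;
                         });
    }
    return lists;
}

// All records start in the root, which is numbered leaf 0 here.
std::vector<ClassEntry> make_class_list(const std::vector<int>& labels) {
    std::vector<ClassEntry> class_list;
    class_list.reserve(labels.size());
    for (int label : labels) class_list.push_back({label, 0});
    return class_list;
}
```

Because every attribute-list entry carries its record index, the lists can be sorted independently of each other and never need to be re-sorted while the tree grows; only the `leaf` field in the Class List changes.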

FINDING SPLIT POINTS
• For each attribute A do
  • evaluate splits on attribute A using its attribute list
• Keep the split with the lowest GINI index

FINDING SPLIT POINTS
Initialize class histograms of left and right children;
for each record in the attribute list do
    find the corresponding entry in the Class List, and hence the class and the leaf node;
    evaluate splitting index for value(A) < record.value;
    update the class histogram in the leaf;
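
The scan above can be sketched in C++ for a single numeric attribute. This is an illustration under assumptions, not the deck's code: the types `AttrEntry`, `ClassEntry`, `Split` and the function `evaluate_splits` are hypothetical, leaves are numbered 0 to `num_leaves - 1`, and class labels 0 to `num_classes - 1`. One detail is added to the pseudocode: a split is evaluated only when the value differs from the previous value seen in the same leaf, so that duplicate values do not produce a histogram that disagrees with the test value(A) < record.value.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct AttrEntry { double value; std::size_t rid; };
struct ClassEntry { int label; int leaf; };

struct Split {
    double threshold = 0.0;
    double gini = std::numeric_limits<double>::infinity();  // no split found yet
};

// Gini index of a class histogram: 1 - sum_j p_j^2.
double gini(const std::vector<long>& hist) {
    long n = 0;
    for (long c : hist) n += c;
    if (n == 0) return 0.0;
    double sum_sq = 0.0;
    for (long c : hist) {
        double p = static_cast<double>(c) / n;
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

// Weighted Gini index of a binary split; callers ensure it is non-empty.
double gini_split(const std::vector<long>& left, const std::vector<long>& right) {
    long nl = 0, nr = 0;
    for (long c : left) nl += c;
    for (long c : right) nr += c;
    double n = static_cast<double>(nl + nr);
    return (nl / n) * gini(left) + (nr / n) * gini(right);
}

// One pass over a value-sorted attribute list. Returns, for every leaf,
// the best split of the form "A < threshold" on this attribute.
std::vector<Split> evaluate_splits(const std::vector<AttrEntry>& attr_list,
                                   const std::vector<ClassEntry>& class_list,
                                   int num_leaves, int num_classes) {
    using Hist = std::vector<long>;
    std::vector<Hist> left(num_leaves, Hist(num_classes, 0L));
    std::vector<Hist> right(num_leaves, Hist(num_classes, 0L));
    std::vector<long> seen(num_leaves, 0L);     // records already moved left
    std::vector<double> last(num_leaves, 0.0);  // last value seen per leaf
    std::vector<Split> best(num_leaves);

    // Initially every record of a leaf counts toward its right child.
    for (const ClassEntry& e : class_list) ++right[e.leaf][e.label];

    for (const AttrEntry& rec : attr_list) {
        const ClassEntry& e = class_list[rec.rid];  // look up class and leaf
        const int l = e.leaf;
        // Evaluate "A < rec.value" only if it actually separates records.
        if (seen[l] > 0 && rec.value > last[l]) {
            double g = gini_split(left[l], right[l]);
            if (g < best[l].gini) {
                best[l].threshold = rec.value;
                best[l].gini = g;
            }
        }
        // Update the histograms: the record moves from right to left.
        ++left[l][e.label];
        --right[l][e.label];
        ++seen[l];
        last[l] = rec.value;
    }
    return best;
}
```

Running this once per attribute and keeping, for each leaf, the result with the smallest `gini` gives the split selection described on the previous slide. A leaf whose `gini` is still infinite after the pass has no usable split on that attribute.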

IMPLEMENTATION
• C++
• Pre-sorting is done on the GPU (CUDA)

RESULTS
• [Chart: Time (axis from 0 to 7000) for inputs labeled 1 M, 5 M and 10 M]