On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams
Peng Wang, H. Wang, X. Wu, W. Wang, and B. Shi
Proc. of the Fifth IEEE International Conference on Data Mining (ICDM'05)
Speaker: Yu Jiun Liu
Date: 2006/9/26
Introduction
- State of the art
  - Incrementally updated classifiers.
  - Ensemble classifiers.
- Model granularity
  - Traditional: monolithic
  - This paper: semantic decomposition
Motivation
- The model is decomposable into smaller components.
- The decomposition is semantic-aware in the sense …
Monolithic Models
- Stream:
- Attributes:
- Class label:
- Window:
- Model (classifier): Ci
Rule-based Models
- Rule form:
- minsup = 0.3 and minconf = 0.8
- Valid rules of W1 are:
- Valid rules of W3 are:
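A rule is "valid" for a window when it clears both thresholds. The sketch below assumes the usual association-rule definitions, which the slides do not spell out: support is the fraction of window records that match the antecedent P and carry class C, and confidence is that count divided by the number of records matching P. The record layout and the toy window are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed record layout: (attribute -> value mapping, class label).
Record = Tuple[Dict[str, str], str]


def rule_stats(pattern: Dict[str, str], label: str,
               window: List[Record]) -> Tuple[float, float]:
    """Support and confidence of the rule `pattern => label` over one window."""
    matches_p = 0     # records satisfying every condition in the antecedent P
    matches_rule = 0  # records satisfying P that also carry the rule's class
    for attrs, cls in window:
        if all(attrs.get(a) == v for a, v in pattern.items()):
            matches_p += 1
            if cls == label:
                matches_rule += 1
    support = matches_rule / len(window) if window else 0.0
    confidence = matches_rule / matches_p if matches_p else 0.0
    return support, confidence


def is_valid(pattern: Dict[str, str], label: str, window: List[Record],
             minsup: float = 0.3, minconf: float = 0.8) -> bool:
    """A rule is kept only if it clears both thresholds."""
    sup, conf = rule_stats(pattern, label, window)
    return sup >= minsup and conf >= minconf


# Toy window with two attributes A and B (hypothetical data).
window: List[Record] = [
    ({"A": "a1", "B": "b1"}, "C1"),
    ({"A": "a1", "B": "b2"}, "C1"),
    ({"A": "a2", "B": "b1"}, "C2"),
    ({"A": "a1", "B": "b1"}, "C1"),
    ({"A": "a2", "B": "b2"}, "C2"),
]
```

On this toy window, A = a1 ⇒ C1 has support 3/5 and confidence 1.0, so it is valid; B = b1 ⇒ C1 has support 0.4 but confidence 2/3, so it fails minconf = 0.8.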
Algorithm
- Phase 1: Initialization
  - Use the first w records to derive all valid rules for window W1.
  - Construct the RS-tree and the REC-tree.
- Phase 2: Update
  - When a record arrives, insert it into the REC-tree and update the support and confidence of the rules it matches.
  - Delete the oldest record and update the values of the rules it matches.
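The two phases amount to keeping per-rule counts consistent with a sliding window: one increment for the arriving record and one decrement for the expiring one. The sketch below is a simplification under stated assumptions: the candidate rules are given up front instead of being mined, every rule is checked by a linear scan instead of through the RS-tree and REC-tree, and the class name `SlidingRuleStats` is hypothetical. Phase 1 corresponds to feeding in the first w records; Phase 2 to every later call of `add`.

```python
from collections import deque
from typing import Deque, Dict, FrozenSet, List, Tuple

Pattern = FrozenSet[Tuple[str, str]]   # antecedent P as (attribute, value) pairs
Rule = Tuple[Pattern, str]             # rule P => C as (P, class label)
Record = Tuple[Dict[str, str], str]    # (attribute -> value, class label)


class SlidingRuleStats:
    """Hypothetical helper: keeps match counts for a fixed set of candidate
    rules over the most recent w records."""

    def __init__(self, rules: List[Rule], w: int) -> None:
        self.w = w
        self.window: Deque[Record] = deque()
        # Per rule: [records matching P, records matching P with the rule's class]
        self.counts: Dict[Rule, List[int]] = {r: [0, 0] for r in rules}

    def _apply(self, record: Record, delta: int) -> None:
        attrs, cls = record
        for (pattern, label), c in self.counts.items():
            if all(attrs.get(a) == v for a, v in pattern):
                c[0] += delta
                if cls == label:
                    c[1] += delta

    def add(self, record: Record) -> None:
        # Count the arriving record, then expire the oldest once the window overflows.
        self.window.append(record)
        self._apply(record, +1)
        if len(self.window) > self.w:
            self._apply(self.window.popleft(), -1)

    def stats(self, rule: Rule) -> Tuple[float, float]:
        p_cnt, r_cnt = self.counts[rule]
        n = len(self.window)
        support = r_cnt / n if n else 0.0
        confidence = r_cnt / p_cnt if p_cnt else 0.0
        return support, confidence

    def valid_rules(self, minsup: float, minconf: float) -> List[Rule]:
        valid = []
        for rule in self.counts:
            sup, conf = self.stats(rule)
            if sup >= minsup and conf >= minconf:
                valid.append(rule)
        return valid
```

The point of the incremental counts is that an update touches only the rules matched by the two records involved, rather than retraining a whole classifier per window.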
Data Structure
RS-Tree
- A prefix tree built over a fixed attribute order.
- Each node N represents a unique rule R: P ⇒ Ci.
- N′ (P′ ⇒ Cj) is a child node of N iff:
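Because the child condition is not reproduced above, the sketch below assumes the ordinary prefix-tree relationship: the antecedent conditions of each rule are sorted by a global attribute order, so rules that share a prefix share nodes, and a node stores the class labels of the rules whose antecedent ends there. The class names `RSNode` and `RSTree` and the `matching_rules` lookup are illustrative, not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

Condition = Tuple[str, str]  # (attribute, value)


class RSNode:
    def __init__(self, cond: Optional[Condition] = None) -> None:
        self.cond = cond
        self.children: Dict[Condition, "RSNode"] = {}
        self.labels: List[str] = []  # classes of rules whose antecedent ends here


class RSTree:
    """Hypothetical rule prefix tree keyed on attribute-ordered conditions."""

    def __init__(self, attr_order: List[str]) -> None:
        self.rank = {a: i for i, a in enumerate(attr_order)}
        self.root = RSNode()

    def insert(self, pattern: Dict[str, str], label: str) -> None:
        node = self.root
        # Sort by the global attribute order so shared prefixes share nodes.
        for cond in sorted(pattern.items(), key=lambda c: self.rank[c[0]]):
            if cond not in node.children:
                node.children[cond] = RSNode(cond)
            node = node.children[cond]
        node.labels.append(label)

    def matching_rules(self, record: Dict[str, str]
                       ) -> List[Tuple[Tuple[Condition, ...], str]]:
        """All rules whose antecedent is satisfied by the record."""
        out = []
        stack = [(self.root, ())]
        while stack:
            node, path = stack.pop()
            for label in node.labels:
                out.append((path, label))
            # Descend only along conditions the record satisfies.
            for cond, child in node.children.items():
                if record.get(cond[0]) == cond[1]:
                    stack.append((child, path + (cond,)))
        return out
```

Pruning the traversal at the first unsatisfied condition is what makes a prefix tree attractive here: a whole subtree of more specific rules is skipped with a single comparison.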
REC-Tree
- Each record r is treated as a sequence.
- Node N points to a rule in the RS-tree if:
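A minimal sketch of the record side, assuming each record is stored as its (attribute, value) pairs in a fixed attribute order followed by its class label, with a count on every node so that expiring records can be deleted. The pointer condition linking REC-tree nodes to RS-tree rules is not reproduced in the text above, so this sketch omits those links. The names `RecTree`, `count_prefix`, and the `"__class__"` key are hypothetical.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, str]      # (attribute, value), or (CLASS_KEY, label)
CLASS_KEY = "__class__"     # hypothetical key marking the class label


class RecNode:
    def __init__(self) -> None:
        self.count = 0                            # records passing through this node
        self.children: Dict[Item, "RecNode"] = {}


class RecTree:
    """Hypothetical record prefix tree with counts, supporting deletion."""

    def __init__(self, attr_order: List[str]) -> None:
        self.attr_order = attr_order
        self.root = RecNode()

    def _sequence(self, attrs: Dict[str, str], label: str) -> List[Item]:
        return [(a, attrs[a]) for a in self.attr_order] + [(CLASS_KEY, label)]

    def insert(self, attrs: Dict[str, str], label: str) -> None:
        node = self.root
        node.count += 1
        for item in self._sequence(attrs, label):
            node = node.children.setdefault(item, RecNode())
            node.count += 1

    def delete(self, attrs: Dict[str, str], label: str) -> None:
        # Assumes the record was inserted earlier. Decrement along its path and
        # drop the first node no other record passes through (with its subtree).
        node = self.root
        node.count -= 1
        for item in self._sequence(attrs, label):
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                del node.children[item]
                return
            node = child

    def count_prefix(self, prefix: List[Item]) -> int:
        node = self.root
        for item in prefix:
            if item not in node.children:
                return 0
            node = node.children[item]
        return node.count
```

Sharing prefixes means records with identical leading attribute values occupy one path, and the node counts give match counts for prefix conditions without rescanning the window.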
Detecting Concept Drifts
- Misclassification percentage vs. the distribution of the misclassified records.
- The percentage approach cannot tell which part of the classifier is responsible for the inaccuracy.
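The contrast can be illustrated by grouping outcomes by the rule that produced each prediction. The sketch below assumes each classified record can be attributed to one rule; the per-rule error threshold `max_err` and the rule ids are hypothetical and are not the paper's detection criterion.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each outcome: (id of the rule that classified the record, prediction correct?)
Outcome = Tuple[str, bool]


def error_summary(outcomes: List[Outcome]) -> Tuple[float, Dict[str, float]]:
    """Overall misclassification rate plus the error rate of each rule."""
    total = Counter(rule for rule, _ in outcomes)
    wrong = Counter(rule for rule, ok in outcomes if not ok)
    overall = sum(wrong.values()) / len(outcomes) if outcomes else 0.0
    per_rule = {rule: wrong[rule] / n for rule, n in total.items()}
    return overall, per_rule


def suspect_rules(outcomes: List[Outcome], max_err: float = 0.2) -> List[str]:
    """Rules whose own error rate exceeds the (hypothetical) max_err threshold."""
    _, per_rule = error_summary(outcomes)
    return sorted(rule for rule, err in per_rule.items() if err > max_err)


outcomes = [("R1", True)] * 8 + [("R2", False)] * 2 + [("R2", True)] * 2
```

Here the overall error is 2/12, about 17%, which says nothing about where the problem lies; the per-rule view shows that all errors come from R2, whose error rate is 50%, so only that part of the model needs updating.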
Definition
Finding Rule Algorithm
Update Algorithm
Experiments
- CPU: 1.7 GHz
- Memory: 256 MB
- Datasets: synthetic and real-life datasets.
  - Synthetic:
  - Real-life dataset: 10,344 records and 8 dimensions.
Effect of model updating
- Synthetic data, 10 dimensions
- Window size: 5,000
- 4 dimensions changing
The relation of concept drifts and
Effect of rule composition
Accuracy and Time
- Window size: 10,000
- EC (ensemble classifier): 10 classifiers, each trained on 1,000 records.
- Synthetic data.
Real-life data
Conclusion
- Overcomes the effects of concept drift.
- Reducing granularity makes change detection and model updating more efficient without compromising classification accuracy.