
Construction of decision rules with fuzzy granulation of knowledge
Zenon A. Sosnowski
Faculty of Computer Science (Wydział Informatyki), Politechnika Białostocka
Wiejska 45A, 15-351 Białystok
zenon@wi.pb.edu.pl

Agenda
• introduction
• decision trees (DT)
• fuzzy sets in the granulation of attributes
• algorithm for generating context-based DT
• example
• conclusions

Fuzzy RETE network
The inference mechanism realizes a generalized modus ponens rule:
if A then C    CFr
A'             CFf
-----------
C'             CFc
where CFr is the uncertainty of the rule, CFf is the uncertainty of the fact, and CFc is the uncertainty of the conclusion: CFc = CFr * CFf
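The certainty propagation above can be sketched in a few lines. This is a minimal illustration, not the inference engine itself; the function name `conclusion_cf` and the assumption that certainty factors lie in [0, 1] are not stated on the slide.

```python
def conclusion_cf(cf_rule: float, cf_fact: float) -> float:
    """Certainty of the conclusion: CFc = CFr * CFf.

    Assumption (not from the slide): both factors lie in [0, 1].
    """
    for name, value in (("cf_rule", cf_rule), ("cf_fact", cf_fact)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return cf_rule * cf_fact


# A rule trusted at 0.8 fired by a fact trusted at 0.5.
print(conclusion_cf(0.8, 0.5))
```

Because the product never exceeds either factor, a conclusion is never more certain than the rule or the fact it was derived from.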

Fuzzy_Fuzzy

(defrule r1 (speed very fast) => (...))
(defrule r2 (speed slow) => (...))
(speed medium) - WME
[Figure: RETE network for the pattern on speed. Node labels: SINGLE (LV speed), MULTIFIELD, End of pattern, with attached matches M. (very fast) and M. (slow); outcome: activation of rule r2.]

Decision Trees: An Overview
• used to solve classification problems
• structure of the problem:
  - attributes
  - each attribute assumes a finite number of values
  - finite number of discrete classes
• entropy-based optimization criterion
• architecture of a decision tree: nodes are attributes, edges are values of attributes
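The entropy-based criterion can be illustrated with the information gain used by ID3-style trees. This is a minimal sketch; the function names and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on a finite-valued attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)  # one branch per attribute value
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


classes = ["a", "a", "b", "b"]
print(information_gain(["x", "x", "y", "y"], classes))  # attribute separates the classes
print(information_gain(["x", "y", "x", "y"], classes))  # attribute carries no information
```

At each node the attribute with the highest gain is selected, and one edge is created for every value of that attribute.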

Coping with Continuous Attributes
Decision trees require finite-valued attributes.
What if attributes are continuous? They need to be discretized.
Options:
- discretize each attribute separately (uniform and nonuniform)
- discretize all attributes together (clustering)
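The per-attribute option can be sketched as follows. Equal-width bins stand for the uniform case; equal-frequency bins are used here as one possible nonuniform scheme, which is an assumption since the slide does not name a specific one. All function names are illustrative.

```python
from bisect import bisect_right


def equal_width_cuts(values, bins):
    """Uniform discretization: cut points equally spaced over the value range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + i * step for i in range(1, bins)]


def equal_frequency_cuts(values, bins):
    """Nonuniform discretization: cut points chosen so bins hold similar counts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def discretize(value, cuts):
    """Index of the interval that contains value (0 .. len(cuts))."""
    return bisect_right(cuts, value)


values = [1, 2, 3, 4, 10, 20, 30, 100]
uniform = equal_width_cuts(values, 4)
nonuniform = equal_frequency_cuts(values, 4)
print([discretize(v, uniform) for v in values])
print([discretize(v, nonuniform) for v in values])
```

With skewed data such as this toy sample, equal-width bins leave most values in the first interval, which is one motivation for nonuniform or clustering-based quantization.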

Quantization of attributes through clustering
• fuzzy clustering
• context-based fuzzy clustering

Fuzzy Clustering (FCM) versus Context-Based FCM (cFCM)
Fuzzy clustering:
- objective function and its iterative optimization
Context-based fuzzy clustering:
- objective function minimized iteratively
- continuous classification variable granulated with the use of linguistic labels

Context-Based Fuzzy Clustering
Given: data {xk, yk}, k = 1, 2, …, N; the number of clusters (c); a distance function ||·||; a fuzzy set of context A defined over yk.
Constraint-based optimization of the objective function, subject to a constraint imposed by the context A.
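For orientation, a sketch of the form in which conditional (context-based) FCM is usually stated in the literature. It is assumed, not confirmed by the text above, that the slide follows this form; the symbols u_{ik} (membership of x_k in cluster i), v_i (prototype of cluster i), m (fuzzification parameter) and f_k (context level of the k-th datum) are notation introduced here.

```latex
Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, \lVert x_k - v_i \rVert^{2}
\quad \text{subject to} \quad
\sum_{i=1}^{c} u_{ik} = f_k, \qquad f_k = A(y_k), \qquad k = 1, 2, \ldots, N
```

Setting f_k = 1 for every k recovers the constraint of ordinary FCM, so the context acts only through the right-hand side of the constraint.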

From context fuzzy set A to the labeling of data to be clustered
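The labeling step can be sketched as evaluating the context membership function at each output value, f_k = A(y_k). The triangular shape and its parameters below are assumptions chosen for illustration; the slide does not fix the form of A.

```python
def triangular(a, m, b):
    """Triangular fuzzy set with support (a, b) and modal value m (assumed shape)."""
    def membership(y):
        if y <= a or y >= b:
            return 0.0
        if y <= m:
            return (y - a) / (m - a)
        return (b - y) / (b - m)
    return membership


# Hypothetical context "around 20" defined over the output variable y.
context = triangular(10, 20, 30)
outputs = [5, 15, 20, 27]
f = [context(y) for y in outputs]  # f_k = A(y_k)
print(f)
```

Data with f_k = 0 lie outside the context and do not take part in the clustering for that context.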

Context-Based Fuzzy Clustering: An Iterative Optimization Process
Given: the number of clusters (c). Select the distance function ||·||, the termination criterion e (> 0), and initialize the partition matrix U. Select the value of the fuzzification parameter m (the default is m = 2.0).
1. Calculate the centers (prototypes) of the clusters, i = 1, 2, ..., c.
2. Update the partition matrix, i = 1, 2, ..., c, j = 1, 2, ..., N.
3. Compare U' to U: if the termination criterion ||U' - U|| < e is satisfied, stop; otherwise set U equal to U' and return to step 1.
Result: partition matrix and prototypes
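The three steps can be sketched in plain Python. The update formulas are those commonly given for conditional FCM and are assumed here, since the slide's own formulas are not shown; the Euclidean distance, the max-norm used for ||U' - U||, the random initialization, and the name `cfcm` are likewise assumptions. The sketch expects at least one datum with a nonzero context level.

```python
import random


def cfcm(X, f, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Context-based FCM sketch. X: feature vectors, f: context levels f_k."""
    rng = random.Random(seed)
    N, dim = len(X), len(X[0])
    # Initialize U (c rows, N columns) so that column k sums to f[k].
    U = [[0.0] * N for _ in range(c)]
    for k in range(N):
        r = [rng.random() + 1e-9 for _ in range(c)]
        for i in range(c):
            U[i][k] = f[k] * r[i] / sum(r)
    for _ in range(max_iter):
        # Step 1: prototypes as membership-weighted means.
        V = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(N)]
            total = sum(w)
            V.append([sum(w[k] * X[k][d] for k in range(N)) / total
                      for d in range(dim)])
        # Step 2: update the partition matrix, keeping column sums equal to f[k].
        U_new = [[0.0] * N for _ in range(c)]
        for k in range(N):
            d2 = [sum((X[k][d] - V[i][d]) ** 2 for d in range(dim))
                  for i in range(c)]
            hits = [i for i in range(c) if d2[i] == 0.0]
            if hits:  # datum coincides with one or more prototypes
                for i in hits:
                    U_new[i][k] = f[k] / len(hits)
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                for i in range(c):
                    U_new[i][k] = f[k] * inv[i] / sum(inv)
        # Step 3: stop when the largest change in U falls below eps.
        change = max(abs(U_new[i][k] - U[i][k])
                     for i in range(c) for k in range(N))
        U = U_new
        if change < eps:
            break
    return U, V


X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
f = [1.0, 1.0, 0.5, 1.0, 0.5, 1.0]
U, V = cfcm(X, f, c=2)
print(sorted(v[0] for v in V))
```

The only difference from ordinary FCM is the factor f[k] in step 2: data that belong weakly to the context contribute proportionally less to the prototypes.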

Information Granules in the Development of Decision Trees
• define contexts (fuzzy sets) for the continuous classification variable
• cluster the data for each context
• project the prototypes on the individual axes; this leads to their discretization
• carry out the standard ID3 algorithm
W. Pedrycz, Z. A. Sosnowski, "The designing of decision trees in the framework of granular data and their application to software quality models", Fuzzy Sets & Systems, vol. 124, (2001), pp. 271-290
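The projection step can be sketched as follows. Assigning a value to its nearest projected prototype is an assumption about how the discrete labels are formed; the function names and the toy prototypes are illustrative only.

```python
def project_prototypes(prototypes, axis):
    """Coordinates of the cluster prototypes along one attribute axis, sorted."""
    return sorted({p[axis] for p in prototypes})


def nearest_label(value, projected):
    """Discrete label of a value: index of the closest projected prototype."""
    return min(range(len(projected)), key=lambda i: abs(value - projected[i]))


# Hypothetical prototypes in a two-attribute space.
prototypes = [[1.0, 10.0], [4.0, 30.0], [9.0, 20.0]]
axis0 = project_prototypes(prototypes, 0)
axis1 = project_prototypes(prototypes, 1)
print(nearest_label(7.0, axis0), nearest_label(24.0, axis1))
```

After every attribute has been relabeled in this way, the data are finite-valued and a standard tree-induction algorithm applies.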

Fuzzy Sets of Contexts: Two Approaches • subjective selection depending on the classification problem • supported by statistical relevance (σ-count of fuzzy contexts)
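The σ-count of a fuzzy context is the sum of its membership grades over the data, so it indicates how much of the data set supports that context. A minimal sketch; the two shoulder-shaped contexts and the sample outputs are assumptions made for illustration.

```python
def sigma_count(membership, outputs):
    """Sigma-count: sum of membership grades of the outputs in a fuzzy context."""
    return sum(membership(y) for y in outputs)


# Hypothetical contexts over the output variable (piecewise-linear shoulders).
def low(y):
    return max(0.0, min(1.0, (20 - y) / 10))


def high(y):
    return max(0.0, min(1.0, (y - 10) / 10))


outputs = [5, 15, 20, 27]
for name, context in (("low", low), ("high", high)):
    print(name, sigma_count(context, outputs))
```

A context with a very small σ-count is backed by few data, which suggests widening it or merging it with a neighbor.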

Constructing linguistic terms: classes (thin line) and their induced interval-valued counterparts (solid line)

C-Fuzzy Decision Trees
W. Pedrycz, Z. A. Sosnowski, "C-Fuzzy Decision Trees", IEEE Transactions on Systems, Man and Cybernetics, Part C, vol. 35, no. 4, 2005, pp. 498-511.

Architecture of the cluster-based decision tree
• cluster the whole data set X
• repeat
  • allocate elements of X to each cluster
  • choose the node with the highest value of the splitting criterion
  • cluster the data at the selected node
• until the termination criterion is fulfilled

Node splitting criterion
Node of the tree: Ni = <Xi, Yi, Ui>, where:
Xi = { x(k) | ui(x(k)) > uj(x(k)) for all j ≠ i }
Yi = { y(k) | x(k) ∈ Xi }
Ui = [ui(x(1)) ui(x(2)) … ui(x(N))]
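The value of the criterion itself is not shown above. One plausible choice, assumed here and not confirmed by the text, is the membership-weighted variability of the outputs at a node: the node whose outputs are most heterogeneous is split next. The names `node_variability` and `node_to_split` are illustrative.

```python
def node_variability(memberships, outputs):
    """Membership-weighted spread of the outputs around their weighted mean.

    memberships: u_i(x(k)) for the patterns allocated to the node (X_i)
    outputs:     the corresponding y(k) (Y_i)
    """
    total = sum(memberships)
    if total == 0:
        return 0.0
    mean = sum(u * y for u, y in zip(memberships, outputs)) / total
    return sum(u * (y - mean) ** 2 for u, y in zip(memberships, outputs))


def node_to_split(nodes):
    """Index of the node with the highest variability (assumed criterion)."""
    return max(range(len(nodes)), key=lambda i: node_variability(*nodes[i]))


# Two hypothetical nodes, each given as (memberships, outputs).
nodes = [([0.9, 0.8], [3.0, 3.0]), ([1.0, 1.0], [2.0, 4.0])]
print(node_to_split(nodes))
```

A node with homogeneous outputs has zero variability and is never selected, so growth concentrates where the tree still predicts poorly.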

Stopping criterion (structurability index)

C-fuzzy tree in the classification (prediction) mode
Assign x to class wi if ui(x) exceeds the values of the membership in all remaining clusters.
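The decision rule can be sketched for a single level of the tree. Computing u_i(x) from distances to the cluster prototypes with the usual FCM formula is an assumption, as are the Euclidean distance, the function names, and the mapping from clusters to classes.

```python
def memberships(x, prototypes, m=2.0):
    """FCM-style membership of x in each cluster (grades sum to 1)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in prototypes]
    hits = [i for i, d in enumerate(d2) if d == 0.0]
    if hits:  # x coincides with a prototype
        return [1.0 / len(hits) if i in hits else 0.0 for i in range(len(d2))]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [w / total for w in inv]


def classify(x, prototypes, class_of_cluster, m=2.0):
    """Assign x to the class of the cluster with the highest membership."""
    u = memberships(x, prototypes, m)
    best = max(range(len(u)), key=u.__getitem__)
    return class_of_cluster[best]


# Hypothetical prototypes and their class labels.
prototypes = [[0.0, 0.0], [10.0, 10.0]]
labels = ["w1", "w2"]
print(classify([1.0, 2.0], prototypes, labels))
print(classify([9.0, 9.0], prototypes, labels))
```

In a full tree the same comparison would be repeated at each level, descending into the winning cluster until a leaf is reached.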

Experiments
Data sets from the UCI repository of Machine Learning Databases (http://www.ics.uci.edu):
• Auto-Mpg
• Pima-diabetes
• Ionosphere
• Hepatitis
• Dermatology

Hepatitis data
Type of tree and its structural parameters | Error: training data | Error: testing data | Number of nodes
C4.5 rev. 8 | 6.46 % (average), 0.85 % (st. deviation) | 43.86 % (average), 7.05 % (st. deviation) | 45 (average), 7.87 (st. deviation)
C-decision tree, c=2 clusters, 6 iterations | 17.58 % (average), 3.34 % (st. deviation) | 36.13 % (average), 0.08 % (st. deviation) | 12
C-decision tree, c=9 clusters, 3 iterations | 24.84 % (average), 5.21 % (st. deviation) | 34.19 % (average), 3.68 % (st. deviation) | 27

Dermatology data
Type of tree and its structural parameters | Error: training data | Error: testing data | Number of nodes
C4.5 rev. 8 | 1.52 % (average), 0.61 % (st. deviation) | 5.98 % (average), 3.50 % (st. deviation) | 18.6 (average), 4.34 (st. deviation)
C-decision tree, c=11 clusters, 1 iteration | 7.0 % (average), 1.68 % (st. deviation) | 4.9 % (average), 3.56 % (st. deviation) | 11
C-decision tree, c=7 clusters, 1 iteration | 6.1 % (average), 1.15 % (st. deviation) | 5.7 % (average), 2.47 % (st. deviation) | 7

Context-based Fuzzy Clustered-oriented Decision Trees (CFCDT) ...

Architecture of the Context-based Fuzzy Clustered-oriented Decision Tree
define contexts (fuzzy sets) for the classification variable
for each context do
  – cluster (cFCM) Xi (the data set of the i-th context)
  – repeat
      – allocate elements of Xi to each cluster
      – choose the node with the highest value of the splitting criterion
      – cluster (cFCM) the data at the selected node
    until the termination criterion is fulfilled
enddo

Problem
Implementation issues:
• high complexity -> grid or cluster computing
• aggregation -> testing of different approaches

Thank you for your attention