Extending Propositional Satisfiability to Determine Minimal Fuzzy-Rough Reducts
Richard Jensen (Aberystwyth University, UK), Andrew Tuson (City University, UK) and Qiang Shen (Aberystwyth University, UK)
Outline
• The importance of feature selection
• Rough set theory
• Fuzzy-rough feature selection (FRFS)
• FRFS-SAT
• Experimentation
• Conclusion
Feature selection
• Why dimensionality reduction/feature selection?
• Growth of information: need to manage this effectively
• Curse of dimensionality: a problem for machine learning
[Diagram: High dimensional data → Dimensionality Reduction → Low dimensional data → Processing System; passing high dimensional data to the processing system directly is labelled "Intractable"]
Rough set theory
[Figure labels: Set A, Upper Approximation, Lower Approximation, Equivalence class Rx]
• Rx is the set of all points that are indiscernible from point x in terms of feature subset B
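The approximations named in the figure can be computed directly from the equivalence classes: the lower approximation collects the classes that lie entirely inside set A, the upper approximation collects those that overlap it. The following is a minimal sketch for crisp (nominal) values; the function names and the toy objects are illustrative and do not come from the slides.

```python
from collections import defaultdict

def equivalence_classes(objects, features):
    """Group object ids that take identical values on every feature in `features`."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[f] for f in features)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(objects, features, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for eq_class in equivalence_classes(objects, features):
        if eq_class <= target:   # class lies entirely inside the set
            lower |= eq_class
        if eq_class & target:    # class overlaps the set
            upper |= eq_class
    return lower, upper

# Hypothetical toy data: object id -> feature values.
objects = {
    1: {"a": 0, "b": 1},
    2: {"a": 0, "b": 1},   # indiscernible from object 1 on {a, b}
    3: {"a": 1, "b": 0},
    4: {"a": 1, "b": 1},
}
A = {1, 3}
lower, upper = approximations(objects, ["a", "b"], A)
```

Object 1 is in A but cannot be told apart from object 2, so it appears only in the upper approximation.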
Discernibility approach
• Decision-relative discernibility matrix
• Compare objects pairwise
• Examine attribute values
• For attributes that differ:
  • If decision values differ, include the attributes in the matrix
  • Else leave the slot blank
• Construct discernibility function: the conjunction of the non-empty matrix entries, each entry read as a disjunction of its attributes
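The matrix construction above can be sketched as a pairwise comparison of objects. This is an illustrative version only: the row format, the function name and the toy data are assumptions, and each returned set stands for one clause (disjunction) of the discernibility function.

```python
from itertools import combinations

def discernibility_clauses(rows, attributes, decision):
    """Return one clause (set of differing attributes) per object pair
    whose decision values differ; other slots are left blank (skipped)."""
    clauses = []
    for x, y in combinations(rows, 2):
        if x[decision] == y[decision]:
            continue  # same decision: slot left blank
        differing = frozenset(a for a in attributes if x[a] != y[a])
        if differing:
            clauses.append(differing)
    return clauses

# Hypothetical toy dataset with conditional attributes a-d and decision e.
rows = [
    {"a": 1, "b": 0, "c": 2, "d": 1, "e": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": 0, "e": "no"},
    {"a": 0, "b": 1, "c": 2, "d": 1, "e": "no"},
]
clauses = discernibility_clauses(rows, ["a", "b", "c", "d"], "e")
```

Here the second and third objects share a decision value, so only two clauses are produced: {c, d} and {a, b}.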
Example
• Remove duplicates
  f_C(a, b, c, d) = {a ⋁ b ⋁ c ⋁ d} ⋀ {a ⋁ c ⋁ d} ⋀ {b ⋁ c} ⋀ {d} ⋀ {a ⋁ b ⋁ c} ⋀ {a ⋁ b ⋁ d} ⋀ {b ⋁ c ⋁ d} ⋀ {a ⋁ d}
• Remove supersets
  f_C(a, b, c, d) = {b ⋁ c} ⋀ {d}
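The two simplification steps in this example can be reproduced in a few lines. The sketch below represents each clause as a set of attribute names; the function name `simplify` is illustrative.

```python
def simplify(clauses):
    """Remove duplicate clauses, then remove any clause that is a strict
    superset of another clause (the smaller clause already implies it)."""
    unique = set(map(frozenset, clauses))
    return {c for c in unique if not any(other < c for other in unique)}

# The eight clauses from the example, one string of attribute names per clause.
clauses = ["abcd", "acd", "bc", "d", "abc", "abd", "bcd", "ad"]
simplified = simplify(clauses)
```

Every clause containing d is absorbed by {d}, and {a ⋁ b ⋁ c} is absorbed by {b ⋁ c}, which leaves the two clauses shown on the slide.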
Finding reducts
• Usually too expensive to search exhaustively for reducts with minimal cardinality
• Reducts found through:
  • Converting from CNF to DNF (expensive)
  • Hill-climbing search using clauses (non-optimal)
  • Other search methods, e.g. GAs (non-optimal)
• RSAR-SAT
  • Solve directly in the SAT formulation
  • DPLL approach is both fast and ensures optimal (minimal-cardinality) reducts
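To illustrate how a DPLL-style search can return a minimal reduct, the sketch below branches on the attributes of the shortest unsatisfied clause and prunes any branch that cannot beat the best solution found so far. It is a simplified branch-and-bound stand-in, not the RSAR-SAT algorithm itself, and the function name is hypothetical.

```python
def minimal_reduct(clauses, chosen=frozenset(), best=None):
    """Search for a smallest attribute set that intersects every clause."""
    if best is not None and len(chosen) >= len(best):
        return best  # prune: this branch cannot improve on the best so far
    remaining = [c for c in clauses if not (c & chosen)]
    if not remaining:
        return chosen  # every clause is satisfied
    # At least one attribute of each unsatisfied clause must be selected,
    # so branching on the shortest clause keeps the search tree small.
    clause = min(remaining, key=len)
    for attr in sorted(clause):
        best = minimal_reduct(remaining, chosen | {attr}, best)
    return best

# Simplified discernibility function {b or c} and {d}.
clauses = [frozenset("bc"), frozenset("d")]
reduct = minimal_reduct(clauses)
```

For this input the search returns {b, d}; {c, d} is an equally small reduct that is pruned once a solution of size two is known.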
Fuzzy discernibility matrices
• Extension of the crisp approach
• Previously, attributes had {0, 1} membership to clauses
• Now they have membership in [0, 1]
• Allows real-valued data as well as nominal
• Fuzzy discernibility matrices (DMs) can be used to find fuzzy-rough reducts
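The slides do not state how the memberships are computed. One plausible reading, sketched below, is that an attribute belongs to the clause for two objects to the degree that it discerns them, taken here as the standard negation of a similarity value. The linear similarity measure, the function name and the data are all assumptions made for illustration.

```python
def fuzzy_clause(x, y, ranges):
    """Membership of each attribute in the fuzzy clause for objects x and y."""
    clause = {}
    for attr, (lo, hi) in ranges.items():
        # Assumed similarity: 1 for identical values, 0 at opposite ends of the range.
        similarity = 1.0 - abs(x[attr] - y[attr]) / (hi - lo)
        # Assumed membership: standard fuzzy negation of the similarity.
        clause[attr] = 1.0 - similarity
    return clause

# Hypothetical real-valued objects and attribute ranges.
x = {"a": 0.25, "b": 0.9}
y = {"a": 0.75, "b": 0.9}
ranges = {"a": (0.0, 1.0), "b": (0.0, 1.0)}
clause = fuzzy_clause(x, y, ranges)
```

Attribute a discerns the two objects to degree 0.5, while b does not discern them at all. With nominal data the memberships fall back to 0 or 1, which recovers the crisp matrix.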
Formulation
• Fuzzy satisfiability
  • In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true
  • For the fuzzy case, clauses may be satisfied to a certain degree, depending on which variables have been assigned the value true
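A degree of satisfaction can be sketched by combining the memberships of the selected attributes. The version below assumes the max t-conorm within a clause and the min t-norm across clauses; the slides do not name the operators, so both choices and the function names are assumptions.

```python
def clause_satisfaction(clause, selected):
    """Degree to which the selected attributes satisfy one fuzzy clause
    (assumed max t-conorm over the selected attributes' memberships)."""
    return max((clause.get(a, 0.0) for a in selected), default=0.0)

def formula_satisfaction(clauses, selected):
    """Degree to which all clauses are satisfied (assumed min t-norm)."""
    return min(clause_satisfaction(c, selected) for c in clauses)

# Hypothetical fuzzy clauses: attribute -> membership in [0, 1].
clauses = [{"a": 0.5, "b": 0.0, "c": 0.8}, {"a": 1.0, "d": 0.3}]
partial = formula_satisfaction(clauses, {"a"})       # min(0.5, 1.0)
better = formula_satisfaction(clauses, {"a", "c"})   # min(0.8, 1.0)
```

When every membership is 0 or 1, the same functions return 1 exactly when each clause contains a selected attribute, which matches the crisp SAT case.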
Experimentation: setup
• 9 benchmark datasets
  • Features: 10 to 39
  • Objects: 120 to 690
• Methods used:
  • FRFS-SAT
  • Greedy hill-climbing: fuzzy dependency, fuzzy boundary region and fuzzy discernibility
  • Evolutionary algorithms: genetic algorithms (GA) and particle swarm optimization (PSO) using fuzzy dependency
• 10 × 10-fold cross-validation
  • FS performed on the training folds; test folds reduced using the discovered reducts
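The evaluation protocol can be sketched as a loop in which feature selection only ever sees the training folds. The code below is a generic illustration using the standard library; `select_features` and `evaluate` are hypothetical callbacks standing in for a feature selector and a classifier, and the fold-splitting scheme is an assumption.

```python
import random

def k_fold_indices(n, k, seed):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv(data, select_features, evaluate, repeats=10, k=10):
    """Run `repeats` rounds of k-fold cross-validation with per-fold FS."""
    scores = []
    for r in range(repeats):
        folds = k_fold_indices(len(data), k, seed=r)
        for i, test_idx in enumerate(folds):
            train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
            test = [data[j] for j in test_idx]
            subset = select_features(train)  # FS uses the training folds only
            scores.append(evaluate(train, test, subset))
    return scores

# Stand-in callbacks so the sketch runs: a fixed subset and a dummy score.
data = list(range(120))
scores = repeated_cv(
    data,
    select_features=lambda train: ("a",),
    evaluate=lambda train, test, subset: len(test),
)
```

Selecting features inside the loop keeps the test fold from influencing which reduct is chosen.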
Experimentation: results
Conclusion
• Extended propositional satisfiability to enable search for fuzzy-rough reducts
• New framework for fuzzy satisfiability
• New DPLL algorithm
• Fuzzy clause simplification
• Future work:
  • Non-chronological backtracking
  • Better heuristics
  • Unsupervised FS
  • Other extensions in propositional satisfiability
• WEKA implementations of all fuzzy-rough feature selectors and classifiers can be downloaded from: http://users.aber.ac.uk/rkj/book/weka.zip
Feature selection
• Feature selection (FS) is a dimensionality reduction (DR) technique that preserves data semantics (the meaning of the data)
• Subset generation: forwards, backwards, random…
• Evaluation function: determines the 'goodness' of subsets
• Stopping criterion: decides when to stop the subset search
[Diagram: Feature set → Generation → Subset → Evaluation → Subset suitability → Stopping Criterion; "Continue" loops back to Generation, "Stop" leads to Validation]
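The generate/evaluate/stop cycle can be sketched with forward generation as one possible instance. The evaluation function, the target score and the toy data below are assumptions chosen so the example is self-contained.

```python
def forward_selection(features, evaluate, target=1.0):
    """Greedy forward search: add the best feature until the target score
    is reached or no candidate improves the current subset."""
    selected = []
    best_score = evaluate(selected)
    while best_score < target:                    # stopping criterion
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Subset generation and evaluation: try each one-feature extension.
        score, feature = max((evaluate(selected + [f]), f) for f in candidates)
        if score <= best_score:
            break                                 # no improvement: stop
        selected.append(feature)
        best_score = score
    return selected, best_score

# Hypothetical evaluation: fraction of four items covered by the subset.
covers = {"a": {1}, "b": {1, 2, 3}, "c": {4}}
evaluate = lambda subset: len(set().union(*(covers[f] for f in subset))) / 4
selected, best_score = forward_selection(["a", "b", "c"], evaluate)
```

Greedy generation of this kind is fast but, unlike an exhaustive or SAT-based search, gives no guarantee that the returned subset is minimal.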
Algorithm
Example