CENG 562 MACHINE LEARNING GENETIC ALGORITHMS Presented by
CENG 562 MACHINE LEARNING GENETIC ALGORITHMS Presented by Abdurrahman Çarkacıoğlu
OUTLINE • • What is GA? Biological Basis Terminology Operations – – Fitness Function Reproduction Crossover Mutation • Genetic Programming • Traditional Methods vs. GA. • Applications and Research Topics • Summary & Conclusion
WHAT IS GA ? . • Search algorithms based on – mechanics of natural selection – natural genetics • Robust, flexible, efficient in vast complex spaces. • Use the idea of randomness search but not directionless.
BIOLOGICAL BASIS OF GA • Chromosomes – Heredity aspects of the individual • Genes – Encodes a specific feature of the individual. – Actual value allele • Mating – Both parent pass their chromosomes onto their offsprings. • The weaker ones tend to die • Stronger ones survive and produce new offsprings. • Rarely mutation occurs
GA’s TERMINOLOGY • String Structures Chromosomes (Hypothesis) • Elements of the String Gene's (Allele) • Hypothesis/String Structures – Composed of features, or detectors.
REPRESENTATION of HYPOTHESIS • Bit Strings (most of the time) • Examples: Outlook={Sunny, Overcast, Rain} 001 encodes Outlook=Rain 011 encodes Outlook=Overcast or Rain 111 encodes Outlook=Sunny or Overcast or Rain
Rule Representation Example • Rule : Whatever the outlook, if wind=strong then Play. Tennis. Given: Wind {Strong, Weak} Play. Tennis {Yes, No} Solution: Outlook Wind Play. Tennis 111 10 10
BASIC GA OPERATIONS · Inner workings of GA is simple. · Based on · simple string copying · substracting concetation · nothing more, nothing less.
FITNESS FUNCTION · Returns a single numerical fitness value. · Problem dependent · For a function optimization search, it is simply the value of the function. • For learning classification rules, it is classification accuracy over a set of training examples
REPRODUCTION · Allows individual strings to be copied for possible inclusion in the next generation. · Copying chance is based on the fitness value. String Ftn 01001 5 10000 12 01110 9 Perc. 19% 46% 35% · 46% of the time 10000 is selected.
CROSSOVER • Produce 2 new offspring from 2 parent string. Initial Strings Offsprings
MUTATION • Applied after crossover • Randomly alters each gene with a small probability, typically 0, 001. Point Mutation Example Initial String After Mutation 11101101 11111101
Pseudo Code for GA • choose initial population at random • evaluate each individual's fitness • repeat select individuals to reproduce mate pairs at random apply crossover operator apply mutation operator evaluate each individual's fitness • until terminating condition
RECENT WORKS · Using real numbers instead of bit strings · Use of knowledge-augmented operators for GAs. · Strings may be variable lengths. genetic
GENETIC PROGRAMMING • Branch of genetic algorithms • Difference is the representation of the solution • Creates computer programs in the Lisp or Scheme computer languages as the solution.
Pseudo Code for GP 1)Create an initial population of random composition of the functions and terminals of the problem. 2) Execute each program, and assign it a fitness value according to the how well it solves a problem. 3)Create new population of computer programs. • )Copy the best existing programs. • )Create new programs by mutation • )Create new computer programs by crossover 4) Announce the best computer program when predefined conditions are satisfied.
GA vs. GP · GP is much more powerfull than GA. · The output of the GA is a quantity, while output of GP is a another computer program. · Computer programs that program themselves. · Where there is no ideal solution, GP works best (ex. A program that drives a car).
Traditional Methods vs. GAs –. GA a coded form of the function values, rather than with the actual values themselves. –. GA use a set, or population not just a single point on the problem space. –. GA use only payoff information to guide themselves. –. Probabilistic, not deterministic. –. Inherently parallel.
A Brief Survey of GA Technology • • • . Criminal suspect recognition. Music Composition. Constructing and Training Neural Networks. Scheduling Problems. Games Playing. Structural Optimization. Function Optimization. Database Query Optimization Aircraft Design
GABIL by De. Jong (1993) · Uses GA to learn boolean concepts, represented by disjunctive set of propositional rules. · Roughly comparable performance among C 4. 5, ID 5 R and AQ 14 systems. · Over 12 synthetic problems Average generalization accuracy for GABIL Others 92. 1% 91. 6% to 96. 6%
Summary & Conclusions · conduct a randomized, parallel, hill-climbing search for hypothesis that optimize a predefined fitness function. · operate on populations of strings/hypothesis/chromosomes. · string represents some underlying parameter set. · reproduction, crossover and mutation are used to create new populations.
- Slides: 21