GENETIC ALGORITHMS AND GENETIC PROGRAMMING John R Koza
John R. Koza, Consulting Professor (Medical Informatics), Department of Medicine, School of Medicine, and Consulting Professor, Department of Electrical Engineering, School of Engineering, Stanford University, Stanford, California 94305. koza@stanford.edu, http://www.smi.stanford.edu/people/koza/
DEFINITION OF THE GENETIC ALGORITHM (GA) The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
GENETIC ALGORITHM (GA)
Generation 0              Generation 1
Individuals   Fitness     Offspring
011           $3          111
001           $1          010
110           $6          110
010           $2          010
HAMBURGER RESTAURANT PROBLEM
• Price: 1 = $0.50 price, 0 = $10.00 price
• Drink: 1 = Coca Cola, 0 = Wine
• Ambiance: 1 = Fast snappy service, 0 = Leisurely service with tuxedoed waiter
CHROMOSOME (GENOME) OF THE GLOBAL OPTIMUM McDonald's: 111 ($0.50 price, Coca Cola, fast snappy service)
THE SEARCH SPACE
Points 1 to 8: 000, 001, 010, 011, 100, 101, 110, 111
• Alphabet size K = 2, length L = 3
• Size of search space: K^L = 2^L = 2^3 = 8
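For a space this small, exhaustive enumeration is trivial. A minimal Python sketch, assuming that a string's fitness (profit) is its value read as a binary number; the slides never state this rule, but it matches every fitness value they show (011 is $3, 110 is $6, 111 is $7):

```python
from itertools import product

K, L = 2, 3  # alphabet size and string length from the slide

def fitness(genome: str) -> int:
    # Assumption: profit equals the string read as a binary number (011 -> 3).
    return int(genome, 2)

# Enumerate every point of the search space in the slide's order, 000 ... 111.
space = ["".join(bits) for bits in product("01", repeat=L)]
assert len(space) == K ** L

best = max(space, key=fitness)
print(space)                # ['000', '001', '010', '011', '100', '101', '110', '111']
print(best, fitness(best))  # 111 7
```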
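IMPRACTICALITY OF RANDOM OR ENUMERATIVE SEARCH
• 81-bit problems are very small for the GA
• However, even if L is as small as 81, the search space contains 2^81 ≈ 2.4 × 10^24 points, only about two orders of magnitude below the roughly 5 × 10^26 nanoseconds that have elapsed since the beginning of the universe 15 billion years ago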
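The arithmetic is easy to check. A short sketch, assuming a 365.25-day year and the slide's 15-billion-year age of the universe; the rate of 10^9 evaluations per second in the last step is a hypothetical figure chosen only to show the scale:

```python
L = 81
points = 2 ** L                                    # strings in the 81-bit search space

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60           # Julian year
ns_since_big_bang = 15e9 * SECONDS_PER_YEAR * 1e9  # slide's figure of 15 billion years

print(f"2^{L} = {points:.2e}")                     # 2^81 = 2.42e+24
print(f"nanoseconds = {ns_since_big_bang:.2e}")    # nanoseconds = 4.73e+26

# Hypothetical: checking 10^9 strings per second, how many years to enumerate them all?
years = points / 1e9 / SECONDS_PER_YEAR
print(f"years to enumerate = {years:.1e}")
```

Even at that optimistic rate, enumerating an 81-bit space would take tens of millions of years.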
GA FLOWCHART
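A minimal sketch of the generational loop, applied to the 3-bit hamburger problem: create a random generation 0, then repeatedly check the termination criterion and fill the next generation with crossover, mutation, or reproduction of fitness-selected parents. The operation probabilities, the run length, the termination test, and the binary-value fitness are illustrative assumptions, not settings taken from the slides:

```python
import random

L = 3                               # string length (hamburger problem)
M = 4                               # population size, as in the slides' example
MAX_GENERATIONS = 20                # hypothetical run length
P_CROSSOVER, P_MUTATION = 0.6, 0.2  # hypothetical; the remainder is reproduction

def fitness(genome):
    return int(genome, 2)           # assumption: profit = binary value of the string

def select(population, rng):
    """Fitness-proportionate selection; uniform fallback if every fitness is 0."""
    weights = [fitness(g) for g in population]
    if sum(weights) == 0:
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]

def mutate(genome, rng):
    i = rng.randrange(len(genome))  # mutation point chosen at random
    return genome[:i] + ("1" if genome[i] == "0" else "0") + genome[i + 1:]

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # interstitial point between two positions
    return b[:cut] + a[cut:], a[:cut] + b[cut:]

def run_ga(seed=0):
    rng = random.Random(seed)
    # Generation 0: random strings.
    population = ["".join(rng.choice("01") for _ in range(L)) for _ in range(M)]
    for gen in range(MAX_GENERATIONS + 1):
        best = max(population, key=fitness)
        # Assumed termination criterion: the global optimum has appeared, or time is up.
        if fitness(best) == 2 ** L - 1 or gen == MAX_GENERATIONS:
            return gen, best
        offspring = []
        while len(offspring) < M:
            r = rng.random()
            if r < P_CROSSOVER and len(offspring) <= M - 2:
                offspring.extend(crossover(select(population, rng), select(population, rng), rng))
            elif r < P_CROSSOVER + P_MUTATION:
                offspring.append(mutate(select(population, rng), rng))
            else:
                offspring.append(select(population, rng))  # reproduction: copy unchanged
        population = offspring

print(run_ga(seed=1))  # (generation at which the run stopped, best string)
```

Because every random choice draws from `rng`, a fixed seed makes a run reproducible, and different seeds give different trajectories through the search space.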
GENERATION 0
#        Gen 0   Fitness
1        011     3
2        001     1
3        110     6
4        010     2
Total            12
Worst            1
Average          3.00
Best             6
PROBABILISTIC SELECTION BASED ON FITNESS
• Better individuals are preferred
• Best is not always picked
• Worst is not necessarily excluded
• Nothing is guaranteed
• Mixture of greedy exploitation and adventurous exploration
• Similarities to simulated annealing (SA)
DARWINIAN FITNESS PROPORTIONATE SELECTION
#        Gen 0   Fitness   f/Σf   Mating pool   Fitness
1        011     3         .25    011           3
2        001     1         .08    110           6
3        110     6         .50    110           6
4        010     2         .17    010           2
Total            12                             17
Worst            1                              2
Average          3.00                           4.25
Best             6                              6
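Each string's selection probability is its fitness divided by the population total (3/12 = .25, 1/12 ≈ .08, 6/12 = .50, 2/12 ≈ .17). A minimal roulette-wheel sketch of this fitness-proportionate selection; the seeds and the number of trial spins are arbitrary choices:

```python
import random
from collections import Counter

generation0 = {"011": 3, "001": 1, "110": 6, "010": 2}  # string -> fitness, from the slide

total = sum(generation0.values())                              # 12
probability = {g: f / total for g, f in generation0.items()}  # fitness / total fitness

def roulette_wheel(pop, rng):
    """Return one string, chosen with probability proportional to its fitness."""
    spin = rng.uniform(0, sum(pop.values()))
    running = 0.0
    for genome, fit in pop.items():
        running += fit
        if spin <= running:
            return genome
    return genome  # only reached through floating-point round-off at the top end

for genome, p in probability.items():
    print(genome, f"{p:.2f}")  # prints 0.25, 0.08, 0.50, 0.17 for the four strings

rng = random.Random(0)
mating_pool = [roulette_wheel(generation0, rng) for _ in range(len(generation0))]
print(mating_pool)

# Over many spins, observed frequencies approach the probabilities above.
counts = Counter(roulette_wheel(generation0, rng) for _ in range(100_000))
print({g: round(counts[g] / 100_000, 2) for g in generation0})
```

The best string is favored but not guaranteed a place, and the worst is not excluded, which is the mixture of exploitation and exploration described on the previous slide.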
MUTATION OPERATION
• Parent chosen probabilistically based on fitness: Parent 010
• Mutation point chosen at random: --0 (the third position)
• One offspring: Offspring 011
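A minimal sketch of point mutation on a bit string, in which the mutation point is an index chosen uniformly at random (0-based in the code, so the slide's third position is index 2):

```python
import random

def mutate_at(parent: str, point: int) -> str:
    """Flip the bit at `point` (0-based) and return the single offspring."""
    flipped = "1" if parent[point] == "0" else "0"
    return parent[:point] + flipped + parent[point + 1:]

def point_mutation(parent: str, rng: random.Random) -> str:
    """Pick the mutation point uniformly at random, then flip that bit."""
    return mutate_at(parent, rng.randrange(len(parent)))

print(mutate_at("010", 2))                      # 011, the slide's example
print(point_mutation("010", random.Random(3)))  # one of 110, 000, 011
```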
AFTER MUTATION OPERATION
#        Gen 0   Fitness   f/Σf   Mating pool   Fitness   Crossover point   Gen 1   Fitness
1        011     3         .25    011           3
2        001     1         .08    110           6
3        110     6         .50    110           6
4        010     2         .17    010           2         ---               011     3
Total            12                             17
Worst            1                              2
Average          3.00                           4.25
Best             6                              6
CROSSOVER OPERATION
• 2 parents chosen probabilistically based on fitness: Parent 1 011, Parent 2 110
CROSSOVER (CONTINUED)
• Interstitial point picked at random (here, between positions 2 and 3): Fragment 1 01-, Fragment 2 11-
• 2 remainders: Remainder 1 --1, Remainder 2 --0
• 2 offspring produced by crossover: Offspring 1 111 (Fragment 2 + Remainder 1), Offspring 2 010 (Fragment 1 + Remainder 2)
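A minimal sketch of one-point crossover on bit strings, reproducing the slide's example. For these two parents, both possible interstitial points (after the first or after the second bit) happen to produce the same pair of offspring, 111 and 010:

```python
import random

def one_point_crossover(parent1: str, parent2: str, point: int):
    """Cut both parents at the same interstitial point and swap the remainders."""
    fragment1, remainder1 = parent1[:point], parent1[point:]
    fragment2, remainder2 = parent2[:point], parent2[point:]
    return fragment2 + remainder1, fragment1 + remainder2

def random_crossover(parent1: str, parent2: str, rng: random.Random):
    # Interstitial points lie between positions: 1 .. L-1 for strings of length L.
    return one_point_crossover(parent1, parent2, rng.randrange(1, len(parent1)))

print(one_point_crossover("011", "110", 2))  # ('111', '010'), as on the slide
```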
AFTER CROSSOVER OPERATION
#        Gen 0   Fitness   f/Σf   Mating pool   Fitness   Crossover point   Gen 1   Fitness
1        011     3         .25    011           3         2                 111     7
2        001     1         .08    110           6         2                 010     2
3        110     6         .50    110           6
4        010     2         .17    010           2
Total            12                             17
Worst            1                              2
Average          3.00                           4.25
Best             6                              6
AFTER REPRODUCTION OPERATION
#        Gen 0   Fitness   f/Σf   Mating pool   Fitness   Crossover point   Gen 1   Fitness
1        011     3         .25    011           3
2        001     1         .08    110           6
3        110     6         .50    110           6         ---               110     6
4        010     2         .17    010           2
Total            12                             17
Worst            1                              2
Average          3.00                           4.25
Best             6                              6
GENERATION 1
#        Gen 0   Fitness   f/Σf   Mating pool   Fitness   Crossover point   Gen 1   Fitness
1        011     3         .25    011           3         2                 111     7
2        001     1         .08    110           6         2                 010     2
3        110     6         .50    110           6         ---               110     6
4        010     2         .17    010           2         ---               011     3
Total            12                             17                                  18
Worst            1                              2                                   2
Average          3.00                           4.25                                4.50
Best             6                              6                                   7
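The summary rows follow directly from the fitness columns. A short sketch that recomputes them for generation 0, the mating pool, and generation 1, using the fitness values listed in the tables:

```python
def summarize(fitnesses):
    """The Total / Worst / Average / Best rows of the tables."""
    return {
        "total": sum(fitnesses),
        "worst": min(fitnesses),
        "average": sum(fitnesses) / len(fitnesses),
        "best": max(fitnesses),
    }

generation0 = [3, 1, 6, 2]   # 011, 001, 110, 010
mating_pool = [3, 6, 6, 2]   # 011, 110, 110, 010
generation1 = [7, 2, 6, 3]   # 111, 010, 110, 011

for name, fits in [("gen 0", generation0), ("mating pool", mating_pool), ("gen 1", generation1)]:
    print(name, summarize(fits))
```

In one generation the average fitness rises from 3.00 to 4.50 and the best from 6 to 7.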
PROBABILISTIC STEPS
• The initial population is typically random
• Probabilistic selection based on fitness
  – Best is not always picked
  – Worst is not necessarily excluded
• Random picking of mutation and crossover points
• Often, there is a probabilistic scenario as part of the fitness measure
GENETIC PROGRAMMING
THE CHALLENGE "How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed to be done, without being told exactly how to do it?" Attributed to Arthur Samuel (1959)
CRITERION FOR SUCCESS "The aim [is]... to get machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence." Arthur Samuel (1983)
REPRESENTATIONS • Decision trees • If-then production rules • Horn clauses • Neural nets • Bayesian networks • Frames • Propositional logic • Binary decision diagrams • Formal grammars • Coefficients for polynomials • Reinforcement learning tables • Conceptual clusters • Classifier systems
A COMPUTER PROGRAM
GENETIC PROGRAMMING (GP) • GP applies the approach of the genetic algorithm to the space of possible computer programs • Computer programs are the lingua franca for expressing the solutions to a wide variety of problems • A wide variety of seemingly different problems from many different fields can be reformulated as a search for a computer program to solve the problem.
GP MAIN POINTS • Genetic programming now routinely delivers high-return human-competitive machine intelligence. • Genetic programming is an automated invention machine. • Genetic programming has delivered a progression of qualitatively more substantial results in synchrony with five approximately order-of-magnitude increases in the expenditure of computer time.
GP FLOWCHART
A COMPUTER PROGRAM IN C
int foo(int time)
{
    int temp1, temp2;
    if (time > 10)
        temp1 = 3;
    else
        temp1 = 4;
    temp2 = temp1 + 2;
    return (temp2);
}
OUTPUT OF C PROGRAM
Time:    0  1  2  3  4  5  6  7  8  9  10  11  12
Output:  6  6  6  6  6  6  6  6  6  6  6   5   5
PROGRAM TREE (+ 2 (IF (> TIME 10) 3 4))
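One way to check that the tree (+ 2 (IF (> TIME 10) 3 4)) computes the same function as foo is to evaluate it directly. The sketch below encodes the S-expression as nested Python tuples (a representation chosen here for illustration) and compares it with a line-by-line Python translation of foo:

```python
def evaluate(expr, time):
    """Evaluate a prefix-form program tree encoded as nested tuples."""
    if expr == "TIME":
        return time
    if not isinstance(expr, tuple):
        return expr                      # numeric constant
    op, *args = expr
    if op == "IF":                       # evaluate only the branch that is taken
        condition, then_branch, else_branch = args
        return evaluate(then_branch if evaluate(condition, time) else else_branch, time)
    left, right = (evaluate(arg, time) for arg in args)
    if op == "+":
        return left + right
    if op == ">":
        return left > right
    raise ValueError(f"unknown function {op!r}")

def foo(time):
    """Python translation of the C function foo."""
    temp1 = 3 if time > 10 else 4
    temp2 = temp1 + 2
    return temp2

program = ("+", 2, ("IF", (">", "TIME", 10), 3, 4))
outputs = [evaluate(program, t) for t in range(13)]
print(outputs)   # [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5]
assert outputs == [foo(t) for t in range(13)]
```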
CREATING RANDOM PROGRAMS
CREATING RANDOM PROGRAMS • Available functions F = {+, -, *, %, IFLTE} • Available terminals T = {X, Y, Random-Constants} • The random programs are: – Of different sizes and shapes – Syntactically valid – Executable
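A minimal sketch of creating random programs in the spirit of this slide, using Python tuples for program trees and a simple "grow" procedure: each node is a function or a terminal chosen at random, and only terminals are allowed at the depth limit, so every tree is syntactically valid and executable. Several details are assumptions: the depth limit of 4, the range [-5, 5] for random constants, IFLTE read as (IFLTE a b c d) = c if a <= b else d, and % as protected division returning 1 when the divisor is 0 (the convention commonly used in Koza's work):

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "%": 2, "IFLTE": 4}  # function name -> arity
TERMINALS = ["X", "Y", "R"]  # "R" is replaced by a fresh random constant

def grow(rng, max_depth):
    """Build a random tree; at max_depth 0 only terminals may be chosen."""
    n_t, n_f = len(TERMINALS), len(FUNCTIONS)
    if max_depth == 0 or rng.random() < n_t / (n_t + n_f):
        t = rng.choice(TERMINALS)
        return round(rng.uniform(-5.0, 5.0), 2) if t == "R" else t
    name = rng.choice(list(FUNCTIONS))
    return (name, *(grow(rng, max_depth - 1) for _ in range(FUNCTIONS[name])))

def evaluate(tree, x, y):
    if tree == "X":
        return x
    if tree == "Y":
        return y
    if not isinstance(tree, tuple):
        return tree                               # random constant
    name, *args = tree
    if name == "IFLTE":                           # (IFLTE a b c d): c if a <= b else d
        a, b, c, d = args
        return evaluate(c, x, y) if evaluate(a, x, y) <= evaluate(b, x, y) else evaluate(d, x, y)
    a, b = (evaluate(arg, x, y) for arg in args)
    if name == "+":
        return a + b
    if name == "-":
        return a - b
    if name == "*":
        return a * b
    return a / b if b != 0 else 1.0               # protected division "%"

def size(tree):
    """Number of points (functions plus terminals) in a tree."""
    return 1 + sum(size(c) for c in tree[1:]) if isinstance(tree, tuple) else 1

rng = random.Random(2024)
programs = [grow(rng, max_depth=4) for _ in range(5)]
for p in programs:
    print(size(p), p, "->", evaluate(p, 1.0, 2.0))
```

The programs differ in size and shape, yet each one can be evaluated on any (X, Y) without a syntax or division error.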
GP GENETIC OPERATIONS
• Reproduction
• Mutation
• Crossover (sexual recombination)
• Architecture-altering operations
MUTATION OPERATION
MUTATION OPERATION
• Select 1 parent probabilistically based on fitness
• Pick a point from 1 to NUMBER-OF-POINTS
• Delete the subtree at the picked point
• Grow a new subtree at the mutation point in the same way as the trees generated for the initial random population (generation 0)
• The result is a syntactically valid executable program
• Put the offspring into the next generation of the population
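A minimal sketch of subtree mutation on tuple-encoded trees. Points are numbered in pre-order from 0 (the slide counts from 1), and the replacement subtree is grown by the same kind of random procedure used for generation 0. The reduced function set and the depth limit of 3 are assumptions made to keep the example short:

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "%": 2}  # IFLTE omitted to keep the sketch short
TERMINALS = ["X", "Y"]

def grow(rng, max_depth):
    """Grow a random subtree the same way as for generation 0."""
    if max_depth == 0 or rng.random() < 0.5:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, *(grow(rng, max_depth - 1) for _ in range(FUNCTIONS[name])))

def count_points(tree):
    """NUMBER-OF-POINTS: every function and terminal counts as one point."""
    if isinstance(tree, tuple):
        return 1 + sum(count_points(child) for child in tree[1:])
    return 1

def replace_at(tree, point, new_subtree):
    """Copy of `tree` with the subtree rooted at pre-order index `point` (0 = root) replaced."""
    if point == 0:
        return new_subtree
    point -= 1
    children = list(tree[1:])
    for i, child in enumerate(children):
        n = count_points(child)
        if point < n:
            children[i] = replace_at(child, point, new_subtree)
            return (tree[0], *children)
        point -= n
    raise IndexError("point out of range")

def subtree_mutation(parent, rng, max_depth=3):
    # The slide numbers points 1..NUMBER-OF-POINTS; this code uses 0-based indices.
    point = rng.randrange(count_points(parent))
    return replace_at(parent, point, grow(rng, max_depth))

parent = ("+", ("*", "X", "X"), "Y")            # x*x + y
print(replace_at(parent, 2, ("-", "Y", "X")))   # ('+', ('*', ('-', 'Y', 'X'), 'X'), 'Y')
print(subtree_mutation(parent, random.Random(7)))
```

Because the trees are immutable tuples, the parent is left unchanged and the offspring is a new tree.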
CROSSOVER OPERATION
CROSSOVER OPERATION
• Select 2 parents probabilistically based on fitness
• Randomly pick a number from 1 to NUMBER-OF-POINTS for the 1st parent
• Independently randomly pick a number for the 2nd parent
• Identify the subtrees rooted at the two picked points
• Swap the two subtrees between the parents
• The result is a syntactically valid executable program
• Put the offspring into the next generation of the population
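A minimal sketch of subtree crossover on tuple-encoded trees, with points numbered in pre-order from 0. The two parents are assumed encodings of individuals (a) x + 1 and (b) x^2 + 1 from the symbolic regression example, crossed at the root "+" of (a) and the left-most "x" of (b):

```python
import random

def count_points(tree):
    """NUMBER-OF-POINTS: every function and terminal counts as one point."""
    if isinstance(tree, tuple):
        return 1 + sum(count_points(child) for child in tree[1:])
    return 1

def subtree_at(tree, point):
    """Return the subtree rooted at pre-order index `point` (0 = root)."""
    if point == 0:
        return tree
    point -= 1
    for child in tree[1:]:
        n = count_points(child)
        if point < n:
            return subtree_at(child, point)
        point -= n
    raise IndexError("point out of range")

def replace_at(tree, point, new_subtree):
    """Copy of `tree` with the subtree rooted at `point` replaced by `new_subtree`."""
    if point == 0:
        return new_subtree
    point -= 1
    children = list(tree[1:])
    for i, child in enumerate(children):
        n = count_points(child)
        if point < n:
            children[i] = replace_at(child, point, new_subtree)
            return (tree[0], *children)
        point -= n
    raise IndexError("point out of range")

def crossover_at(parent1, parent2, point1, point2):
    """Swap the subtrees rooted at point1 of parent1 and point2 of parent2."""
    s1, s2 = subtree_at(parent1, point1), subtree_at(parent2, point2)
    return replace_at(parent1, point1, s2), replace_at(parent2, point2, s1)

def subtree_crossover(parent1, parent2, rng):
    # Crossover points are picked independently in each parent.
    return crossover_at(parent1, parent2,
                        rng.randrange(count_points(parent1)),
                        rng.randrange(count_points(parent2)))

a = ("+", "X", 1)                # x + 1   (assumed encoding of parent (a))
b = ("+", ("*", "X", "X"), 1)    # x*x + 1 (assumed encoding of parent (b))
# Pick "+" (the root, point 0) of a and the left-most X (point 2) of b.
print(crossover_at(a, b, 0, 2))  # ('X', ('+', ('*', ('+', 'X', 1), 'X'), 1))
```

Under these encodings the second offspring is (x + 1)·x + 1, which equals x^2 + x + 1.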
REPRODUCTION OPERATION • Select parent probabilistically based on fitness • Copy it (unchanged) into the next generation of the population
FIVE MAJOR PREPARATORY STEPS FOR GP
• Determining the set of terminals
• Determining the set of functions
• Determining the fitness measure
• Determining the parameters for the run
• Determining the method for designating a result and the criterion for terminating a run
PREPARATORY STEPS
Objective: Find a computer program with one input (independent variable X) whose output equals the given data
1 Terminal set: T = {X, Random-Constants}
2 Function set: F = {+, -, *, %}
3 Fitness: The sum of the absolute values of the differences between the candidate program’s output and the given data (computed over numerous values of the independent variable X from -1.0 to +1.0)
4 Parameters: Population size M = 4
5 Termination: An individual emerges whose sum of absolute errors is less than 0.1
SYMBOLIC REGRESSION POPULATION OF 4 RANDOMLY CREATED INDIVIDUALS FOR GENERATION 0
SYMBOLIC REGRESSION OF x^2 + x + 1: FITNESS OF THE 4 INDIVIDUALS IN GENERATION 0
(a) x + 1: 0.67
(b) x^2 + 1: 1.00
(c) 2: 1.70
(d) x: 2.67
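These fitness values match the absolute error integrated over [-1, 1]; for example, x + 1 misses the target by exactly x^2, and the integral of x^2 over [-1, 1] is 2/3 ≈ 0.67. The sketch below approximates that integral with a midpoint sum scaled by the step size h. The scaling, the number of sample points, and the Python encodings of the four individuals are assumptions, not details given on the slides:

```python
def target(x):
    return x * x + x + 1          # the target curve x^2 + x + 1

# The four generation-0 individuals, written as Python functions (assumed encodings).
candidates = {
    "(a) x + 1": lambda x: x + 1,
    "(b) x^2 + 1": lambda x: x * x + 1,
    "(c) 2": lambda x: 2.0,
    "(d) x": lambda x: x,
}

def sum_abs_error(program, lo=-1.0, hi=1.0, n=2000):
    """Sum of |program(x) - target(x)| over n midpoints of [lo, hi], scaled by the step h."""
    h = (hi - lo) / n
    xs = (lo + (i + 0.5) * h for i in range(n))
    return h * sum(abs(program(x) - target(x)) for x in xs)

for name, program in candidates.items():
    print(f"{name:12} {sum_abs_error(program):.2f}")  # 0.67, 1.00, 1.70, 2.67 in turn

# Termination criterion from the preparatory steps: sum of absolute errors below 0.1.
print(sum_abs_error(target) < 0.1)                     # True
```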
SYMBOLIC REGRESSION OF x^2 + x + 1: GENERATION 1
• Mutant of (c), picking "2" as the mutation point
• Copy of (a)
• First offspring of crossover of (a) and (b), picking "+" of parent (a) and the left-most "x" of parent (b) as crossover points
• Second offspring of crossover of (a) and (b), picking "+" of parent (a) and the left-most "x" of parent (b) as crossover points
WALL-FOLLOWER
FITNESS
BEST OF GENERATION 57
SUBROUTINE DUPLICATION
SUBROUTINE CREATION
SUBROUTINE DELETION
ARGUMENT DUPLICATION
ARGUMENT DELETION
16 ATTRIBUTES OF A SYSTEM FOR AUTOMATICALLY CREATING COMPUTER PROGRAMS
• Starts with "What needs to be done"
• Tells us "How to do it"
• Produces a computer program
• Automatic determination of program size
• Code reuse
• Parameterized reuse
• Internal storage
• Iterations, loops, and recursions
• Self-organization of hierarchies
• Automatic determination of program architecture
• Wide range of programming constructs
• Well-defined
• Problem-independent
• Wide applicability
• Scalable
• Competitive with human-produced results
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
GP AS AN INVENTION MACHINE
NASA EVOLVED ANTENNA To be on a satellite to be launched in 2004
CHARACTERISTICS SUGGESTING USE OF GP (1) discovering the size and shape of the solution, (2) reusing substructures, (3) discovering the number of substructures, (4) discovering the nature of the hierarchical references among substructures, (5) passing parameters to a substructure, (6) discovering the type of substructures (e.g., subroutines, iterations, loops, recursions, or storage), (7) discovering the number of arguments possessed by a substructure, (8) maintaining syntactic validity and locality by means of a developmental process, or (9) discovering a general solution in the form of a parameterized topology containing free variables
DESIGNING A GIRAFFE
• Long neck
• Long tongue
• Vegetable-digesting enzymes in stomach
• 4 legs
• Long legs
• Brown coloration
THE DESIGN OF A GOOD GIRAFFE
• Neck length: 15.11 feet (floating point)
• Tongue length: 14 inches
• Carnivorous?: No (Boolean)
• Number of legs: 4 (integer)
• Leg length: 9.96 feet (floating point)
• Coloration: Brown (categorical)
NON-LINEARITY: GIRAFFE
• Taken one by one, some gene values found in a giraffe, such as the long neck, contribute (alone) negatively to fitness:
  – requires considerable material to construct
  – requires considerable energy to maintain
  – prone to injury (thereby hurting the rate of survival and reproduction)
• Thus, maximizing any one variable will not lead to the globally optimal solution
NON-LINEARITY (CONTINUED) • When the variables are taken in pairs (there are 15 possible pairs), many combinations of pairs (e.g., long neck and long tongue) are doubly detrimental
NON-LINEARITY (CONTINUED) • But certain combinations of traits, when taken together, are "co-adapted sets of alleles" that yield a very fit animal for eating high acacia leaves in the savanna environment, having good camouflage, having high escape velocity when faced with predators, and exploiting a niche (and avoiding competition with other animals feeding on low-hanging vegetation)
SEARCH METHODS IN GENERAL
• Initial structure(s)
• Fitness measure
• Operations for creating new structures
• Parameters
• Termination criterion and method of designating the result
SPACE WITH MANY LOCAL OPTIMA
SEARCH METHODS
• Blind random search does not use acquired information in deciding on the future direction of the search
• Hill climbing and gradient descent use acquired information; however, they are prone to becoming trapped on local optima
• The previous point is especially true for nontrivial search spaces
7 DIFFERENCES BETWEEN GP AND ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING APPROACHES
REPRESENTATION Genetic programming overtly conducts its search for a solution to the given problem in program space
ROLE OF POINT-TO-POINT TRANSFORMATIONS IN THE SEARCH Genetic programming does not conduct its search by transforming a single point in the search space into another single point, but instead transforms a set of points into another set of points
ROLE OF HILL CLIMBING IN THE SEARCH Genetic programming does not rely exclusively on greedy hill climbing to conduct its search, but instead allocates a certain number of trials, in a principled way, to choices that are known to be inferior
DETERMINISM IN THE SEARCH Genetic programming conducts its search probabilistically
ROLE OF AN EXPLICIT KNOWLEDGE BASE Genetic programming does NOT make use of a knowledge base
ROLE OF FORMAL LOGIC IN THE SEARCH Genetic programming does not utilize formal logic in its search strategy. Contradictory alternatives are created and actively maintained.
UNDERPINNINGS OF THE TECHNIQUE Biologically inspired
TURING (1948) Turing made the connection between searches and the challenge of getting a computer to solve a problem without explicitly programming it in his 1948 essay "Intelligent Machinery": "Further research into intelligence of machinery will probably be very greatly concerned with 'searches'..."
TURING’S 3 APPROACHES TO MACHINE INTELLIGENCE (1948) LOGIC-BASED SEARCH One approach that Turing identified is a search through the space of integers representing candidate computer programs.
TURING’S 3 APPROACHES (CONTINUED) CULTURAL SEARCH A second approach is the "cultural search", which relies on knowledge and expertise acquired over a period of years from others (akin to present-day knowledge-based systems).
TURING’S 3 APPROACHES (CONTINUED) GENETICAL OR EVOLUTIONARY SEARCH "There is the genetical or evolutionary search by which a combination of genes is looked for, the criterion being the survival value."
TURING (1950) From Turing’s 1950 paper "Computing Machinery and Intelligence": "We cannot expect to find a good child machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications"
TURING (1950) (CONTINUED)
"Structure of the child machine = Hereditary material
Changes of the child machine = Mutations
Natural selection = Judgment of the experimenter"