Introduction to Evolutionary Algorithms Lecture 2 Jim Smith

Jim Smith, University of the West of England, UK, May/June 2012

Overview
• Recap of EC metaphor
• Recap of basic behaviour
• Role of fitness function
• Dealing with constraints
• Representation as key to problem solving
  - Integer Representations
  - Permutation Representations
  - Continuous Representations
  - Tree-based Representations

Recap of EC metaphor
• A population of individuals exists in an environment with limited resources
• Competition for those resources causes selection of those fitter individuals that are better adapted to the environment
• These individuals act as seeds for the generation of new individuals through recombination and mutation
• The new individuals have their fitness evaluated and compete (possibly also with parents) for survival
• Over time, natural selection causes a rise in the fitness of the population

General Scheme of EAs
[Figure: general scheme of an EA]
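A minimal sketch of the cycle described in the recap (select parents, recombine, mutate, evaluate, compete for survival). The function name `evolve`, the steady-state replacement rule, and the OneMax toy problem are assumptions chosen for illustration, not the scheme shown on the slide.

```python
import random

def evolve(init, fitness, recombine, mutate, pop_size=20, generations=50, rng=None):
    """Steady-state EA loop: select parents, vary, evaluate, replace."""
    rng = rng or random.Random(0)
    population = [init(rng) for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Parent selection: two binary tournaments (one common choice).
        parents = [max(rng.sample(population, 2), key=fitness) for _ in range(2)]
        # Variation: recombination followed by mutation.
        child = mutate(recombine(parents[0], parents[1], rng), rng)
        # Survivor selection: the child replaces the worst individual if it is better.
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
        best_history.append(max(fitness(ind) for ind in population))
    return max(population, key=fitness), best_history


# Toy usage (OneMax: maximise the number of 1s in a 20-bit string).
def init(rng):
    return [rng.randint(0, 1) for _ in range(20)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def bit_flip(bits, rng):
    return [1 - b if rng.random() < 1 / len(bits) else b for b in bits]

best, history = evolve(init, sum, one_point, bit_flip)
print(history[0], history[-1])
```

Because the worst individual is only ever replaced by something better, `history` never decreases, so stopping the loop at any generation still returns the best solution found so far.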

Typical behaviour of an EA
Phases in optimising on a 1-dimensional fitness landscape:
• Early phase: quasi-random population distribution
• Mid-phase: population arranged around/on hills
• Late phase: population concentrated on high hills

Typical run: progression of fitness
[Figure: best fitness in population plotted against time (number of generations)]
• Typical run of an EA shows so-called “anytime behavior”

Are long runs beneficial?
[Figure: best fitness in population plotted against time (number of generations), comparing progress in the 1st half of the run with progress in the 2nd half]
• Answer:
  - how much do you want the last bit of progress?
  - it may be better to do more shorter runs

Evolutionary Algorithms in Context
• There are many views on the use of EAs as robust problem-solving tools.
• For most problems a problem-specific tool may:
  - perform better than a generic search algorithm on most instances,
  - have limited utility,
  - not do well on all instances
• Goal is to provide robust tools that provide:
  - evenly good performance
  - over a range of problems and instances

What are the different types of EAs
• Historically different flavours of EAs have been associated with different representations
  - Binary strings: Genetic Algorithms
  - Real-valued vectors: Evolution Strategies
  - Finite state machines: Evolutionary Programming
  - LISP trees: Genetic Programming
• These differences are largely irrelevant, best strategy
  - choose representation to suit problem
  - choose variation operators to suit representation
• Selection operators only use fitness and so are independent of representation

Role of fitness function
• The more fitness levels you have available, the more information is potentially available to guide search
• EAs can cope with fitness functions that are:
  - Noisy,
  - Time dependent,
  - Discontinuous
  - and have multiple optima

Problems with Constraints
• “Constrained Optimisation Problems” (COPs)
  - Some problems inherently have constraints as well as fitness functions
  - Can incorporate into fitness functions (indirect)
  - Can also incorporate into representation (direct)
• Constraint Satisfaction Problems (CSPs)
  - Seek solution which meets set of constraints
  - Transform to COP by minimising constraint violations (indirect method)
  - Might be able to use good representations (direct)

Feasible & Unfeasible Regions
• Space will be split into two disjoint sets of spaces:
  - F (the feasible regions), which may be connected
  - U (the unfeasible regions).
[Figure: search space S divided into feasible regions F and unfeasible regions U]

Methods for constraint handling
• Direct
• Indirect

Direct vs Indirect Handling
• Indirect (Penalty Functions), pros:
  - conceptually simple, transparent
  - reduces problem to ‘simple’ optimization
  - allows user to tune to his/her preferences by weights
  - allows EA to tune fitness function by modifying weights during the search
  - problem independent
• Indirect (Penalty Functions), cons:
  - loss of info by packing everything in a single number
  - said not to work well for sparse problems
• Direct, pros:
  - it works well (except simply eliminating all infeasible solutions)
• Direct, cons:
  - problem specific
  - no guidelines
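A hedged sketch of the indirect (penalty function) approach: constraint violations are weighted and subtracted from the objective, so everything is packed into a single number. The function name `penalised_fitness`, the convention that `g(x) <= 0` means "satisfied", and the toy problem are assumptions made for this example.

```python
def penalised_fitness(x, objective, constraints, weights):
    """Objective minus a weighted sum of constraint violations (maximisation)."""
    # Each constraint g is written so that g(x) <= 0 means "satisfied".
    violations = [max(0.0, g(x)) for g in constraints]
    penalty = sum(w * v for w, v in zip(weights, violations))
    return objective(x) - penalty


# Toy problem (hypothetical): maximise x + y subject to x + y <= 10 and x <= 6.
objective = lambda p: p[0] + p[1]
constraints = [lambda p: p[0] + p[1] - 10, lambda p: p[0] - 6]
weights = [5.0, 5.0]

feasible = penalised_fitness((4, 6), objective, constraints, weights)    # no penalty
infeasible = penalised_fitness((8, 6), objective, constraints, weights)  # penalised
print(feasible, infeasible)
```

The `weights` list is where a user (or the EA itself, if the weights are adapted during the search) tunes how strongly each constraint counts.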

Example: N Queens
• Place N queens on a chess board so they cannot take each other
• (64*63*62*61*60*59*58*57)/8! candidate placements for N=8 = 64!/(56! * 8!) ≈ 4.4 * 10^9
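The size of this search space can be checked directly. The sketch below counts the ways to choose 8 distinct cells out of 64, first as a binomial coefficient and then as the falling product divided by 8!; it uses only the Python standard library.

```python
import math

# Unordered choices of 8 distinct cells out of 64.
placements = math.comb(64, 8)

# The same count written as a falling product divided by 8!.
ordered = math.prod(range(57, 65))  # 64 * 63 * ... * 57
assert ordered // math.factorial(8) == placements

print(placements)  # 4426165368
```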

Designing an EA
• Fitness function: N - num_vulnerable_queens
  - Transforms CSP to COP
• Population and Selection?
  - Whatever we like, e.g.:
  - Population size 100, tournament selection of 2 parents
  - Replace two least fit from population if better
• Representation?
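One possible reading of this fitness function, as a sketch. It assumes each queen is given as a cell index numbered 0 to N*N - 1 row by row, and that a queen is "vulnerable" if at least one other queen shares its row, column or diagonal. The helper names are illustrative only.

```python
def num_vulnerable_queens(cells, n=8):
    """Count queens attacked by at least one other queen.

    `cells` holds one cell index per queen, numbered 0 .. n*n - 1 row by row.
    """
    positions = [divmod(c, n) for c in cells]  # (row, column) pairs
    vulnerable = 0
    for i, (r1, c1) in enumerate(positions):
        for j, (r2, c2) in enumerate(positions):
            if i == j:
                continue
            same_line = r1 == r2 or c1 == c2
            same_diagonal = abs(r1 - r2) == abs(c1 - c2)
            if same_line or same_diagonal:
                vulnerable += 1
                break  # this queen is vulnerable; no need to check further
    return vulnerable


def fitness(cells, n=8):
    return n - num_vulnerable_queens(cells, n)


# A known 8-queens solution (column of the queen in each row) and a bad layout.
solved = [row * 8 + col for row, col in enumerate([0, 4, 7, 5, 2, 6, 1, 3])]
all_in_first_row = list(range(8))
print(fitness(solved), fitness(all_in_first_row))
```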

Possible Representations
• Method 1: Based on the board
  - 64-bit binary string: 0/1 = empty/occupied
  - 2^64 possibilities, more than the problem!
  - Introduces extra constraint that only 8 cells occupied
  - Repair function or specialised operators?
  - Or fractional penalty function
• Based on the pieces
  - More natural and problem focussed
  - Avoids extra constraint

Operators for binary representations
• Recombination
  - One point, N-point
  - Uniform: randomly choose parent 1 or 2 for each gene
• Mutation
  - Independent flip 0 <-> 1 for each gene
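The operators listed above can be sketched in a few lines. The function names and the default mutation rate of 1/length are assumptions for this example; the rate is a common convention, not something stated on the slide.

```python
import random

def one_point_crossover(p1, p2, rng):
    cut = rng.randrange(1, len(p1))  # cut point strictly inside the string
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def uniform_crossover(p1, p2, rng):
    # For each gene, pick which parent the first child inherits from.
    picks = [rng.random() < 0.5 for _ in p1]
    c1 = [a if take else b for a, b, take in zip(p1, p2, picks)]
    c2 = [b if take else a for a, b, take in zip(p1, p2, picks)]
    return c1, c2

def bit_flip_mutation(bits, rng, rate=None):
    # Each gene flips independently with probability `rate`.
    rate = 1 / len(bits) if rate is None else rate
    return [1 - b if rng.random() < rate else b for b in bits]

rng = random.Random(1)
a, b = [0] * 8, [1] * 8
print(one_point_crossover(a, b, rng))
print(uniform_crossover(a, b, rng))
print(bit_flip_mutation(a, rng, rate=0.25))
```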

Integer Representations
• Label cells 1-64
• Method 2: one gene per piece encodes cell
  - 64^N ≈ 2.8 x 10^14 for N=8 (potential duplicates)
  - 1-pt crossover, extended random mutation
  - OK, but huge space with only 9 fitness levels
• Could think about constraints
  - Rows, columns, diagonals
  - Indirect: penalise all
  - Direct: can we avoid some?

Integer representations
• Some problems naturally have integer variables, e.g. image processing parameters
• Others take categorical values from a fixed set e.g. {blue, green, yellow, pink}
• N-point / uniform crossover operators work
• Extend bit-flipping mutation to make
  - “creep” i.e. more likely to move to similar value
  - Random choice (esp. categorical variables)
  - For ordinal problems, it is hard to know correct range for creep, so often use two mutation operators in tandem
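A sketch of the two mutation styles, assuming a fixed creep step of +/-1 and uniform random choice for resetting. The function names, rates and example genomes are illustrative assumptions.

```python
import random

def creep_mutation(genes, low, high, rng, rate=0.1, step=1):
    """Add a small +/- change to each chosen gene, clipped to [low, high]."""
    out = []
    for g in genes:
        if rng.random() < rate:
            g = min(high, max(low, g + rng.choice([-step, step])))
        out.append(g)
    return out

def random_reset_mutation(genes, values, rng, rate=0.1):
    """Replace each chosen gene with a value drawn uniformly from `values`."""
    return [rng.choice(values) if rng.random() < rate else g for g in genes]

rng = random.Random(0)
ordinal = [3, 7, 5, 0]
colours = ["blue", "green", "yellow", "pink"]

# Two operators in tandem on an ordinal genome: frequent small creeps plus rare resets.
child = random_reset_mutation(
    creep_mutation(ordinal, 0, 9, rng, rate=0.5), list(range(10)), rng, rate=0.05
)
# Categorical genes have no notion of "similar value", so only resetting applies.
recoloured = random_reset_mutation(colours, colours, rng, rate=0.5)
print(child, recoloured)
```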

Partially direct representations
• Method 3
  - Row constraint <=> each queen on different row
  - Let value of gene i = column of queen in row i
  - Solution space size 8^N ≈ 1.68 x 10^7 for N=8
  - One point crossover, extended randomised mutation
• Method 4
  - As above but also meet column constraints
  - Permutation: N! = 40320
  - Now need specialised crossover and mutation
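To compare the representations side by side, the sketch below computes the four search-space sizes for N=8 and shows that, with a permutation genotype, only diagonal clashes are left for the fitness function to handle. The dictionary labels and the helper `diagonal_clashes` are names invented for this example.

```python
import math
import random

N = 8
sizes = {
    "Method 1 (64-bit string)": 2 ** (N * N),
    "Method 2 (cell per queen)": (N * N) ** N,
    "Method 3 (column per row)": N ** N,
    "Method 4 (permutation)": math.factorial(N),
}
for name, size in sizes.items():
    print(f"{name}: {size:.3g}")

def diagonal_clashes(columns):
    """Pairs of queens sharing a diagonal; columns[i] is the column of the queen in row i."""
    n = len(columns)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(columns[i] - columns[j]) == j - i
    )

# With a permutation, rows and columns are distinct by construction,
# so only the diagonal constraint remains to be penalised.
candidate = random.Random(0).sample(range(N), N)
print(candidate, diagonal_clashes(candidate))
```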

Permutation Representations
• Ordering/sequencing problems form a special type.
• Solution = arrangement of objects in a certain order.
  - Example: sort algorithm: important thing is which elements occur before others (order)
  - Example: Travelling Salesman Problem (TSP): important thing is which elements occur next to each other (adjacency)
• These problems are generally expressed as a permutation:
  - if there are n variables then the representation is as a list of n integers, each of which occurs exactly once

Variation operators for permutations
• Normal mutation operators don’t work:
  - e.g. bit-wise mutation: let gene i have value j
  - changing to some other value k would mean that k occurred twice and j no longer occurred
• Therefore must change at least two values
• Various mechanisms exist (swap, invert, ...).
• Similar arguments mean specialised crossovers are needed.

Example mutation operators
[Figure: example mutation operators]
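Two of the mechanisms named earlier (swap and invert) can be sketched as follows. Both change at least two positions at once, so the result is always still a permutation. Function names are assumptions for this example.

```python
import random

def swap_mutation(perm, rng):
    """Exchange the values at two distinct positions."""
    i, j = rng.sample(range(len(perm)), 2)
    child = list(perm)
    child[i], child[j] = child[j], child[i]
    return child

def inversion_mutation(perm, rng):
    """Reverse the segment between two randomly chosen cut points."""
    i, j = sorted(rng.sample(range(len(perm) + 1), 2))
    return list(perm[:i]) + list(reversed(perm[i:j])) + list(perm[j:])

rng = random.Random(42)
parent = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(swap_mutation(parent, rng))
print(inversion_mutation(parent, rng))
```

Swap disturbs order information but keeps most adjacencies; inversion keeps most adjacencies inside the reversed segment, which is one reason it is often tried on adjacency problems such as the TSP.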

Crossover operators for permutations
• “Normal” crossover operators will often lead to inadmissible solutions
  - e.g. 1 2 3 4 5 can produce offspring such as 1 2 3 2 1 and 5 4 3 4 5, which repeat some values and omit others
• Many specialised operators have been devised which focus on combining order or adjacency information from the two parents
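As an illustration, the sketch below first reproduces the inadmissible child 1 2 3 2 1 with one-point crossover (the second parent 5 4 3 2 1 is an assumption that is consistent with the offspring shown), then applies order crossover, one well-known order-preserving operator. The slide does not say which specialised operators it has in mind, so this choice is ours.

```python
import random

def one_point(p1, p2, cut):
    return p1[:cut] + p2[cut:]

def order_crossover(p1, p2, rng):
    """Copy a slice from p1, then fill the gaps with p2's values in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n + 1), 2))
    child = [None] * n
    child[i:j] = p1[i:j]
    kept = set(p1[i:j])
    # Walk p2 starting just after the slice, wrapping around, skipping used values.
    fill = [p2[(j + k) % n] for k in range(n) if p2[(j + k) % n] not in kept]
    # Fill the empty positions in the same wrap-around order.
    empty = [(j + k) % n for k in range(n) if child[(j + k) % n] is None]
    for pos, value in zip(empty, fill):
        child[pos] = value
    return child

p1, p2 = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
print(one_point(p1, p2, 3))  # [1, 2, 3, 2, 1] is not a permutation
print(order_crossover(p1, p2, random.Random(3)))
```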

Another Example of Binary Encoding: Feature Selection for Machine Learning
• Many successful Machine Learning / Data Mining algorithms use greedy search:
  - Decision trees: add most informative nodes
  - Rule induction: add most useful next rule
  - Bayesian networks: to identify co-related features
• Distance-based methods measure difference along each axis
• All these can be improved by using global search in the feature selection process
• Use a binary coded GA: one bit per feature (use / don’t use)
• References:
  - M. Tahir and J. E. Smith. Creating Diverse Nearest Neighbour Ensembles using Simultaneous Metaheuristic Feature Selection. 2010. Pattern Recognition Letters, 31(11): 1470-1480.
  - Smith, M. & Bull, L. (2005) Genetic Programming with a Genetic Algorithm for Feature Construction and Selection. Genetic Programming and Evolvable Machines 6(3): 265-281.

Schemata for Feature Selection
[Figure: the EA produces a binary string representing the choice of features; the full data set is cut down to a reduced data set; the machine learning algorithm builds and evaluates a model on the reduced data; fitness = accuracy is returned to the EA]
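A small sketch of this wrapper idea. The learner here is a leave-one-out 1-nearest-neighbour classifier written with the standard library only, and the data are made up; both are assumptions standing in for whatever learner and data set are actually used. In this sketch a 1 bit means "use the feature".

```python
def loo_1nn_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of 1-nearest-neighbour using only features where mask is 1."""
    keep = [i for i, bit in enumerate(mask) if bit]
    if not keep:
        return 0.0  # no features selected: treat as the worst case
    reduced = [[row[i] for i in keep] for row in rows]  # the reduced data set
    correct = 0
    for a, (x, y) in enumerate(zip(reduced, labels)):
        # Nearest other point by squared Euclidean distance.
        nearest = min(
            (b for b in range(len(reduced)) if b != a),
            key=lambda b: sum((p - q) ** 2 for p, q in zip(x, reduced[b])),
        )
        correct += labels[nearest] == y
    return correct / len(rows)

# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
rows = [[0.0, 9.0], [0.1, 1.0], [0.2, 5.0], [1.0, 1.1], [1.1, 9.1], [1.2, 5.1]]
labels = ["a", "a", "a", "b", "b", "b"]

for mask in ([1, 0], [0, 1], [1, 1]):
    print(mask, loo_1nn_accuracy(rows, labels, mask))
```

The fitness of a bit string is simply the accuracy returned for that mask, so any of the binary operators can be used unchanged.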

Example of Integer Encoding
• Protein Structure Prediction:
  - Proteins are created as strings of amino acid “residues”
  - Behaviour of a protein is determined by its 3-D structure
  - Proteins naturally “fold” to lowest energy structure
• Model as a fixed-length path through a 3D grid
  - Representation: sequence of “up/down/L/R/forward” to specify a path
  - Fitness based on pairwise interactions between residues
• References:
  - N. Krasnogor, W. Hart, J. E. Smith and D. Pelta. Protein Structure Prediction With Evolutionary Algorithms. Proc. GECCO 1999, pages 1596-1601. Morgan Kaufmann.
  - R. Santana, P. Larrañaga, and J. A. Lozano. Protein folding in simplified models with estimation of distribution algorithms. IEEE Transactions on Evolutionary Computation, Vol. 12, No. 4, pp. 418-438.

Example of 2D HP model
[Figure: example fold in the 2D HP model]
• Dark boxes represent hydrophobic residues (H): H-H contacts add +1 to fitness
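A minimal sketch of such a fitness function on a 2D square lattice. It assumes absolute moves (U/D/L/R) rather than relative ones, counts only contacts between residues that are not neighbours along the chain, and returns `None` for self-intersecting paths. These are modelling choices made for the example, not details given on the slide.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def hp_fitness(sequence, moves):
    """Count H-H contacts for a 2D lattice fold; return None if the path self-intersects.

    `sequence` is a string over {"H", "P"}; `moves` has one absolute direction
    per step, so len(moves) == len(sequence) - 1.
    """
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    if len(set(coords)) != len(coords):
        return None  # infeasible fold: two residues on the same grid point
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (cx, cy) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair is counted once
            j = index_at.get((cx + dx, cy + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts

print(hp_fitness("HPPH", "RUL"))  # a U-shaped fold brings the two H residues together
print(hp_fitness("HPPH", "RRR"))  # a straight chain has no contacts
```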

Example 2: Microprocessor Design Verification
• Need to “drive” processor into a variety of states
  - to make sure it does the right thing in each.
  - Test = sequence of assembly code instructions
  - Traditional methods generate millions of random tests, weren’t reaching all states
• UWE solution: evolve sequences of tests
  - Integer encoding (fixed number of instructions)
  - Specialised mutation: group instructions in classes, more likely to move to similar type of instruction
• Reference:
  - J. E. Smith, M. Bartley and T. C. Fogarty. Microprocessor Design Verification by Two-Phase Evolution of Variable Length Tests. Proc. 1997 IEEE Conference on Evolutionary Computation, pages 453-458. IEEE Press.

Summary
• Fitness function should provide as much information as possible
  - Could penalise infeasible solutions
  - Selection / Population management is independent of representation
• Representation should suit the problem
  - Can take constraints into account (direct)
  - Recombination/Mutation defined by representation
  - Could be problem specific (direct constraint handling)