Local optimization technique G. Anuradha

Introduction
• The evaluation function assigns a quality measure score to every candidate solution
• The surface formed by these scores is called the landscape, response surface, or fitness landscape

Hill Climbing
• Start at a randomly generated state
• Move to the neighbour with the best evaluation value
• If a strict local optimum is reached, restart at another randomly generated state

Flowchart of hill climbing
• Select a current solution s
• Evaluate s
• Select a new solution x from the neighborhood of s
• Evaluate x
• Is x better than s?
  – yes: select x as the new current solution s
  – no: keep s
• Repeat from "Select a new solution x"

Stopping condition
• Either the whole neighborhood has been searched
• Or the threshold of allowed attempts has been exceeded
• The last solution is then taken as the best solution; alternatively, the current solution is stored and the same procedure is repeated (iterated hill climbing)
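The loop and its two stopping conditions can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names, the `max_attempts` budget, and the `two_peaks` landscape are all assumptions made for the example, and the climber takes the first better neighbour it finds, as in the flowchart.

```python
def hill_climb(evaluate, neighbours, start, max_attempts=1000):
    """Climb from `start`; stop when no neighbour is better or the budget is spent."""
    s, s_score = start, evaluate(start)
    attempts = 0
    improved = True
    while improved and attempts < max_attempts:
        improved = False
        for x in neighbours(s):
            if attempts >= max_attempts:
                break
            attempts += 1
            x_score = evaluate(x)
            if x_score > s_score:      # "Is x better than s?" -> yes
                s, s_score = x, x_score
                improved = True
                break                  # continue the search from the new s
    return s, s_score


def iterated_hill_climb(evaluate, neighbours, random_start, restarts=20):
    """Repeat hill climbing from fresh starting points and keep the best result."""
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        s, score = hill_climb(evaluate, neighbours, random_start())
        if score > best_score:
            best, best_score = s, score
    return best, best_score


# Hypothetical landscape on the integers 0..100: local peak at 20, global peak at 70.
def two_peaks(x):
    return -(x - 70) ** 2 if x > 50 else -(x - 20) ** 2 - 100


def neighbours(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 100]


print(hill_climb(two_peaks, neighbours, 10))   # (20, -100): stuck on the local peak
print(hill_climb(two_peaks, neighbours, 90))   # (70, 0): reaches the global peak
```

The two calls show the dependence on the starting point: the same climber ends on different peaks. Passing something like `lambda: random.randint(0, 100)` as `random_start` to `iterated_hill_climb` gives the restart variant.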

Features of hill climbing techniques
• Provides local optimum values that depend on the starting solution
• Cannot guarantee the global optimum, because there is no general procedure for measuring the relative error with respect to the global optimum
• The success of the algorithm depends on the initial value chosen

Weaknesses of the hill climbing algorithm
• Termination at local optimum values
• There is no indication of how much the local optimum deviates from the global optimum
• The optimum value obtained depends on the initial configuration
• An upper bound on computation time cannot be provided

Stochastic Hill Climber
• The problem of getting stuck in local optima is reduced to a certain extent in this approach
• New solutions with a negative change in the quality measure score can also be accepted
• This requires some basic changes to the ordinary hill climbing procedure

Stochastic hill climbing approach
• Select a current solution s
• Evaluate s
• Select a new solution x from the neighborhood of s
• Evaluate x
• Select x as the new current solution s with probability P
• Repeat from "Select a new solution x"

How does this probabilistic function work?
• There are 3 cases
  – 50% probable: the new solution x has the same quality measure score as the current solution s
  – >50% probable: the new solution x is superior, so the probability of acceptance is greater than 50%
  – <50% probable: the new solution x is inferior, so the probability of acceptance is smaller than 50%

Effect of parameter T
• If the new solution x is superior, the probability of acceptance is closer to 50% for high values of T and closer to 100% for low values of T
• If the new solution x is inferior, the probability of acceptance is closer to 50% for high values of T and closer to 0% for low values of T

How does this probabilistic function work? (contd.)
• The probability of accepting a new solution x also depends on the value of the parameter T (T remains constant during the execution of the algorithm)
• A superior solution x has a probability of acceptance of at least 50% (irrespective of T)
• An inferior solution x has a probability of acceptance of at most 50% (0 – 50%)
• T should be chosen neither too low nor too high for the particular problem
• The stochastic hill climber is a forerunner of simulated annealing
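The slides do not give a formula for the acceptance probability. One function that has all the properties listed above (50% for equal scores, above 50% for superior x, below 50% for inferior x, flattening toward 50% as T grows) is the logistic form p = 1 / (1 + exp((score(s) - score(x)) / T)). The sketch below assumes that form and assumes larger scores are better; the function and parameter names are illustrative.

```python
import math
import random


def acceptance_probability(score_s, score_x, T):
    """Probability of replacing s by x when larger scores are better."""
    z = (score_s - score_x) / T
    if z > 700:                 # math.exp would overflow; p is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def stochastic_hill_climb(evaluate, random_neighbour, start, T, steps, rng):
    """Run a fixed number of steps with a constant T, as in the slides."""
    s, s_score = start, evaluate(start)
    for _ in range(steps):
        x = random_neighbour(s, rng)
        x_score = evaluate(x)
        if rng.random() < acceptance_probability(s_score, x_score, T):
            s, s_score = x, x_score
    return s, s_score


for T in (0.1, 1.0, 100.0):
    better = acceptance_probability(5.0, 8.0, T)   # x is superior
    worse = acceptance_probability(8.0, 5.0, T)    # x is inferior
    print(f"T={T}: superior {better:.3f}, inferior {worse:.3f}")
```

Printing the table for a few values of T makes the effect visible: at low T the rule behaves almost like a plain hill climber, and at high T it approaches a coin flip.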

Annealing
• Heating steel to a suitable temperature, followed by relatively slow cooling
• The purpose of annealing may be to remove stresses, to soften the steel, to improve machinability, to improve cold working properties, or to obtain a desired structure
• The annealing process usually involves allowing the steel to cool slowly in the furnace

Simulated Annealing
• Set the initial temperature T
• Select a current solution s
• Evaluate s
• Is T low?
  – yes: STOP
  – no: set K = 0 and continue
• Select a new solution x in the neighborhood of s
• Evaluate x
• Is x better than s?
  – yes: select x as the new current solution s
  – no: select x as the new current solution s with probability p
• Set K = K + 1
• Is K large enough?
  – no: return to "Select a new solution x"
  – yes: decrease T and return to "Is T low?"

Analogy between annealing and simulated annealing
Annealing            Simulated annealing
State                Feasible solution
Energy               Evaluation function
Ground state         Optimal solution
Rapid quenching      Local search
Careful annealing    Simulated annealing

• SA resembles a random search at higher temperatures and a classic hill climber at lower temperatures
• When SA is applied to a specific problem, some questions arise:
  – What is the representation?
  – How are neighbors defined?
  – What is the evaluation function?
  – How large should K be?
  – How should the system be cooled, that is, how should the temperature be decreased?
  – What is the stopping condition?
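A compact Python sketch of simulated annealing shows where each of these questions turns into a parameter. Several choices here are assumptions rather than anything stated in the slides: the Metropolis rule exp(delta / T) for accepting a worse solution, geometric cooling (`T *= cooling`), the default values, the extra bookkeeping of the best solution seen, and the `two_peaks` test landscape.

```python
import math
import random


def simulated_annealing(evaluate, random_neighbour, start, rng,
                        T0=10.0, T_min=0.01, cooling=0.9, k_max=50):
    """Maximise `evaluate`; returns the best solution seen and its score."""
    s, s_score = start, evaluate(start)
    best, best_score = s, s_score          # extra bookkeeping, not in the flowchart
    T = T0
    while T > T_min:                       # "Is T low?"
        for _ in range(k_max):             # "Is K large enough?"
            x = random_neighbour(s, rng)
            x_score = evaluate(x)
            if x_score > s_score:
                accept = True              # better solutions are always taken
            else:
                # Assumed Metropolis rule: worse moves get rarer as T falls.
                accept = rng.random() < math.exp((x_score - s_score) / T)
            if accept:
                s, s_score = x, x_score
                if s_score > best_score:
                    best, best_score = s, s_score
        T *= cooling                       # "Decrease T"
    return best, best_score


# Hypothetical landscape on 0..100 with a local peak at 20 and a global peak at 70.
def two_peaks(x):
    return -(x - 70) ** 2 if x > 50 else -(x - 20) ** 2 - 100


def step(x, rng):
    return min(100, max(0, x + rng.choice((-1, 1))))


print(simulated_annealing(two_peaks, step, 10, random.Random(0)))
```

Here the representation is an integer, the neighborhood is one step left or right, `k_max` answers "how big is K", `cooling` is the cooling schedule, and `T_min` is the stopping condition.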

Tabu Search
• A meta-heuristic search algorithm that guides a local heuristic search procedure to search beyond local optimality
• Uses adaptive memory and responsive exploration to explore the search space
• It is deterministic in nature, but it is possible to add some probabilistic elements to it

Flowchart of tabu search
• Set the initial memory M
• Select a current solution s
• Evaluate s
• Select a number of solutions x, y, ... from the neighbourhood of s
• Evaluate x, y, ...
• Select one solution x as the new current solution s; the decision is based on the quality measure score and M
• Update M
• Repeat from "Select a number of solutions"

Memory component of tabu search
• There are 3 ways of computing memory; two of them are:
  – Recency-based memory: the memory structure gets updated after certain iterations and records the last few iterations
  – Frequency-based memory: the memory structure works over a longer time horizon and measures the frequency of change at each position
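As an illustration of the two memory types, here is a minimal tabu search over bit strings. It is a sketch under stated assumptions: a move flips one bit, a position stays tabu for `tenure` iterations after it was flipped (recency-based memory), a per-position flip counter is kept as frequency-based memory but only reported, and the `trap` evaluation function is a hypothetical landscape built so that a plain hill climber stalls at two 1 bits.

```python
def tabu_search(evaluate, start, iterations=100, tenure=3):
    """Deterministic tabu search over bit tuples; larger scores are better."""
    n = len(start)
    s = list(start)
    best, best_score = tuple(s), evaluate(tuple(s))
    last_flipped = [-tenure - 1] * n   # recency-based memory M
    flip_count = [0] * n               # frequency-based memory
    for it in range(iterations):
        candidates = []
        for i in range(n):
            if it - last_flipped[i] <= tenure:
                continue               # position i is tabu
            x = s.copy()
            x[i] ^= 1
            candidates.append((evaluate(tuple(x)), i))
        if not candidates:
            continue                   # everything is tabu; wait one iteration
        score, i = max(candidates)     # best allowed neighbour, even if worse than s
        s[i] ^= 1
        last_flipped[i] = it           # update M
        flip_count[i] += 1
        if score > best_score:
            best, best_score = tuple(s), score
    return best, best_score, flip_count


def trap(bits):
    """Number of 1 bits, except that exactly three 1 bits scores 0."""
    ones = sum(bits)
    return 0 if ones == 3 else ones


best, best_score, flips = tabu_search(trap, (0,) * 6)
print(best, best_score)   # (1, 1, 1, 1, 1, 1) 6
```

From two 1 bits every neighbour with three 1 bits scores 0, so an ordinary hill climber would stop there. Because the recently flipped positions are tabu, the search is forced through the worse region and reaches the all-ones string.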

Evolution
• Here’s a very oversimplified description of how evolution works in biology
• Organisms (animals or plants) produce a number of offspring which are almost, but not entirely, like themselves
  – Variation may be due to mutation (random changes)
  – Variation may be due to sexual reproduction (offspring have some characteristics from each parent)
• Some of these offspring may survive to produce offspring of their own; some won’t
  – The “better adapted” offspring are more likely to survive
  – Over time, later generations become better and better adapted
• Genetic algorithms use this same process to “evolve” better programs

Evolutionary Algorithms

Genotype and Phenotype
• Genes are the basic “instructions” for building an organism
• A chromosome is a sequence of genes
• Biologists distinguish between an organism’s genotype (the genes and chromosomes) and its phenotype (what the organism actually is like)
• Example: You might have genes to be tall, but never grow to be tall for other reasons (such as poor diet)
• Similarly, “genes” may describe a possible solution to a problem, without actually being the solution

The basic genetic algorithm
• Start with a large “population” of randomly generated “attempted solutions” to a problem
• Repeatedly do the following:
  – Evaluate each of the attempted solutions
  – Keep a subset of these solutions (the “best” ones)
  – Use these solutions to generate a new population
• Quit when you have a satisfactory solution (or you run out of time)

Flowchart of evolution algorithm
• Create initial population A
• Initialize counter t = 0
• Evaluate all s from A
• Select a set of parents from A
• Create a set of offspring
• Create a new population A from existing parents and offspring
• Set t = t + 1
• Is t large?
  – no: return to "Evaluate all s from A"
  – yes: STOP

A really simple example
• Suppose your “organisms” are 32-bit computer words
• You want a string in which all the bits are ones
• Here’s how you can do it:
  – Create 100 randomly generated computer words
  – Repeatedly do the following:
    • Count the 1 bits in each word
    • Exit if any of the words have all 32 bits set to 1
    • Keep the ten words that have the most 1s (discard the rest)
    • From each word, generate 9 new words as follows:
      – Pick a random bit in the word and toggle (change) it
• Note that this procedure does not guarantee that the next “generation” will have more 1 bits, but it’s likely
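The procedure above translates almost line for line into Python. This is a sketch: the function name, the `max_generations` safety cap, and the choice to represent each word as a Python integer are assumptions, and the ten kept words are carried into the next generation alongside their 90 mutated copies.

```python
import random

BITS = 32
ALL_ONES = (1 << BITS) - 1


def count_ones(word):
    return bin(word).count("1")


def evolve_all_ones(rng, max_generations=10_000):
    """Return the generation in which an all-ones word appears, or None."""
    population = [rng.getrandbits(BITS) for _ in range(100)]
    for generation in range(max_generations):
        if ALL_ONES in population:                 # exit condition
            return generation
        # Keep the ten words with the most 1 bits.
        survivors = sorted(population, key=count_ones, reverse=True)[:10]
        population = list(survivors)
        for word in survivors:
            for _ in range(9):                     # nine mutated copies per survivor
                population.append(word ^ (1 << rng.randrange(BITS)))
    return None


print(evolve_all_ones(random.Random(0)))
```

Because the survivors themselves stay in the population, the best word never gets worse from one generation to the next, even though an individual mutation may turn a 1 into a 0.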

A more realistic example, part I
• Suppose you have a large number of (x, y) data points
  – For example, (1.0, 4.1), (3.1, 9.5), (-5.2, 8.6), ...
• You would like to fit a polynomial (of up to degree 5) through these data points
  – That is, you want a formula y = ax^5 + bx^4 + cx^3 + dx^2 + ex + f that gives you a reasonably good fit to the actual data
  – Here’s the usual way to compute goodness of fit:
    • Compute the sum of (actual y – predicted y)^2 for all the data points
    • The lowest sum represents the best fit
• There are some standard curve fitting techniques, but let’s assume you don’t know about them
• You can use a genetic algorithm to find a “pretty good” solution

A more realistic example, part II
• Your formula is y = ax^5 + bx^4 + cx^3 + dx^2 + ex + f
• Your “genes” are a, b, c, d, e, and f
• Your “chromosome” is the array [a, b, c, d, e, f]
• Your evaluation function for one array is:
  – For every actual data point (x, y):
    • Compute ý = ax^5 + bx^4 + cx^3 + dx^2 + ex + f
  – Find the sum of (y – ý)^2 over all data points
  – The sum is your measure of “badness” (larger numbers are worse)
  – Example: For [0, 0, 0, 2, 3, 5] and the data points (1, 12) and (2, 22):
    • ý = 0x^5 + 0x^4 + 0x^3 + 2x^2 + 3x + 5 is 2 + 3 + 5 = 10 when x is 1
    • ý = 0x^5 + 0x^4 + 0x^3 + 2x^2 + 3x + 5 is 8 + 6 + 5 = 19 when x is 2
    • (12 – 10)^2 + (22 – 19)^2 = 2^2 + 3^2 = 13
    • If these are the only two data points, the “badness” of [0, 0, 0, 2, 3, 5] is 13
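The evaluation function is short enough to write out directly. The function names `predict` and `badness` are illustrative; the data and the chromosome are the ones from the worked example.

```python
def predict(chromosome, x):
    """Evaluate the degree-5 polynomial with coefficients [a, b, c, d, e, f] at x."""
    a, b, c, d, e, f = chromosome
    return a * x**5 + b * x**4 + c * x**3 + d * x**2 + e * x + f


def badness(chromosome, data):
    """Sum of squared differences between actual and predicted y (larger is worse)."""
    return sum((y - predict(chromosome, x)) ** 2 for x, y in data)


print(predict([0, 0, 0, 2, 3, 5], 1))                     # 10
print(predict([0, 0, 0, 2, 3, 5], 2))                     # 19
print(badness([0, 0, 0, 2, 3, 5], [(1, 12), (2, 22)]))    # 13
```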

A more realistic example, part III
• Your algorithm might be as follows:
  – Create 100 six-element arrays of random numbers
  – Repeat 500 times (or any other number):
    • For each of the 100 arrays, compute its badness (using all data points)
    • Keep the ten best arrays (discard the other 90)
    • From each array you keep, generate nine new arrays as follows:
      – Pick a random element of the six
      – Pick a random floating-point number between 0.0 and 2.0
      – Multiply the random element of the array by the random floating-point number
  – After all 500 trials, pick the best array as your final answer
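A sketch of this algorithm in Python follows. The slides do not say how the initial random numbers are drawn, so the range of -10 to 10 is an assumption, as are the function names and the `history` list that records the best badness per generation. The three data points are the ones quoted in part I.

```python
import random

DATA = [(1.0, 4.1), (3.1, 9.5), (-5.2, 8.6)]


def badness(chromosome, data):
    """Sum of squared errors of the polynomial with coefficients [a..f]."""
    a, b, c, d, e, f = chromosome
    return sum(
        (y - (a * x**5 + b * x**4 + c * x**3 + d * x**2 + e * x + f)) ** 2
        for x, y in data
    )


def fit_polynomial(data, rng, generations=500):
    # Assumed initial range; the slides only say "random numbers".
    population = [[rng.uniform(-10, 10) for _ in range(6)] for _ in range(100)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda c: badness(c, data))
        survivors = population[:10]                    # keep the ten best arrays
        history.append(badness(survivors[0], data))
        population = [list(c) for c in survivors]
        for parent in survivors:
            for _ in range(9):                         # nine mutated copies each
                child = list(parent)
                i = rng.randrange(6)                   # random element of the six
                child[i] *= rng.uniform(0.0, 2.0)      # scale it by a random factor
                population.append(child)
    best = min(population, key=lambda c: badness(c, data))
    return best, history


best, history = fit_polynomial(DATA, random.Random(0))
print(history[0], history[-1], badness(best, DATA))
```

One property of this mutation is worth noting: multiplying by a number between 0.0 and 2.0 never changes the sign of a coefficient, so the signs present in the initial population limit which polynomials can be reached.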

The really simple example again
• Suppose your “organisms” are 32-bit computer words, and you want a string in which all the bits are ones
• Here’s how you can do it:
  – Create 100 randomly generated computer words
  – Repeatedly do the following:
    • Count the 1 bits in each word
    • Exit if any of the words have all 32 bits set to 1
    • Keep the ten words that have the most 1s (discard the rest)
    • From each word, generate 9 new words as follows:
      – Choose one of the other words
      – Take the first half of this word and combine it with the second half of the other word

Asexual vs. sexual reproduction
• In the examples so far,
  – Each “organism” (or “solution”) had only one parent
  – Reproduction was asexual
  – The only way to introduce variation was through mutation (random changes)
• In sexual reproduction,
  – Each “organism” (or “solution”) has two parents
  – Assuming that each organism has just one chromosome, new offspring are produced by forming a new chromosome from parts of the chromosomes of each parent

Crossover and mutation operators
[Figure: illustrations of the mutation operator and the crossover operator]

Crossover
• Crossover is a genetic operator that combines (mates) two chromosomes (parents) to produce a new chromosome (offspring)
• Types of crossover
  – One point
  – Two point
  – Arithmetic
  – Heuristic

Types of crossover
• One point crossover
  – Parent 1:  0110 1001 0100 1110 | 1010 1101 1011 0101
  – Parent 2:  1101 0100 0101 1010 | 1011 0100 1010 0101
  – Offspring: 0110 1001 0100 1110 | 1011 0100 1010 0101
• Two point crossover
  – Parent 1:  0110 1001 0100 1110 1010 1101 1011 0101
  – Parent 2:  1101 0100 0101 1010 1011 0100 1010 0101
  – Offspring (as given): 0110 1001 0101 1010 1011 0101
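Both operators are simple slicing operations on the chromosome. The sketch below uses bit strings; the function names and the small two-point example are illustrative, while the one-point example reuses the two 32-bit parents from the slide with the cut after bit 16.

```python
def one_point_crossover(parent1, parent2, point):
    """Head of parent1 up to `point`, tail of parent2 after it."""
    return parent1[:point] + parent2[point:]


def two_point_crossover(parent1, parent2, first, second):
    """parent1 outside the two cut points, parent2 between them."""
    return parent1[:first] + parent2[first:second] + parent1[second:]


p1 = "0110 1001 0100 1110 1010 1101 1011 0101".replace(" ", "")
p2 = "1101 0100 0101 1010 1011 0100 1010 0101".replace(" ", "")

child = one_point_crossover(p1, p2, 16)
print(child == "0110 1001 0100 1110 1011 0100 1010 0101".replace(" ", ""))  # True

print(two_point_crossover("00000000", "11111111", 2, 5))  # 00111000
```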

Arithmetic crossover
• Offspring 1 = a * Parent 1 + (1 – a) * Parent 2
• Offspring 2 = (1 – a) * Parent 1 + a * Parent 2
• Example with a = 0.7:
  – Parent 1: (0.3)(1.4)(0.2)(7.4)
  – Parent 2: (0.5)(4.5)(0.1)(5.6)
  – Offspring 1: (0.36)(2.33)(0.17)(6.86)
  – Offspring 2: (0.44)(3.57)(0.13)(6.14)
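The two formulas are applied gene by gene, which a few lines of Python make explicit. The function name is illustrative; the parents and the weight a = 0.7 are the ones from the example.

```python
def arithmetic_crossover(parent1, parent2, a):
    """Blend two real-valued chromosomes gene by gene with weight a."""
    child1 = [a * x + (1 - a) * y for x, y in zip(parent1, parent2)]
    child2 = [(1 - a) * x + a * y for x, y in zip(parent1, parent2)]
    return child1, child2


parent1 = [0.3, 1.4, 0.2, 7.4]
parent2 = [0.5, 4.5, 0.1, 5.6]
child1, child2 = arithmetic_crossover(parent1, parent2, a=0.7)
print([round(v, 3) for v in child1])
print([round(v, 3) for v in child2])
```

For the first gene, Offspring 1 is 0.7 * 0.3 + 0.3 * 0.5 = 0.36 and Offspring 2 is 0.3 * 0.3 + 0.7 * 0.5 = 0.44. Each pair of genes always sums to the same total as the parents' genes, since the two weights add up to 1.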

Comparison of simple examples
• In the simple example (trying to get all 1s):
  – The sexual (two-parent, no mutation) approach, if it succeeds, is likely to succeed much faster
    • Because up to half of the bits change each time, not just one bit
  – However, with no mutation, it may not succeed at all
    • By pure bad luck, maybe none of the first (randomly generated) words have (say) bit 17 set to 1
      – Then there is no way a 1 could ever occur in this position
    • Another problem is lack of genetic diversity
      – Maybe some of the first generation did have bit 17 set to 1, but none of them were selected for the second generation
• The best technique in general turns out to be sexual reproduction with a small probability of mutation
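The bit 17 argument can be checked with a small experiment. The sketch below combines half-and-half crossover with an optional mutation; the mutation rate of 0.05, the reading of "first half" as the upper 16 bits, the function names, and the generation caps are all assumptions made for illustration.

```python
import random

BITS = 32
HALF = BITS // 2
ALL_ONES = (1 << BITS) - 1
LOW_MASK = (1 << HALF) - 1          # lower 16 bits ("second half")
HIGH_MASK = ALL_ONES ^ LOW_MASK     # upper 16 bits ("first half")


def count_ones(word):
    return bin(word).count("1")


def evolve(rng, mutation_rate=0.05, population=None, max_generations=10_000):
    """Return the generation in which an all-ones word appears, or None."""
    if population is None:
        population = [rng.getrandbits(BITS) for _ in range(100)]
    for generation in range(max_generations):
        if ALL_ONES in population:
            return generation
        survivors = sorted(population, key=count_ones, reverse=True)[:10]
        population = list(survivors)
        for i, parent in enumerate(survivors):
            others = survivors[:i] + survivors[i + 1:]
            for _ in range(9):
                mate = rng.choice(others)              # choose one of the other words
                child = (parent & HIGH_MASK) | (mate & LOW_MASK)
                if rng.random() < mutation_rate:       # small probability of mutation
                    child ^= 1 << rng.randrange(BITS)
                population.append(child)
    return None


rng = random.Random(1)
no_bit_17 = [rng.getrandbits(BITS) & ~(1 << 17) for _ in range(100)]
print(evolve(rng, mutation_rate=0.0, population=no_bit_17, max_generations=200))  # None
print(evolve(random.Random(0)))
```

With the mutation rate set to 0.0 and a starting population in which no word has bit 17 set, crossover can only recombine existing bits, so the run ends without ever producing the all-ones word. With a small mutation rate the missing bit can still appear.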