Data Mining (Waqas Haider Bangyal)
Evolutionary Computing
Evolutionary computing algorithms are very common and widely used by researchers to solve optimization problems.
Steps in Any Evolutionary Algorithm • Tentative Solution Representation • Random Solution Initialization • Fitness Estimation • Operators • Termination Condition
Solution Representation • The representation of an individual is the method used to construct and evaluate a solution for the desired problem. • It can also be described as the data structure used to define an individual.
Solution Initialization • Initialization plays an important role in the success of an evolutionary algorithm. • A poor initial population can cause even a good algorithm to prematurely converge to a suboptimal solution. • On the other hand, a good initialization leads to genetic diversity in the initial population, making most algorithms work sufficiently well.
Solution Fitness • Fitness is the performance of an individual on the problem it is meant to solve. • It is the compliance of the structure with the task it is required to solve, based on a user-specified criterion. • It tells us which elements or regions of the search space are good.
Solution Fitness (Contd…) • The fitness measure steers the evolutionary process towards better approximate solutions to the problem. • Fitness of individuals in a population can be measured in many ways. • For example, it can be a measure of error between the actual and desired output of a solution, or accuracy in the case of classification.
Operators • Evolutionary computing algorithms have different operators that are applied to the individuals selected for each operation. • The most common operators used for evolution in GA are crossover, mutation, and reproduction. • The most common operators used in PSO are the position and velocity updates.
Termination Condition • The termination condition determines when this iterative process should be stopped. • A pre-determined number of generations or amount of time has elapsed • A satisfactory solution has been achieved • No improvement in solution quality has taken place for a pre-determined number of generations
Termination Condition 1 - Time: in seconds, minutes, or perhaps hours, depending on the problem at hand. 2 - Number of generations: in hundreds, thousands, or perhaps millions, depending on the problem at hand. 3 - Convergence: when 95% of the population has the same fitness value, we can say convergence has started to appear, and the user can stop the genetic program and take the result.
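The three stopping conditions above can be combined into a single check. A minimal sketch (all function and parameter names here are our own, not from the slides):

```python
import time

def should_terminate(generation, start_time, best_history,
                     max_generations=1000, max_seconds=60.0,
                     target_fitness=None, stall_limit=50):
    """Return True when any of the three stopping conditions is met."""
    # 1) A pre-determined number of generations or amount of time has elapsed.
    if generation >= max_generations:
        return True
    if time.time() - start_time >= max_seconds:
        return True
    # 2) A satisfactory solution has been achieved.
    if target_fitness is not None and best_history and best_history[-1] >= target_fitness:
        return True
    # 3) No improvement for a pre-determined number of generations.
    if len(best_history) > stall_limit and best_history[-1] <= best_history[-1 - stall_limit]:
        return True
    return False
```

In practice `best_history` would hold the best fitness seen in each generation, appended once per loop iteration.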
Genetic Algorithm
Classes of Search Techniques
Search Techniques
• Calculus-Based Techniques: Fibonacci search
• Enumerative Techniques: Sort, DFS, BFS, Dynamic Programming
• Guided Random Search Techniques: Tabu Search, Hill Climbing, Simulated Annealing, Evolutionary Algorithms (Genetic Algorithms, Genetic Programming)
Genetic Algorithm • Genetic Algorithms (GAs) apply an evolutionary approach to inductive learning. • GAs have been successfully applied to problems that are difficult to solve using conventional techniques, such as scheduling problems, the traveling salesperson problem, network routing problems, and financial marketing.
Genetic Algorithms • An algorithm is a set of instructions that is repeated to solve a problem. • A genetic algorithm conceptually follows steps inspired by the biological processes of evolution. • Genetic Algorithms follow the idea of SURVIVAL OF THE FITTEST: better and better solutions evolve from previous generations until a near-optimal solution is obtained.
Genetic Algorithms • A genetic algorithm is an iterative procedure that represents its candidate solutions as strings of genes called chromosomes. • Genetic Algorithms are often used to improve the performance of other AI methods, such as expert systems or neural networks. • The method learns by producing offspring that are better and better, as measured by a fitness function, which is a measure of the objective to be attained (maximum or minimum).
What is a GA • Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes, or the genotype or genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions. • Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible.
Simple GA
{
  initialize population;
  evaluate population;
  while Termination Criteria Not Satisfied
  {
    select parents for reproduction;
    perform crossover and mutation;
    repair();
    evaluate population;
  }
}
Each iteration of the loop is called a generation.
Algorithm
BEGIN
  Generate initial population;
  Compute fitness of each individual;
  REPEAT /* New generation */
    FOR population_size / 2 DO
      Select two parents from old generation; /* biased to the fitter ones */
      Recombine parents for two offspring;
      Compute fitness of offspring;
      Insert offspring in new generation
    END FOR
  UNTIL population has converged
END
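The pseudocode above can be sketched concretely for the MAXONE problem (maximize the number of 1s in a bit string, used as the worked example later in these slides). This is an illustrative sketch, not the slides' reference implementation; the helper names and the use of tournament selection for the "biased to the fitter ones" step are our own choices:

```python
import random

def fitness(chromosome):
    """MAXONE fitness: the number of 1s in the bit string."""
    return sum(chromosome)

def run_ga(length=10, pop_size=6, generations=50, seed=0):
    rng = random.Random(seed)
    # Generate initial population and evaluate it.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):                 # REPEAT: new generation
        offspring = []
        for _ in range(pop_size // 2):           # FOR population_size / 2 DO
            # Select two parents, biased to the fitter ones (tournament of 2).
            parents = [max(rng.sample(population, 2), key=fitness)
                       for _ in range(2)]
            # Recombine parents into two offspring (one-point crossover).
            point = rng.randrange(1, length)
            offspring.append(parents[0][:point] + parents[1][point:])
            offspring.append(parents[1][:point] + parents[0][point:])
        population = offspring                   # insert offspring in new generation
    return max(population, key=fitness)

best = run_ga()
```

With a fixed seed the run is reproducible; a real implementation would also apply mutation and a termination test rather than a fixed generation count.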
Genetic learning algorithm
• Step 1: Initialize a population P of n elements as a potential solution.
• Step 2: Until a specified termination condition is satisfied:
  • 2a: Use a fitness function to evaluate each element of the current solution. If an element passes the fitness criteria, it remains in P.
  • 2b: The population now contains m elements (m <= n). Use genetic operators to create (n - m) new elements. Add the new elements to the population.
Conceptual Algorithm
Digitalized genetic knowledge representation • A common technique for representing genetic knowledge is to transform elements into binary strings. • For example, we can represent an income range as a string of two bits, assigning "00" to 20-30K, "01" to 30-40K, and "11" to 50-60K.
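The income-range encoding above can be written as a small lookup table. A minimal sketch (the table and function names are hypothetical; the "10" code is left unused, as in the slide):

```python
# Two-bit codes for income ranges, following the slide's example.
INCOME_CODES = {"20-30K": "00", "30-40K": "01", "50-60K": "11"}

def encode(income_range):
    """Map a symbolic income range to its two-bit string."""
    return INCOME_CODES[income_range]

def decode(bits):
    """Invert the table to recover the symbolic value from its bit string."""
    return {v: k for k, v in INCOME_CODES.items()}[bits]
```

Several such attribute encodings concatenated together form one chromosome.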
Genetic operator - Crossover • The elements most often used for crossover are those destined to be eliminated from the population. • Crossover forms new elements for the population by combining parts of two elements currently in the population.
Genetic operator - Mutation • Mutation is sparingly applied to elements chosen for elimination. • Mutation can be applied by randomly flipping bits (or attribute values) within a single element.
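Bit-flip mutation as described above can be sketched as follows (parameter names are our own; the rate of 0.001 echoes the typical mutation chance mentioned later in these slides):

```python
import random

def mutate(chromosome, rate=0.001, rng=random):
    """Flip each gene independently with a small probability `rate`."""
    return [1 - gene if rng.random() < rate else gene
            for gene in chromosome]
```

With `rate=0` the chromosome is returned unchanged; with `rate=1` every bit is flipped.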
Genetic operator - Selection • Selection replaces to-be-deleted elements with copies of elements that pass the fitness test with high scores. • With selection, the overall fitness of the population is guaranteed to increase.
Key terms • Individual - Any possible solution • Population - Group of all individuals • Search Space - All possible solutions to the problem • Chromosome - Blueprint for an individual • Locus - The position of a gene on the chromosome • Genome - Collection of all chromosomes for an individual
Chromosome, Genes and Genomes
Genetic Algorithm Introduction • Inspired by natural evolution • Population of individuals • Each individual is a feasible solution to the problem • Each individual is characterized by a fitness function • Higher fitness means a better solution • Based on their fitness, parents are selected to reproduce offspring for a new generation • Fitter individuals have more chance to reproduce • The new generation has the same size as the old generation; the old generation dies • Offspring carry a combination of the properties of their two parents • If well designed, the population will converge to an optimal solution
Components of a GA A problem definition as input, and • Encoding principles (gene, chromosome) • Initialization procedure (creation) • Selection of parents (reproduction) • Genetic operators (mutation, recombination) • Evaluation function (environment) • Termination condition
1) Representation (encoding) Possible individual encodings: • Bit strings (0101...1100) • Real numbers (43.2, -33.1, ..., 0.0, 89.2) • Permutations of elements (E11 E3 E7 ... E15) • Lists of rules (R1 R2 R3 ... R22 R23) • Program elements (genetic programming) • ...any data structure...
2) Initialization Start with a population of randomly generated individuals, or use: - A previously saved population, or - A set of solutions provided by a human expert, or - A set of solutions provided by another heuristic algorithm
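The initialization options above can be combined: seed the population with any saved, expert-provided, or heuristic solutions, then fill the remainder randomly. A minimal sketch with our own function and parameter names:

```python
import random

def initialize(pop_size, length, seeds=None, rng=random):
    """Build an initial population of bit-string individuals.

    `seeds` may carry saved / expert / heuristic solutions; the rest of
    the population is generated randomly.
    """
    population = [list(s) for s in (seeds or [])]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(length)])
    return population[:pop_size]
```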
3) Selection • Purpose: to focus the search in promising regions of the space • Inspiration: Darwin's "survival of the fittest"
4) Reproduction • Reproduction operators • Crossover • Mutation
4) Reproduction • Crossover • Two parents produce two offspring • Generally the chance of crossover is between 0.6 and 1.0 • Mutation • There is a chance that a gene of a child is changed randomly • Generally the chance of mutation is low (e.g., 0.001)
4) Reproduction Operators • 1) Crossover • Generating offspring from two selected parents • Single-point crossover • Two-point crossover (multi-point crossover)
One-point crossover • Randomly, one position in the chromosomes is chosen • Child 1 is the head of the chromosome of parent 1 with the tail of the chromosome of parent 2 • Child 2 is the head of parent 2 with the tail of parent 1 (figure: a randomly chosen crossover position)
Two-point crossover • Randomly, two positions in the chromosomes are chosen • This avoids genes at the head and genes at the tail of a chromosome always being split when recombined
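Both crossover variants can be sketched directly from the head/tail descriptions above. The cut points can be supplied explicitly for reproducibility, otherwise they are drawn at random; function names are our own:

```python
import random

def one_point(p1, p2, point=None, rng=random):
    """Child 1 = head of parent 1 + tail of parent 2, and vice versa."""
    point = point if point is not None else rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point(p1, p2, points=None, rng=random):
    """Swap the middle segment between two randomly chosen cut points."""
    a, b = points if points else sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]
```

Since Python slicing works on both strings and lists, the same functions apply to either chromosome representation.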
4) Reproduction Operators • 2) Mutation
5) Evaluation (fitness function) • A solution is only as good as the evaluation function; choosing a good one is often the hardest part • Similarly-encoded solutions should have a similar fitness
6) Termination condition Examples: • A pre-determined number of generations or amount of time has elapsed • A satisfactory solution has been achieved • No improvement in solution quality has taken place for a pre-determined number of generations
Benefits of GAs • Concept is easy to understand • Supports multi-objective optimization • Good for "noisy" environments • Always an answer; the answer gets better with time • Inherently parallel; easily distributed
Example: the MAXONE problem Suppose we want to maximize the number of ones in a string of l binary digits. It may seem a trivial problem because we know the answer in advance. However, we can think of it as maximizing the number of correct answers, each encoded by 1, to l difficult yes/no questions.
Example (cont) • An individual (chromosome) is encoded (naturally) as a string of l binary digits • The fitness function f of a candidate solution (chromosome) to the MAXONE problem is the number of ones in its genetic code • We start with a population of n random strings. Suppose that the length of each chromosome is 10 and the total number of solutions is n = 6
Example (initialization) We toss a fair coin 60 times and get the following initial population:
s1 = 1111010101  f(s1) = 7
s2 = 0111000101  f(s2) = 5
s3 = 1110110101  f(s3) = 7
s4 = 0100010011  f(s4) = 4
s5 = 1110111101  f(s5) = 8
s6 = 0100110000  f(s6) = 3
For the first solution, s1, the first four tosses come up heads so we assign 1111, then a tail gives 0; repeating this for ten tosses per string, we build each chromosome as a string of bits.
Example (selection 1) Next we apply fitness-proportionate selection with the roulette wheel method: individual i is chosen with probability f(i) / Σj f(j). We repeat the extraction as many times as the number of individuals we need, to keep the parent population the same size (6 in our case). (figure: roulette wheel with slice areas proportional to fitness values)
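Roulette wheel selection as described here can be sketched as follows (function and parameter names are our own): draw a random point on the wheel and walk through the slices, whose widths are proportional to fitness.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability f(i) / sum of all fitnesses."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)        # a random point on the wheel
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f                    # slice width proportional to f
        if pick <= running:
            return individual
    return population[-1]               # guard against rounding error
```

Calling this n times (with replacement) produces the new parent population of the same size.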
Example (selection 2) Suppose that, after performing selection, we get the following population:
s1` = 1111010101 (s1)
s2` = 1110110101 (s3)
s3` = 1110111101 (s5)
s4` = 0111000101 (s2)
s5` = 0100010011 (s4)
s6` = 1110111101 (s5)
Example (crossover 1) Next we mate strings for crossover. Suppose that we decide to actually perform crossover only for the couples (s1`, s2`) and (s5`, s6`). For each couple, we randomly extract a crossover point, for instance 2 for the first couple and 5 for the second.
Example (crossover 2)
Before crossover:
s1` = 1111010101  s5` = 0100010011
s2` = 1110110101  s6` = 1110111101
After crossover:
s1`` = 1110110101  s5`` = 0100011101
s2`` = 1111010101  s6`` = 1110110011
Example (mutation 1) The final step is to apply random mutation: for each bit that we copy to the new population, we allow a small probability of error (for instance 0.1). Before applying mutation:
s1`` = 1110110101
s2`` = 1111010101
s3`` = 1110111101
s4`` = 0111000101
s5`` = 0100011101
s6`` = 1110110011
Example (mutation 2) After applying mutation:
s1``` = 1110100101  f(s1```) = 6
s2``` = 1111110100  f(s2```) = 7
s3``` = 1110101111  f(s3```) = 8
s4``` = 0111000101  f(s4```) = 5
s5``` = 0100011101  f(s5```) = 5
s6``` = 1110110001  f(s6```) = 6
The total number of 1s after mutation is 37
Example (end) In one generation, the total population fitness improved from 34 to 37, i.e., by about 9%. At this point, we go through the same process all over again until a stopping criterion is met.
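The selection and crossover steps of this worked example can be replayed end to end. This sketch hard-codes the slides' selection outcome and crossover choices so the intermediate strings can be checked; mutation is omitted, so the result matches the pre-mutation population:

```python
def fitness(s):
    """MAXONE fitness: the number of 1s in the bit string."""
    return s.count("1")

# Initial population from the example.
population = ["1111010101", "0111000101", "1110110101",
              "0100010011", "1110111101", "0100110000"]
assert sum(fitness(s) for s in population) == 34   # initial total fitness

# Selection outcome taken from the slides: s1, s3, s5, s2, s4, s5.
selected = [population[i] for i in (0, 2, 4, 1, 3, 4)]

def one_point(p1, p2, point):
    """One-point crossover: swap tails after the cut point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

# Crossover only for couples (s1', s2') at point 2 and (s5', s6') at point 5.
s1, s2 = one_point(selected[0], selected[1], 2)
s5, s6 = one_point(selected[4], selected[5], 5)
crossed = [s1, s2, selected[2], selected[3], s5, s6]
```

Applying the example's mutations to `crossed` would then yield the final population with total fitness 37.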