ICT 619 Intelligent Systems Topic 5 Genetic Algorithms

ICT 619 Intelligent Systems Topic 5: Genetic Algorithms ICT 619 S 2 -05

Genetic Algorithms
§ Introduction
§ How GAs work
§ The TSP as an example
§ Business Applications of GA
§ Advantages of GA systems
§ Some issues related to GA based systems
§ Case Study

What is a genetic algorithm?
§ GA is part of the broader soft computing paradigm known as evolutionary computation
§ First introduced by Holland (1975)
§ Inspired by the possibility of problem solving through a process of evolution

What is a GA? (cont’d)
§ GA mimics biological evolution to generate better solutions from existing solutions through
   § survival of the fittest
   § crossbreeding, and
   § mutation

What is a GA? (cont’d)
§ GA is capable of finding solutions for many problems for which no usable algorithmic solutions exist
§ GA methodology is particularly suited for optimization
§ Optimization searches a solution space consisting of a large number of possible solutions
§ GA reduces the search space through evolution

How GAs work
§ A population of candidate solutions is repeatedly altered until an optimal or acceptable solution is found
§ The GA evolutionary cycle
   1. Starts with a randomly generated initial population of solutions (1st generation)
   2. Selects a population of better solutions (next generation) by using a measure of goodness (fitness evaluation function)
   3. Alters the new generation population through crossbreeding and mutation
§ The processes of selection (step 2) and alteration (step 3) lead to a population with a higher proportion of better solutions

How GAs work (cont’d)
§ The GA evolutionary cycle continues until
   § an acceptable solution is found in the current population, or
   § some control parameter, such as the maximum number of generations, is exceeded

How solutions are represented
§ A series of genes, known as a chromosome, represents one possible solution
§ Each gene in the chromosome represents one component of the solution pattern
§ Each gene can have one of a number of possible values known as alleles
§ The process of converting a solution from its original form into a chromosome is known as coding

How solutions are represented (cont’d)
§ The most common form of representing a solution as a chromosome is a string of binary digits (aka a binary vector), eg, 10101001
§ Each bit in this string is a gene with two alleles, 0 and 1
§ Other forms of representation are also used, eg, integer vectors
§ Solution bit strings are decoded to enable their evaluation using a fitness measure
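A minimal sketch of decoding, assuming the bit string encodes a single real-valued parameter scaled into a range. The function names `decode` and `fitness`, the range 0 to 10 and the objective are illustrative assumptions, not part of the slides.

```python
def decode(chromosome: str, low: float, high: float) -> float:
    """Map a bit string onto a real value in [low, high]."""
    as_int = int(chromosome, 2)            # "10101001" -> 169
    max_int = 2 ** len(chromosome) - 1     # largest value the string can hold
    return low + (high - low) * as_int / max_int


def fitness(chromosome: str) -> float:
    """Hypothetical fitness measure: x * (10 - x), which peaks at x = 5."""
    x = decode(chromosome, 0.0, 10.0)
    return x * (10.0 - x)


x = decode("10101001", 0.0, 10.0)  # the example string from the slide
print(x, fitness("10101001"))
```

The GA itself only manipulates the bit string; the decoder and the fitness measure are the only problem-specific parts.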

GA Selection
§ Selection in GA is based on a process similar to that found in biological evolution
§ Only the fittest survive and contribute to the gene pool of the next generation
Fitness proportional selection
§ Each chromosome’s likelihood of being selected is proportional to its fitness value
§ Solutions failing selection are “bad”, and are discarded
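Fitness proportional selection is often pictured as a roulette wheel in which each chromosome owns a slice as wide as its fitness. The sketch below assumes non-negative fitness values; `roulette_select` is an illustrative name.

```python
import random


def roulette_select(population, fitnesses, k, rng=random):
    """Pick k individuals, each with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    selected = []
    for _ in range(k):
        spin = rng.random() * total        # a point on the wheel, in [0, total)
        running = 0.0
        for individual, fit in zip(population, fitnesses):
            running += fit
            if spin < running:             # the spin landed in this slice
                selected.append(individual)
                break
        else:
            # Guard against floating-point round-off at the end of the wheel.
            selected.append(population[-1])
    return selected


picks = roulette_select(["a", "b"], [1.0, 9.0], k=10, rng=random.Random(0))
print(picks)
```

The standard library call `random.choices(population, weights=fitnesses, k=k)` performs the same weighted sampling; the explicit loop is shown only to make the mechanism visible.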

Alteration = Crossover + Mutation
§ Alteration refines good solutions from the current generation to produce the next generation of solutions
§ Carried out by performing crossover and mutation
Crossover
§ Done by splicing two chromosomes at a crossover point and swapping the spliced parts
§ A better chromosome may be created by combining genes with good characteristics from one chromosome with some good genes in the other chromosome
§ Crossover is carried out with a probability, typically 0.7
§ Chromosomes not crossed over are cloned
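A minimal sketch of single-point crossover on bit strings, using the crossover probability of 0.7 quoted on the slide as the default. The function name and the choice of a uniformly random crossover point are assumptions.

```python
import random


def single_point_crossover(parent_a, parent_b, pc=0.7, rng=random):
    """Return two children; with probability 1 - pc the parents are cloned."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2 or rng.random() >= pc:
        return parent_a, parent_b          # no crossover: clone the parents
    # Choose a splice point strictly inside the string, then swap the tails.
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


print(single_point_crossover("00000000", "11111111", rng=random.Random(3)))
```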

Crossover and Mutation
Mutation
§ A random adjustment in the genetic composition
§ Can be useful for introducing new characteristics in a population
§ May be counterproductive
§ Probability is kept low: typically 0.001 to 0.01
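For binary chromosomes, mutation is commonly implemented as an independent bit flip per gene. This sketch assumes that form; `mutate` is an illustrative name and the default rate is taken from the range on the slide.

```python
import random


def mutate(chromosome: str, pm: float = 0.01, rng=random) -> str:
    """Flip each bit independently with probability pm."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < pm else bit
        for bit in chromosome
    )


print(mutate("10101001", pm=0.01, rng=random.Random(0)))
```

With an 8-bit chromosome and pm = 0.01, most calls return the string unchanged, which is why mutation acts as a background source of variety rather than the main search operator.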

The typical Genetic Algorithm
1. Represent the solution as a chromosome of fixed length; choose the size of population N, crossover probability pc and mutation probability pm.
2. Define a fitness function f for measuring the fitness of chromosomes.
3. Randomly create an initial solution population of size N: x1, x2, …, xN
4. Use the fitness function f to evaluate the fitness value of each solution in the current generation: f(x1), f(x2), …, f(xN)

The typical Genetic Algorithm (cont’d)
5. Select “good” solutions based on fitness value. Discard the rest of the solutions.
6. If acceptable solution(s) are found in the current generation, or the maximum number of generations is exceeded, then stop.
7. Alter the solution population using crossover and mutation to create a new generation of solutions with population size N.
8. Go to step 4.
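The eight steps can be sketched as one loop. This is an illustration under stated assumptions, not a reference implementation: the toy fitness function `one_max` (count of 1 bits), the default parameter values, the `target` stopping threshold and the use of `random.choices` for fitness proportional selection are all choices made here.

```python
import random


def one_max(chromosome: str) -> int:
    """Step 2 (toy example, an assumption): fitness is the number of 1 bits."""
    return chromosome.count("1")


def run_ga(fitness, length=20, n=30, pc=0.7, pm=0.01,
           max_generations=200, target=None, seed=0):
    """Return (best chromosome seen, generations used). Fitness must be >= 0."""
    if n % 2 or length < 2:
        raise ValueError("this sketch needs an even n and length >= 2")
    rng = random.Random(seed)
    # Step 3: random initial population of N fixed-length bit strings.
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(n)]
    best = max(population, key=fitness)
    for generation in range(max_generations):
        # Step 4: evaluate every solution in the current generation.
        scores = [fitness(x) for x in population]
        top = population[scores.index(max(scores))]
        if fitness(top) > fitness(best):
            best = top
        # Step 5: fitness proportional selection (tiny offset avoids all-zero weights).
        parents = rng.choices(population, weights=[s + 1e-9 for s in scores], k=n)
        # Step 6: stop if an acceptable solution has been found.
        if target is not None and fitness(best) >= target:
            return best, generation
        # Step 7: crossover on consecutive pairs, then bit-flip mutation.
        population = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < pc:
                point = rng.randint(1, length - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            for child in (a, b):
                population.append("".join(
                    ("1" if bit == "0" else "0") if rng.random() < pm else bit
                    for bit in child))
        # Step 8: the for loop returns to step 4.
    return best, max_generations


best, generations = run_ga(one_max, target=20)
print(best, one_max(best), generations)
```

Only `one_max` is problem-specific; swapping in another non-negative fitness function and decoder reuses the loop unchanged.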

The TSP example
The travelling salesperson problem (TSP)
§ Given a set of n cities (A, B, C, ...), find a closed tour of all cities with a short total distance d
§ Tour cost may be something other than distance
§ An optimization problem with the following constraints
   1. Each city is to be visited once and only once
   2. Total distance travelled must be the shortest possible
§ Time required to find a solution by exhaustive search grows factorially (faster than exponentially) with n
§ Possible number of distinct tours for n cities = n!/(2n) = (n-1)!/2
§ For 50 cities at the rate of 1 billion tours per sec, exhaustive search would take far longer than 1 million centuries
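The growth of the search space can be checked directly from the tour-count formula n!/(2n). The helper names below and the 365.25-day year are assumptions made for this sketch.

```python
import math


def tour_count(n: int) -> int:
    """Number of distinct closed tours of n cities: n!/(2n) = (n-1)!/2."""
    if n < 3:
        raise ValueError("the formula applies for n >= 3")
    return math.factorial(n) // (2 * n)


def centuries_to_enumerate(n: int, tours_per_second: float = 1e9) -> float:
    """Time to examine every tour at the given rate, in centuries."""
    seconds = tour_count(n) / tours_per_second
    return seconds / (3600 * 24 * 365.25 * 100)


for n in (5, 10, 20, 50):
    print(n, tour_count(n), centuries_to_enumerate(n))
```

Dividing n! by 2n removes the n equivalent starting cities and the 2 directions of travel. Under these assumptions the 50-city figure is at least a million centuries, and in fact far larger.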

The TSP example (cont’d)
Representation and coding of TSP solutions
§ A solution to the TSP is an ordered list of the n cities
§ Each city is assigned 1 out of n possible positions
§ Representation of the solution may be visualised with the help of a matrix
   § Each row represents a city
   § Each column is associated with a tour position for cities
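When the solution is kept as an ordered list of cities, the tour cost can be computed directly from the list. The coordinates below are hypothetical, and Euclidean distance is just one possible cost.

```python
import math


def tour_length(tour, coords):
    """Total length of the closed tour, returning to the starting city."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


# Hypothetical layout: four cities on the corners of a unit square.
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
print(tour_length(list("ABCD"), coords))  # around the perimeter
print(tour_length(list("ACBD"), coords))  # crosses both diagonals, so longer
```

Because a GA in this deck maximises fitness, a fitness measure for the TSP could be taken as the reciprocal of the tour length (an assumption, not stated on the slides).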

The TSP example (cont’d)
§ The tour represented by the matrix (rows are cities A to E, columns are tour positions 1 to 5) is CAEBDC
§ One possible bit string code for this solution: 01000 00010 10000 00001 00100 (rows written end to end)
§ Binary bit strings can produce "faulty" chromosomes needing repair
§ An integer vector scheme produced a 100 city tour 9.4% above optimal cost
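The matrix coding can be sketched as follows, assuming five cities A to E and a row-per-city layout as described on the slides. `encode_tour` and `is_valid` are illustrative names. The validity check shows why crossover or mutation on raw bits can produce "faulty" chromosomes: a legal tour needs exactly one 1 in every row and every column.

```python
def encode_tour(tour: str, cities: str) -> list[str]:
    """One row per city; the 1 marks that city's position in the tour."""
    n = len(cities)
    rows = []
    for city in cities:
        position = tour.index(city)
        rows.append("".join("1" if p == position else "0" for p in range(n)))
    return rows


def is_valid(bits: str, n: int) -> bool:
    """A chromosome is a legal tour only if each row and column has one 1."""
    if len(bits) != n * n or set(bits) - {"0", "1"}:
        return False
    rows = [bits[i * n:(i + 1) * n] for i in range(n)]
    one_per_row = all(row.count("1") == 1 for row in rows)
    one_per_col = all(sum(row[c] == "1" for row in rows) == 1 for c in range(n))
    return one_per_row and one_per_col


rows = encode_tour("CAEBD", "ABCDE")   # the closing return to C is implied
print(rows)                            # ['01000', '00010', '10000', '00001', '00100']
chromosome = "".join(rows)
print(is_valid(chromosome, 5))         # True
print(is_valid("1" + chromosome[1:], 5))  # False: city A now holds two positions
```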

Business Applications of GA
§ Increasing number of industrial and business applications of GA since the late 1980s
§ In business, applications include (Kingdon 1997)
   § Portfolio optimisation
   § Bankruptcy prediction
   § Financial forecasting
   § Fraud detection
   § Scheduling

Business Applications of GA (cont’d)
PAPAGEN project in Europe
§ Demonstrated the potential of GA technology in a broad range of business applications including
   § Insurance risk assessment
   § Economic modelling
   § Credit scoring
   § Direct marketing
First Quadrant - investment firm in California
§ Started using the GA technique in 1993
§ Uses GA to manage US$5 billion worth of investments in 17 different countries

Advantages of GA systems
§ Useful when no algorithms or heuristics are available for solving a problem
§ No formulation of the solution is required - only "recognition" of a good solution
§ A GA system can be built as long as a solution representation and an evaluation scheme can be worked out
§ So minimal domain expert access is required

Advantages of GA systems
§ GA can act as an alternative to
   § Expert Systems if
      § the number of rules is too large, or
      § the nature of the knowledge base is too dynamic
   § Traditional optimization techniques if
      § constraints and objective functions are non-linear and/or discontinuous

Advantages of GA systems (cont'd)
§ GA does not guarantee optimal solutions, but produces near-optimal solutions which are likely to be very good
§ Solution time with GA is highly predictable. It is determined by
   § Size of the population
   § Time taken to decode and evaluate a solution, and
   § Number of generations of population
§ GA uses simple operations to solve problems that are otherwise computationally prohibitive
   Example: the TSP
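As a rough reading of the three factors above, and assuming that decoding and evaluation dominate the cost while selection and alteration overhead is ignored (an assumption made here, not a claim from the slides), the run time can be approximated as:

```latex
T \approx G \times N \times t_{\text{eval}}
```

Here N is the population size, and the symbols G (number of generations) and t_eval (time to decode and evaluate one solution) are introduced for this estimate. For example, N = 100, G = 500 and t_eval = 1 ms would give T of about 50 seconds.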

Advantages of GA systems (cont'd)
§ Because of its simplicity, GA software is
   § Reasonably sized and self-contained
   § Easier to embed as a module in another system
§ GA can also aid in developing intelligent business systems that use other methodologies, eg,
   § Building the rule base of an expert system
   § Finding optimal neural networks

Some issues related to GA based systems
Level of explainability
§ Capability to explain why a particular solution was arrived at is practically nil
§ A GA does not know what a fitness value really means
Scalability
§ Moderately scalable: accommodates an increased number of variables by increasing the length of the chromosome
§ But
   § A longer chromosome means a larger population space (more potential combinations of genes)
   § More time is required for decoding and fitness evaluation

Some issues related to GA based systems (cont’d)
Data requirements
§ In general, GAs do not require extensive access to data, but some applications may need it to evaluate solutions
§ This makes the quality and quantity of data important
Local maxima
§ Local maxima are regions that hold good solutions relative to regions around them, but which do not necessarily contain the best overall solutions
§ The region(s) that contain the best solutions are called global maxima
§ GAs are less prone to being trapped in local maxima because of the use of mutation and crossover

Some issues related to GA based systems (cont’d)
Premature convergence
§ A GA is said to have converged prematurely if it explores a local maximum extensively
§ It may then be dominated by very similar solutions within the region
§ The most significant factor leading to such convergence is a mutation rate which is too low
§ Mutation interference is an effect opposite to that of premature convergence
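One way to notice that a population has become "dominated by very similar solutions" is to track a diversity measure each generation. The slides do not prescribe one; the sketch below assumes equal-length bit strings and uses the mean pairwise Hamming distance, normalised by chromosome length, as a hypothetical indicator.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def diversity(population) -> float:
    """Mean pairwise Hamming distance divided by length: 0 = identical."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    total = sum(hamming(a, b) for a, b in pairs)
    return total / (len(pairs) * len(population[0]))


print(diversity(["0000", "0000", "0000"]))  # 0.0: fully converged
print(diversity(["0000", "1111"]))          # 1.0: maximally different
```

A value that falls towards 0 long before an acceptable solution appears suggests premature convergence, while a value that never falls is consistent with a mutation rate that is too high.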

Some issues related to GA based systems (cont’d)
Mutation interference
§ Finding a mutation rate which allows the GA to converge, but which also allows adequate exploration of the solution space, is essential for satisfactory performance
§ Mutation interference occurs when mutation rates in a GA are too high, and as a result:
   § Solutions are frequently or drastically mutated
   § The algorithm never manages to explore any region of the space thoroughly
   § Any good solutions found tend to be destroyed rapidly

Case Study - Help Desk Task Scheduling (Dhar & Stein 1997, pp. 219-227)
§ GA based system developed at Moody’s for scheduling service tasks to its customer service representatives (CSRs)
Major constraints
§ The system
   § Must minimise computer downtime and customer dissatisfaction
   § Must integrate with the existing database system which kept track of help desk requests

Case study – constraints (cont’d)
§ Must be flexible to
   § Accommodate new types of task definitions and changes in employees, training etc.
   § Allow the administrator to modify solutions
§ Must generate and reevaluate schedules quickly (under 15 minutes) and consistently
§ Must not take the administrator or CSRs away from their jobs for any extended period of time
§ Must be developed quickly

Case study – constraints (cont’d)
§ Should be scalable in case of future growth in the number of requests for help and the number of CSRs
§ Must not be too complicated for its users – the administrator and CSRs
§ The main difficulties in meeting the constraints:
   § the large number of tasks
   § the large number of CSRs
   § the varying capabilities of CSRs, and
   § the wide variety of tasks

Case Study - Issues needing to be considered
§ The priority of a task, which is determined by the severity of the problem
§ The length of time required to perform the task and how it would affect the servicing of other users
§ The ability of various CSRs to perform different levels of tasks (expertise must match the complexity)
§ Low priority tasks must not be kept waiting indefinitely
§ The measure of goodness of a schedule is to be based on the amount of downtime each schedule costs the organisation

Possible solution methodologies considered
1. Traditional linear programming (a numerical optimisation technique)
2. A rule based expert system (ES)
3. A GA based system
§ ES ruled out because
   § Expertise to solve this problem is not expressible as a set of rules
   § Help desk administrator was not available for knowledge extraction
§ Linear programming ruled out because
   § It fails if no optimal solution can be found
   § It does not produce any sub-optimal solutions, whereas a GA does

Case Study - the solution
SOGA (Schedule Optimising for GA)
§ A hybrid system consisting of GA and fuzzy system components
§ The GA component deals with the scheduling task
§ Each task in the queue is represented by a gene
§ The entire task list forms the chromosome
§ Each chromosome is decoded by a scheduling module that assigns tasks to available CSRs who can perform them

Case Study - the solution (cont’d)
§ Fitness of each chromosome is determined by calculating the amount of downtime that would result from the schedule represented by the chromosome
§ Schedules generated by the GA component are modified by the fuzzy system (FS) component
§ SOGA runs in the background behind the help request tracking system
§ Updates schedules at a predefined time interval (eg, every 10 or 15 minutes)
§ CSRs access their current job queue through their interface to accept jobs
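The slides do not give SOGA's internals, so the following is only a hypothetical sketch of the general idea: a chromosome is an ordering of task ids, a decoder assigns each task in turn to the earliest-free CSR with sufficient skill, and downtime is approximated as the sum of task completion times. The data, names and the downtime definition are all assumptions; priorities and the fuzzy component are left out.

```python
def decode_schedule(order, tasks, csrs):
    """Assign tasks, in chromosome order, to the earliest-free capable CSR.

    tasks: task id -> (duration in minutes, required skill level)
    csrs:  CSR name -> skill level
    Returns task id -> (assigned CSR, completion time).
    """
    free_at = {name: 0.0 for name in csrs}
    finish = {}
    for task_id in order:
        duration, level = tasks[task_id]
        able = [name for name, skill in csrs.items() if skill >= level]
        if not able:
            raise ValueError(f"no CSR can perform task {task_id}")
        chosen = min(able, key=lambda name: free_at[name])
        free_at[chosen] += duration
        finish[task_id] = (chosen, free_at[chosen])
    return finish


def downtime(order, tasks, csrs):
    """Hypothetical cost: each user is down until their task completes."""
    return sum(end for _, end in decode_schedule(order, tasks, csrs).values())


tasks = {"t1": (30, 1), "t2": (10, 1), "t3": (20, 2)}   # made-up help requests
csrs = {"ann": 2, "bob": 1}                             # made-up skill levels
print(downtime(["t1", "t2", "t3"], tasks, csrs))  # 90.0
print(downtime(["t3", "t2", "t1"], tasks, csrs))  # 70.0
```

Since lower downtime is better, a GA that maximises fitness could use the negative or the reciprocal of this cost; the two orderings above show how the same tasks and staff can give different downtime purely through the order of genes.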

Case Study - Results
§ The system is timely – generating schedules in about 5 minutes
§ The solutions are found to be good by the help desk administrator
§ The system is flexible enough to allow for new task definitions
§ The system scales up well to larger domains (higher number of tasks)
§ The SOGA system was developed in two months using one programmer overseen by its designers

REFERENCES
§ Dhar, V., & Stein, R., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997, pp. 126-148, 203-210.
§ Goldberg, D. E., Genetic and Evolutionary Algorithms Come of Age, Communications of the ACM, Vol. 37, No. 3, March 1994, pp. 113-119.
§ Holland, J. H., Adaptation in Natural and Artificial Systems, Univ. of Michigan Press, 1975.
§ Kingdon, J., Intelligent Systems and Financial Forecasting, Springer Verlag, London, 1997.
§ Medsker, L., Hybrid Intelligent Systems, Kluwer Academic Press, Boston, 1995.
§ Michalewicz, Z., Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Berlin, 1996.
§ Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems, Addison-Wesley, 2005.