# Evolutionary Computation 2009: Evolutionary Iterated Prisoner's Dilemma Game

- Slides: 53

Evolutionary Computation, 2009. Evolutionary Iterated Prisoner’s Dilemma Game. H.-T. Kim

Outline

- Evolutionary Prisoner's Dilemma Game
  - Iterated Prisoner's Dilemma Game
  - N-person Iterated Prisoner's Dilemma Game
  - Robert Axelrod’s nIPD game
- Evolution of Iterated Prisoner's Dilemma Game Strategies in Structured Demes Under Random Pairing in Game Playing
- Simulation on Worksite Interactions between Laborers and Firms by using Multi-Agent based Evolutionary Computation

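Robert Axelrod’s nIPD game – Step 2

- Encoding: when the previous three games must be remembered, there are 64 possible histories
  - CC CC CC (case 1)
  - CC CC CD (case 2)
  - CC CC DC (case 3)
  - …
  - DD DD DC (case 63)
  - DD DD DD (case 64)
- A strategy can therefore be encoded with 64 bits + 6 bits in total
  - 64 bits: one-to-one mapping from each case to an action
  - 6 bits: memory of the previous three games
  - Example: CCDCDDDC … DC CCDDCD
- Number of possible strategies = 2^70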
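The lookup implied by this encoding can be sketched as follows. This is a minimal illustration, not Axelrod's implementation: the function names are hypothetical, the case ordering follows the list on the slide (C before D, most recent game last), and using the final 6 genes to seed the history at the start of a game is an assumption.

```python
def history_index(history: str) -> int:
    """Map a three-game history such as 'CC CD DC' to an index in 0..63.

    Assumption: cases are ordered as on the slide, so 'CC CC CC' is index 0
    (case 1) and 'DD DD DD' is index 63 (case 64).
    """
    moves = history.replace(" ", "")
    if len(moves) != 6 or set(moves) - {"C", "D"}:
        raise ValueError("history must contain exactly six C/D moves")
    index = 0
    for move in moves:
        index = index * 2 + (1 if move == "D" else 0)
    return index


def next_move(chromosome: str, history: str) -> str:
    """Return the action stored in the 64-gene rule table for this history."""
    if len(chromosome) != 70:
        raise ValueError("expected 64 rule genes followed by 6 memory genes")
    return chromosome[history_index(history)]


def initial_history(chromosome: str) -> str:
    """Read the last 6 genes as the remembered history (assumed usage)."""
    memory = chromosome[64:]
    return " ".join(memory[i:i + 2] for i in range(0, 6, 2))
```

With 70 binary genes there are `2 ** 70` distinct chromosomes, which matches the strategy count on the slide.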

Evolution of Iterated Prisoner's Dilemma Game Strategies in Structured Demes Under Random Pairing in Game Playing. Hisao Ishibuchi, Member, IEEE, and Naoki Namikawa, Student Member, IEEE. IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, December 2005. Presenter: Hee-Taek Kim (김희택)

Outline

- Introduction
- Two neighborhood structures
  - IPD game structure
  - Mating strategy
- Simulation – Standard Pairing Scheme
- Random Pairing Scheme
- Simulation – Random Pairing Scheme
- Conclusion

Introduction

- Spatial IPD game
  - Framework of structured demes
  - Players are placed in the cells of a two-dimensional grid-world
- Two neighborhood structures
  - ① Interaction among players through the IPD game
  - ② Interaction among players for mating strategies
  - Similar to the world of territorial animals or plants
- Random pairing scheme
  - Each player plays the game with a randomly chosen neighbor at every round
  - Demonstrates the evolution of cooperative behavior under random pairing

Basic structure – Payoff Matrix

- Payoff matrix of the game

Basic structure – Strategy Encoding

- Each player has a single strategy
- Every strategy is represented by a 5-bit binary string
  - Example of a strategy (TIT-FOR-TAT)
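A 5-bit strategy can be decoded as sketched below. The bit layout is an assumption inferred from strings quoted elsewhere in this deck: TFT is written "10011", "11**1" is described as recovering from mutual defection, and "****0" as breaking mutual cooperation. That suggests bit 1 is the first move and bits 2 to 5 answer the previous round (own, opponent) = (D,D), (C,D), (D,C), (C,C), with 1 = cooperate and 0 = defect. The function names are hypothetical.

```python
C, D = 1, 0  # 1 = cooperate, 0 = defect (assumed convention)


def decode(strategy: str):
    """Split a 5-bit string into a first move and a response table."""
    if len(strategy) != 5 or set(strategy) - {"0", "1"}:
        raise ValueError("strategy must be a 5-bit binary string")
    bits = [int(b) for b in strategy]
    # Keys are (own previous action, opponent's previous action).
    responses = {
        (D, D): bits[1],
        (C, D): bits[2],
        (D, C): bits[3],
        (C, C): bits[4],
    }
    return bits[0], responses


def act(strategy: str, own_prev=None, opp_prev=None) -> int:
    """Return the action for the first round or for a given previous round."""
    first, responses = decode(strategy)
    if own_prev is None or opp_prev is None:
        return first
    return responses[(own_prev, opp_prev)]
```

Under this layout "10011" cooperates first and then copies the opponent's previous action, which is the defining behavior of tit-for-tat.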

IPD game structure – World and Neighborhood

- Uses a 31 × 31 grid-world
  - Each player is located in one cell
  - 961 players in total
- Examples of neighborhood structures
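One possible neighborhood on this grid is sketched below. The neighborhood figures did not survive in this transcript, so the shape (a cell plus its four adjacent cells) and the wrap-around edges are assumptions; `von_neumann_neighborhood` is a hypothetical helper name.

```python
GRID = 31  # 31 x 31 grid-world, 961 cells


def von_neumann_neighborhood(cell: int, size: int = GRID) -> list:
    """Return the cell itself plus its four adjacent cells.

    Cells are numbered row by row from 0. Edges wrap around (torus),
    which is an assumption made so that every cell has five members.
    """
    row, col = divmod(cell, size)
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    return [((row + dr) % size) * size + (col + dc) % size
            for dr, dc in offsets]
```

The returned list includes the player's own cell, mirroring the slides' definition of a neighborhood as "player i and its neighbors".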

IPD game structure – Game play and Fitness

- NIPD(i): the set of player i and its neighbors
- Game play
  - The game is iterated for a pre-specified number of rounds (e.g., 100 rounds)
  - Each player plays the game only against its neighbors
    - Opponents are selected randomly
- Fitness
  - Average payoff obtained from each round of the game

Mating strategy – formulation

- NGA(i): the set of player i and its neighbors
  - NIPD(i) = NGA(i) does not always hold
- Parents are selected from NGA(i)
  - Using roulette wheel selection
- Selection probability of strategy j
  - f(si): fitness of player i with strategy si
  - Fmin(NGA(i)): minimum fitness among the players in NGA(i)
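The selection formula itself is missing from this transcript. A form consistent with the two definitions above is to weight each neighbor j by f(sj) − Fmin(NGA(i)) and normalize over NGA(i); the sketch below assumes that form. The uniform fallback when all fitness values are equal is also an assumption, and the function names are hypothetical.

```python
import random


def selection_probabilities(fitness: list) -> list:
    """Roulette-wheel probabilities with the minimum fitness as baseline."""
    f_min = min(fitness)
    shifted = [f - f_min for f in fitness]
    total = sum(shifted)
    if total == 0:
        # Assumption: all neighbors are equally fit, so pick uniformly.
        return [1 / len(fitness)] * len(fitness)
    return [s / total for s in shifted]


def select_parent(neighbors: list, fitness: list, rng=random):
    """Draw one parent from the mating neighborhood NGA(i)."""
    weights = selection_probabilities(fitness)
    return rng.choices(neighbors, weights=weights, k=1)[0]
```

Note that under this form the least fit member of the neighborhood gets probability 0 whenever at least one neighbor is fitter.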

Mating strategy – crossover and mutation

- One-point crossover
- Bit-flip mutation
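Both operators can be sketched on 5-bit strategy strings as follows. This is a generic illustration, with mutation read as flipping each bit independently; the function names and the choice to return a single child are assumptions.

```python
import random


def one_point_crossover(parent_a: str, parent_b: str, rng=random) -> str:
    """Take the head of parent_a and the tail of parent_b at a random cut."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    return parent_a[:point] + parent_b[point:]


def bit_flip_mutation(bits: str, probability: float, rng=random) -> str:
    """Flip each bit independently with the given probability."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < probability else bit
        for bit in bits
    )
```

With the deck's mutation probability of 1 / (5*961), a population of 961 five-bit strategies would see about one flipped bit per generation on average.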

Simulation

- Two kinds of simulation
  - ① Two neighborhood structures with the standard pairing scheme
    - Verify the effect of the two neighborhood structures on the evolution of cooperative behavior
  - ② Two neighborhood structures with the random pairing scheme
    - Examine the effect of the random pairing scheme on the evolution of cooperative behavior
- 961 spatially fixed players (31 × 31 grid-world)
- Mistake (noisy IPD model)
  - A player chooses an action different from the one its strategy prescribes

Standard Pairing Simulation – Parameter Setting

- Case of two neighborhood structures

| Parameter | Value |
|---|---|
| Mistake probability | 0, 0.001, 0.1 |
| Crossover probability | 1.0 |
| Mutation probability | 1 / (5*961) |
| Termination of IPD game | 100 rounds |
| Termination of evolution | 1000 generations |

Standard Pairing Simulation – Result

- NIPD has a significant effect on the evolution of cooperative behavior
- NGA has a much smaller effect than NIPD
- A small NIPD facilitates the evolution of cooperative behavior
- Figure: mistake probability 0.1

Standard Pairing Simulation – Result (2)

- Better results were obtained with smaller mistake probabilities
- Cooperative behavior evolved regardless of the two neighborhood structures
- Figures: mistake probability 0.01; mistake probability 0.001

Random Pairing Scheme

- Every player chooses its opponent randomly from NIPD at every round of the game
- The memory of the interaction with one neighbor may influence a player’s future action against another neighbor
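The scheme can be sketched as a round loop in which each player carries one memory slot across opponents. Several details are assumptions made for illustration: the payoff values (the common T=5, R=3, P=1, S=0, since the payoff matrix figure is missing here), the 5-bit layout (first move, then responses to (own, opponent) = (D,D), (C,D), (D,C), (C,C)), and the simplification that each player commits to one action per round. All names are hypothetical.

```python
import random

# Assumed payoffs: (own action, opponent action) -> own payoff, 1 = C, 0 = D.
PAYOFF = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}


def choose_action(strategy, memory, mistake_prob, rng):
    """Pick the strategy's action, flipped with probability mistake_prob."""
    if memory is None:
        action = int(strategy[0])
    else:
        own_prev, opp_prev = memory
        action = int(strategy[1 + own_prev + 2 * opp_prev])
    if rng.random() < mistake_prob:
        action = 1 - action  # noisy IPD: act against the strategy
    return action


def average_payoffs(strategies, neighbors, rounds=100, mistake_prob=0.0, rng=None):
    """Play `rounds` rounds; every round each player meets a random neighbor."""
    rng = rng or random.Random()
    memory = {p: None for p in strategies}
    total = {p: 0 for p in strategies}
    for _ in range(rounds):
        actions = {p: choose_action(strategies[p], memory[p], mistake_prob, rng)
                   for p in strategies}
        for p in strategies:
            opponent = rng.choice(neighbors[p])
            total[p] += PAYOFF[(actions[p], actions[opponent])]
            # The memory is kept even if the next opponent is someone else.
            memory[p] = (actions[p], actions[opponent])
    return {p: total[p] / rounds for p in strategies}
```

The returned value per player is the average payoff per round, which is the fitness definition used in the deck.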

Random Pairing Simulation – Result (1)

- The same parameter specifications were used as in the previous simulation
- Evolution of cooperative behavior is very difficult to achieve
- Increasing the number of opponents decreased the probability to …
- Figure: mistake probability 0

Random Pairing Simulation – Result (2)

| Parameter | Value |
|---|---|
| Mistake probability | 0 |
| NIPD(i) | 3 |
| NGA(i) | 5 |

- Strategies characterized by the genetic form “1***1”

Random Pairing Simulation – Result (3)

| Parameter | Value |
|---|---|
| Mistake probability | 0 |
| NIPD(i) | 5 |
| NGA(i) | 5 |

- Strategies characterized by the genetic form “****0”
  - The existence of strategies of this type prevents the consecutive occurrence of mutual cooperation

Random Pairing Simulation – Result (4)

| Parameter | Value |
|---|---|
| Mistake probability | 0.01 |
| NIPD(i) | 3 |
| NGA(i) | 5 |

- Strategies characterized by the genetic form “11**1”
  - Those strategies have the ability to recover from mutual defection (D, D)
  - This ability seems to be important in a noisy situation

Random Pairing Simulation – Result (5)

| Parameter | Value |
|---|---|
| Mistake probability | 0.01 |
| NIPD(i) | 5 |
| NGA(i) | 5 |

- The TFT strategy “10011” increased its percentage to almost 100%
- A higher average payoff was obtained from strategies of the form “11**1” than from the TFT strategy “10011”

Other Simulations

Conclusion

- Formulated a spatial IPD game using the concept of two neighborhood structures: ① interaction among players through the IPD game, ② mating strategies
  - Computer simulation
    - Use of a small interaction neighborhood facilitated the evolution of cooperative behavior
- Introduced a random pairing scheme with the two neighborhood structures
  - Computer simulation
    - Cooperative behavior evolved when the smallest interaction neighborhood was used
- Future work
  - Explain the results of the random pairing scheme simulation
  - Use a stochastic strategy represented by a string of real numbers between 0 and 1

Simulation on Worksite Interactions between Laborers and Firms by using Multi-Agent based Evolutionary Computation. Hee-Taek Kim and Sung-Bae Cho, Soft Computing Laboratory, Yonsei University. Social Simulation Workshop at the International Joint Conference on Artificial Intelligence

Motivation

- Figure: the Firm (“Low wage, but high productivity”) pays a wage to the Laborer (“High wage…”), who supplies labor
- Laborers and firms form a strategic relationship
  - What is the rational strategy from the position of a laborer or a firm?
- Can we derive a mutually beneficial relationship between laborers and firms?
- General economic belief
  - Laborers tend to cooperate with cooperative firms
  - Firms tend to cooperate with cooperative laborers

Introduction of the Simulation Model

- Construct a computational work-site interaction model
  - Multi-agent based approach
  - Consists of worker agents and firm agents
  - Adaptive agents are implemented using evolutionary computation
- Simulate the interaction between workers and firms
  - Workers and firms interact with each other
  - They form collaborative or competitive relationships

Evolutionary Computation

- Based on Darwinism
  - “Survival of the fittest”
  - Applies the theory of evolution to computation
- Widely used for modeling social phenomena
  - Population of individuals, behavioral rules, selection and reproduction
  - Each individual can adapt to a dynamic environment
- Basic evolution process: Population → Calculate fitness → Selection → Reproduction (crossover and mutation)

Simulation Process – Laborer’s Phase

- The interaction protocol between workers and firms can be divided into two phases
  - Laborer’s phase and firm’s phase
- Laborer’s phase
  - Laborers have to decide whether to resign from their firm or not
  - Laborers have to decide whether to cooperate with or defect against their employer

Simulation Process – Firm’s Phase

- Firm’s phase
  - Firms have to decide whether to cooperate with or defect against their laborers

Overall Process of Simulation

Simulation framework

Internal Attributes – Laborer

| Attribute of laborer | Description |
|---|---|
| int ID | Unique identifier of this laborer |
| int employedFirmID | Unique identifier of the firm that employs this laborer |
| double asset | Total asset of this laborer |
| double productivity | The productivity offered to the firm |
| double livingCost | Living expenses per generation; subtracted from asset |
| int state | Current state { WORKING, JOBLESS, FRESH, FAILED } |
| int continues | Number of generations from employment until now |
| Array chromosome | Array of integers representing the strategy of this laborer |
| Array firmCareer | After resignation, a laborer is never employed by the same firm again |
| Queue firmPastBehaviors | Cooperation history of the firm that employs this laborer |
| Queue laborerPastBehaviors | Cooperation history of this laborer |

Internal Attributes – Firm

| Attribute of firm | Description |
|---|---|
| int ID | Unique identifier of the firm |
| double capital | Total capital of this firm; corresponds to the laborer’s asset |
| double supportingCost | The cost of maintaining the firm |
| Array chromosome | Array of integers representing the strategy of this firm |
| Array myLaborers | Array of laborers who are employed by this firm |
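The two attribute tables map naturally onto record types. The sketch below is one possible rendering, not the authors' code: field names are snake_case versions of the attributes, the default values and the history capacity of 10 are assumptions, and the two `pay_*` methods are hypothetical helpers that apply the per-generation costs described in the tables.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

HISTORY_CAPACITY = 10  # assumed maximum length of the behavior queues


class State(Enum):
    WORKING = 0
    JOBLESS = 1
    FRESH = 2
    FAILED = 3


def _history() -> deque:
    return deque(maxlen=HISTORY_CAPACITY)


@dataclass
class Laborer:
    id: int
    asset: float
    productivity: float
    living_cost: float
    chromosome: list
    employed_firm_id: Optional[int] = None
    state: State = State.FRESH
    continues: int = 0  # generations since employment
    firm_career: list = field(default_factory=list)  # firms already left
    firm_past_behaviors: deque = field(default_factory=_history)
    laborer_past_behaviors: deque = field(default_factory=_history)

    def pay_living_cost(self) -> None:
        """Subtract one generation of living expenses from the asset."""
        self.asset -= self.living_cost


@dataclass
class Firm:
    id: int
    capital: float
    supporting_cost: float
    chromosome: list
    my_laborers: list = field(default_factory=list)

    def pay_supporting_cost(self) -> None:
        """Subtract one generation of maintenance cost from the capital."""
        self.capital -= self.supporting_cost
```

Using a bounded `deque` means the oldest behavior is dropped automatically once the history is full.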

Action of Agent

- Cooperation and defection
- Laborer
  - Cooperation: high productivity (ProdH)
  - Defection: low productivity (ProdL)
  - Resign: resign from the opponent firm
- Firm
  - Cooperation: high wage (WageH)
  - Defection: low wage (WageL)

Payoffs, written as (Laborer, Firm):

| Laborer \ Firm | Cooperation | Defection |
|---|---|---|
| Cooperation | (WageH, ProdH - WageH) | (WageL, ProdH - WageL) |
| Defection | (WageH, ProdL - WageH) | (WageL, ProdL - WageL) |
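The payoff table reduces to a two-line rule: the laborer receives the wage, and the firm receives productivity minus the wage. A minimal sketch follows; the function name is hypothetical, and the default values (WageH = 12, WageL = WageH/2, ProdH = 18, ProdL = ProdH/2) are read from the experimental-design slide later in the deck.

```python
def payoff(laborer_cooperates: bool, firm_cooperates: bool,
           wage_h: float = 12.0, wage_l: float = 6.0,
           prod_h: float = 18.0, prod_l: float = 9.0):
    """Return (laborer payoff, firm payoff) for one interaction."""
    wage = wage_h if firm_cooperates else wage_l
    productivity = prod_h if laborer_cooperates else prod_l
    return wage, productivity - wage
```

With these defaults the four outcomes are (12, 6), (6, 12), (12, -3), and (6, 3), which are the payoff pairs listed on the experimental-design slide.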

Behavioral Strategy of Agent

- The behavioral strategy determines the current action of the agent
  - Every individual has its own strategy
  - All strategies evolve as the simulation progresses

Evolutionary Engine

- Fitness evaluation
  - Firm: the capital attribute is treated as the fitness of the firm
  - Laborer: the asset attribute is treated as the fitness of the laborer
- Selection
  - Roulette wheel selection is used
  - Probability of selection

Evolutionary Engine

- Crossover and mutation
  - One-point crossover
  - One-point bit mutation
- Elimination
  - Incapable agents are eliminated from the simulation

Experimental Design

| Group | Description | Value |
|---|---|---|
| Firm | Initial capital | 2000 |
| Firm | Initial number of laborers per firm | 10 |
| Firm | Maximum number of laborers per firm | 30 |
| Firm | supportingCost | 30 |
| Firm | WageH | 12 |
| Firm | WageL | WageH/2 |
| Laborer | Initial asset | 200 |
| Laborer | livingCost | 10 |
| Laborer | ProdH | 18 |
| Laborer | ProdL | ProdH/2 |
| Other | Initial number of firms | 30 |
| Other | Maximum capacity of history queue | 10 |
| Other | Mistake probability | 0.01 |

| Description | Value |
|---|---|
| Initial population of firms | 30 |
| Maximum population of firms | Infinite |
| Initial population of laborers | 330 |
| Maximum population of laborers | Infinite |
| Increment rate of laborer population (reproduction rate) | 0.005 |
| Mutation rate | 0.005 |
| Selection method | Roulette wheel |
| Crossover method | 1-point crossover |

Payoffs, written as (Laborer, Firm):

| Laborer \ Firm | Cooperation | Defection |
|---|---|---|
| Cooperation | (WageH, ProdH - WageH) = (12, 6) | (WageL, ProdH - WageL) = (6, 12) |
| Defection | (WageH, ProdL - WageH) = (12, -3) | (WageL, ProdL - WageL) = (6, 3) |

Experimental Result

Experimental Result (2)

Conclusion – Second Experiment

- Resignation of laborers is forbidden
  - Laborers cannot escape from a vicious firm
  - Firms simply try to exploit faithful laborers
- The result is the breakdown of all agents, caused by the selfish behavior of the firms

Current Works

- Extend the 2 × 2 interaction model to a continuous model based on linear algebra
  - Asset/livingCost × X1 + RecentGivenPay × X2 + Continuous × X3 …
- Besides the previous activity of the opponent agent, many other factors can affect the current action of an agent
  - Environmental information, the agent's current state, the opponent's state, and so on
- Test various policies on the simulation model and analyze their effects