Lecture V: Game Theory
Zhixin Liu
Complex Systems Research Center, Academy of Mathematics and Systems Sciences, CAS

In the last two lectures, we talked about multi-agent systems:
- Analysis
- Intervention

In this lecture, we will talk about:
- Game theory: the study of complex interactions between people

Start With A Game: Rock-Paper-Scissors
Payoffs (A, B); rows = A's move, columns = B's move:

              rock      paper     scissors
  rock        0, 0     -1, 1      1, -1
  paper       1, -1     0, 0     -1, 1
  scissors   -1, 1      1, -1     0, 0

Other games: poker, go, chess, bridge, basketball, football, …

From Games To Game Theory
Some hints from the games:
- Rules
- Results (payoffs)
- Strategies
- Interactions between strategies and payoffs

Games are everywhere:
- Economic systems: oligarchy, monopoly, markets, trade, …
- Political systems: voting, presidential elections, international relations, …
- Military systems: wars, negotiations, …

Game theory: the study of the strategic interactions among rational agents. Rationality implies that each player tries to maximize his/her payoff, not to beat the other players.

History of Game Theory
- 1928: John von Neumann proved the minimax theorem
- 1944: John von Neumann & Oskar Morgenstern, "Theory of Games and Economic Behavior"
- 1950s: John Nash, Nash equilibrium
- 1970s: John Maynard Smith, evolutionarily stable strategy
- Eight game theorists have won Nobel prizes in economics

Elements of A Game
- Players: who is interacting? N = {1, 2, …, n}
- Actions/Moves: what can the players do? Each player has an action set.
- Payoffs: what can the players get from the game?

Strategy
- Strategy: a complete plan of actions.
- Mixed strategy: a probability distribution over the pure strategies.
- A pure strategy is a special kind of mixed strategy.
- Payoff: the expected payoff under the players' mixed strategies.

An Example: Rock-Paper-Scissors
- Players: A and B
- Actions/Moves: {rock, paper, scissors}
- Payoffs, e.g., u1(rock, scissors) = 1, u2(scissors, paper) = -1:

              rock      paper     scissors
  rock        0, 0     -1, 1      1, -1
  paper       1, -1     0, 0     -1, 1
  scissors   -1, 1      1, -1     0, 0

- Mixed strategies: s1 = (1/3, 1/3, 1/3), s2 = (0, 1/2, 1/2)
  u1(s1, s2) = 1/3(0·0 + 1/2·(-1) + 1/2·1) + 1/3(0·1 + 1/2·0 + 1/2·(-1)) + 1/3(0·(-1) + 1/2·1 + 1/2·0) = 0
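The expected-payoff computation on this slide can be sketched in code (a minimal sketch; the payoff matrix and the two mixed strategies are those of the slide):

```python
# Expected payoff of mixed strategies in rock-paper-scissors.
# Payoff matrix for player A (rows and columns: rock, paper, scissors).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
s1 = [1/3, 1/3, 1/3]   # player A mixes uniformly
s2 = [0, 1/2, 1/2]     # player B never plays rock
u1 = sum(s1[i] * s2[j] * A[i][j] for i in range(3) for j in range(3))
print(u1)  # 0.0
```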

Classifications of Games
- Cooperative and non-cooperative games
  Cooperative game: players are able to form binding commitments.
  Non-cooperative game: the players make decisions independently.
- Zero-sum and non-zero-sum games
  Zero-sum game: the total payoff to all players is zero, e.g., poker, go, …
  Non-zero-sum game: e.g., the prisoner's dilemma.
- Finite and infinite games
  Finite game: the players and the actions are finite.
- Simultaneous and sequential (dynamic) games
  Simultaneous game: players move simultaneously, or, if they do not, the later players are unaware of the earlier players' actions.
  Sequential game: later players have some knowledge about earlier actions.
- Perfect-information and imperfect-information games
  Perfect-information game: all players know the moves previously made by all other players, e.g., chess, go, …
  Complete information: every player knows the strategies and payoffs of the other players, but not necessarily their actions. Perfect information ≠ complete information.

We will first focus on games that are: simultaneous, with complete information, non-cooperative, and finite. What is the solution of such a game?

Assumption
Assume that each player:
- knows the structure of the game
- attempts to maximize his payoff
- attempts to predict the moves of his opponents
- knows that this is common knowledge among the players

Dominated Strategy
A strategy is dominated if, regardless of what any other players do, it earns the player a smaller payoff than some other strategy.
- s_{-i}: a strategy profile of all players except player i; S_{-i} is the set of such profiles.
- A strategy s' of player i is strictly dominated if there exists a strategy s* such that
    u_i(s*, s_{-i}) > u_i(s', s_{-i})  for all s_{-i} in S_{-i}.

Elimination of Dominated Strategies
Example (rows U, M, D for the row player; columns L, M, R for the column player):

         L       M       R
  U     4, 3    5, 1    6, 2
  M     2, 1    8, 4    3, 6
  D     3, 0    9, 6    2, 8

Column M is strictly dominated by R and can be eliminated; then rows M and D are dominated by U; finally column R is dominated by L.
(U, L) is the solution of the game.
A dominant strategy may not exist!
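The elimination procedure on this slide can be sketched as a small routine (a minimal sketch; the function and variable names are illustrative):

```python
def eliminate_dominated(payoff1, payoff2):
    """Iteratively remove strictly dominated pure strategies.

    payoff1[i][j], payoff2[i][j]: payoffs to the row and column player.
    Returns the surviving row and column indices (into the original matrices).
    """
    rows = list(range(len(payoff1)))
    cols = list(range(len(payoff1[0])))
    changed = True
    while changed:
        changed = False
        # Row r is strictly dominated by r2 if r2 does better against every column.
        for r in rows[:]:
            if any(all(payoff1[r2][c] > payoff1[r][c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        # Column c is strictly dominated by c2 (for the column player's payoffs).
        for c in cols[:]:
            if any(all(payoff2[r][c2] > payoff2[r][c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# The 3x3 example from the slide (rows U, M, D; columns L, M, R).
p1 = [[4, 5, 6], [2, 8, 3], [3, 9, 2]]
p2 = [[3, 1, 2], [1, 4, 6], [0, 6, 8]]
print(eliminate_dominated(p1, p2))  # ([0], [0]) -> (U, L)
```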

Definition of Nash Equilibrium
Nash equilibrium (NE): a solution concept of a game.
- (N, S, u): a game
- S_i: strategy set of player i
- S = S_1 × … × S_n: the set of strategy profiles
- u: the payoff functions
- s_{-i}: strategy profile of all players except player i
A strategy profile s* is called a Nash equilibrium if, for every player i,
  u_i(s_i*, s_{-i}*) ≥ u_i(σ_i, s_{-i}*),
where σ_i is any pure strategy of player i.

Remarks on Nash Equilibrium
- A set of strategies, one for each player, such that each player's strategy is a best response to the others' strategies.
- Best response: the strategy that maximizes the payoff given the others' strategies.
- No player can do better by unilaterally changing his or her strategy.
- A dominant-strategy profile is a NE.

Example
- Players: Smith and Louis
- Actions: {Advertise, Do Not Advertise}
- Payoffs: the companies' profits
  - Each firm earns $50 million from its customers
  - Advertising costs a firm $20 million
  - Advertising captures $30 million from the competitor
How do we represent this game?

Strategic Interactions
Payoffs (Louis, Smith); rows = Louis, columns = Smith:

            No Ad       Ad
  No Ad    (50, 50)   (20, 60)
  Ad       (60, 20)   (30, 30)

Best Responses
- Best response for Louis:
  - If Smith advertises: advertise
  - If Smith does not advertise: advertise
- The best response for Smith is the same.
- Ad is a dominant strategy for each firm, so (Ad, Ad) is a NE!
- This is another prisoners' dilemma!

            No Ad       Ad
  No Ad    (50, 50)   (20, 60)
  Ad       (60, 20)   (30, 30)
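The best-response reasoning above amounts to checking every cell of the bimatrix for mutual best responses. A minimal sketch (names are illustrative; 0 = No Ad, 1 = Ad):

```python
from itertools import product

def pure_nash(payoff1, payoff2):
    """Return all pure-strategy Nash equilibria of a bimatrix game."""
    n, m = len(payoff1), len(payoff1[0])
    equilibria = []
    for i, j in product(range(n), range(m)):
        # i must be a best response to j, and j a best response to i.
        if (payoff1[i][j] >= max(payoff1[k][j] for k in range(n)) and
                payoff2[i][j] >= max(payoff2[i][k] for k in range(m))):
            equilibria.append((i, j))
    return equilibria

# Advertising game (0 = No Ad, 1 = Ad), payoffs in $ millions.
louis = [[50, 20], [60, 30]]
smith = [[50, 60], [20, 30]]
print(pure_nash(louis, smith))  # [(1, 1)] -> (Ad, Ad)
```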

Nash Equilibrium
A NE may be a pair of mixed strategies. Example: Matching Pennies.

Payoffs (A, B); rows = A, columns = B:

           Head        Tail
  Head    (1, -1)    (-1, 1)
  Tail    (-1, 1)    (1, -1)

((1/2, 1/2), (1/2, 1/2)) is the Nash equilibrium.
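The mixed equilibrium follows from the indifference condition: each player mixes so that the opponent is indifferent between her two actions. A minimal sketch (the helper name is illustrative):

```python
# Mixed equilibrium of Matching Pennies by the indifference condition.
# A plays Heads with probability p so that B is indifferent:
# u_B(Heads) = -p + (1-p),  u_B(Tails) = p - (1-p);  equality gives p = 1/2.
def indifference_prob(a, b, c, d):
    """Row player's probability on her first action that makes the column
    player indifferent, for column-player payoffs [[a, b], [c, d]]."""
    # Solve p*a + (1-p)*c = p*b + (1-p)*d for p.
    return (d - c) / (a - c - b + d)

# Column player's (B's) payoffs in Matching Pennies.
print(indifference_prob(-1, 1, 1, -1))  # 0.5
```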

Existence of NE
Theorem (J. Nash, 1950). Every finite game has at least one Nash equilibrium (in pure or mixed strategies).

Nash Equilibrium
A NE may not be a good solution of the game; it can differ from the optimal solution. E.g.:

            No Ad       Ad
  No Ad    (50, 50)   (20, 60)
  Ad       (60, 20)   (30, 30)

Nash Equilibrium
A game may have more than one NE. E.g., the Battle of the Sexes.

Payoffs (Wife, Husband); rows = Wife, columns = Husband:

             opera      football
  opera     (2, 1)     (0, 0)
  football  (0, 0)     (1, 2)

NEs: (opera, opera), (football, football), and the mixed profile ((2/3, 1/3), (1/3, 2/3)).

Nash Equilibrium
Two-person zero-sum games: a saddle point is a solution.

Nash Equilibrium
- Many varieties of NE: refined NE, Bayesian NE, subgame-perfect NE, perfect Bayesian NE, …
- Finding NEs is very difficult.
- NE only tells us that if the game reaches such a state, then no player has an incentive to change his strategy unilaterally; it cannot tell us how to reach such a state.

Iterated Prisoner’s Dilemma

Cooperation
- In groups of organisms:
  - Mutual cooperation is of benefit to all agents
  - Lack of cooperation is harmful to them
- Another type of cooperation:
  - Cooperating agents do well
  - But any one agent would do better by failing to cooperate
- The prisoner's dilemma is an elegant embodiment of this tension.

Prisoner's Dilemma
The story of the prisoner's dilemma:
- Players: two prisoners
- Actions: {Cooperate, Defect}
- Payoff matrix (A, B); rows = A, columns = B:

         C         D
  C    (3, 3)    (0, 5)
  D    (5, 0)    (1, 1)

Prisoner's Dilemma
- No matter what the other does, the best choice is "D".
- (D, D) is a Nash equilibrium.
- But if both choose "D", both do worse than if both had selected "C".

         C         D
  C    (3, 3)    (0, 5)
  D    (5, 0)    (1, 1)

Iterated Prisoner's Dilemma
The individuals:
- meet many times
- can recognize a previous interactant
- remember the prior outcome
Strategy: specify the probabilities of cooperation and defection based on the history:
- P(C) = f1(History)
- P(D) = f2(History)

Strategies
Tit For Tat: cooperate the first time, then repeat the opponent's last choice.

  Player A: C D D C C C D D C …
  Player B: D D C C C D D C …
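Tit For Tat is only a few lines of code. A minimal sketch (the opponent's move sequence below is hypothetical, chosen only to show the one-round echo):

```python
def tit_for_tat(my_history, opp_history):
    """Cooperate on the first round, then repeat the opponent's last move."""
    return 'C' if not opp_history else opp_history[-1]

# TFT echoing a fixed (hypothetical) opponent sequence.
opp_moves = ['D', 'D', 'C', 'C', 'D']
a_hist, b_hist = [], []
for move in opp_moves:
    a_hist.append(tit_for_tat(a_hist, b_hist))
    b_hist.append(move)
print(a_hist)  # ['C', 'D', 'D', 'C', 'C']
```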

Strategies
- Tit For Tat: cooperate the first time, then repeat the opponent's last choice.
- Tit For Tat and Random: repeat the opponent's last choice, skewed by a random setting. *
- Tit For Two Tats and Random: like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated; the choice is skewed by a random setting. *
- Tit For Two Tats: like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated.
- Naive Prober (Tit For Tat with random defection): repeat the opponent's last choice (i.e., Tit For Tat), but sometimes probe by defecting in lieu of cooperating. *
- Remorseful Prober (Tit For Tat with random defection): like the Naive Prober, but if the opponent defects in response to probing, show remorse by cooperating once. *
- Naive Peace Maker (Tit For Tat with random cooperation): repeat the opponent's last choice (i.e., Tit For Tat), but sometimes make peace by cooperating in lieu of defecting. *
- True Peace Maker (hybrid of Tit For Tat and Tit For Two Tats with random cooperation): cooperate unless the opponent defects twice in a row, then defect once; but sometimes make peace by cooperating in lieu of defecting. *
- Random: always set at 50% probability.

Strategies
- Always Defect
- Always Cooperate
- Grudger (cooperate, but only be a sucker once): cooperate until the opponent defects, then always defect unforgivingly.
- Pavlov (repeat last choice if good outcome): if 5 or 3 points were scored in the last round, repeat the last choice.
- Pavlov / Random (repeat last choice if good outcome, and Random): if 5 or 3 points were scored in the last round, repeat the last choice, but sometimes make random choices. *
- Adaptive: start with c, c, c, d, d, d, and then take the choice that has given the best average score, re-calculated after every move.
- Gradual: cooperate until the opponent defects; in that case defect the total number of times the opponent has defected during the game, followed by two cooperations.
- Suspicious Tit For Tat: as Tit For Tat, except begin by defecting.
- Soft Grudger: cooperate until the opponent defects; in that case punish the opponent with d, d, c, c.
- Customised strategy 1: default setting is T=1, P=1, R=1, S=0, B=1; always cooperate unless a sucker (i.e., 0 points scored).
- Customised strategy 2: default setting is T=1, P=1, R=0, S=0, B=0; always play alternating defect/cooperate.

Iterated Prisoner's Dilemma
The same players repeat the prisoner's dilemma many times. After ten rounds:
- The best income is 50.
- A realistic case is for each player to get 30.
- An extreme case is that each player selects "defect"; then each player gets 10.
- The most likely case is that each player plays a mixture of "defect" and "cooperate".

         C         D
  C    (3, 3)    (0, 5)
  D    (5, 0)    (1, 1)

Iterated Prisoner's Dilemma
Which strategy can thrive; what is a good strategy? Robert Axelrod, 1980s:
- A computer round-robin tournament
- Axelrod, R. 1987. The evolution of strategies in the iterated Prisoner's Dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Altos, CA.

The first round
- Strategies: 14 entries + a random strategy, including Markov processes and Bayesian inference
- Each pair meets; in total there are 15 × 15 runs, and each pair plays the game 200 times
- Payoff: ∑_{S'} U(S, S') / 15
- Tit For Tat wins (cooperation based on reciprocity)
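A scaled-down version of Axelrod's round-robin tournament can be sketched as follows (only four illustrative strategies and 200 rounds per match, as in the slide; this is not a reconstruction of the actual 15-entry field):

```python
# Payoffs (row, column) for one round of the Prisoner's Dilemma.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def all_c(mine, theirs): return 'C'
def all_d(mine, theirs): return 'D'
def tft(mine, theirs): return 'C' if not theirs else theirs[-1]
def grudger(mine, theirs): return 'D' if 'D' in theirs else 'C'

def play_match(s1, s2, rounds=200):
    """Play one match; return each strategy's total score."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        score1 += p1
        score2 += p2
        h1.append(m1)
        h2.append(m2)
    return score1, score2

# Round-robin (including self-play): sum each strategy's scores.
strategies = {'AllC': all_c, 'AllD': all_d, 'TFT': tft, 'Grudger': grudger}
totals = {name: 0 for name in strategies}
for n1, s1 in strategies.items():
    for n2, s2 in strategies.items():
        totals[n1] += play_match(s1, s2)[0]
print(sorted(totals.items(), key=lambda kv: -kv[1]))  # TFT ends on top (tied with Grudger)
```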

The first round
Characters of "good" strategies:
- Goodness: never defect first (cf. TFT vs. Naive Prober, which repeats the opponent's last choice but sometimes probes by defecting in lieu of cooperating)
- Forgiveness: may revenge, but the memory is short (cf. TFT vs. Grudger, which cooperates until the opponent defects, then always defects unforgivingly)

Winning Vs. High Scores
- This is not a zero-sum game; there is a banker.
- TFT never wins a single match: the best it can do is to tie its opponent.
- "Winning the game" is a kind of jealousy; it does not work well.
- It is possible for "cooperation" to arise in a "selfish" group.

The second round
- Strategies: 62 entries + a random strategy
  - "goodness" strategies
  - "wiliness" strategies
- Tit For Tat wins again.
- Whether a strategy "wins" or "loses" depends on the circumstances.

Characters of "good" strategies
- Goodness: never defect first
  - First round: the first eight strategies were "good"
  - Second round: fourteen of the first fifteen strategies were "good"
- Forgiveness: may revenge, but the memory is short
  - "Grudger" is not a strategy with "forgiveness"
- "Goodness" and "forgiveness" are a kind of collective behavior; for a single agent, defection is the best strategy.

Evolution of the Strategies
Evolve "good" strategies by a genetic algorithm (GA).

What is a "good" strategy?
- Is TFT a good strategy?
- Tit For Two Tats may have been the best strategy in the first round, but it was not a good strategy in the second round.
  (Tit For Two Tats: like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated.)
- A "good" strategy depends on the environment.
- This motivates the evolutionarily stable strategy.

Evolutionarily stable strategy (ESS)
- Introduced by John Maynard Smith and George R. Price in 1973.
- An ESS is "a strategy such that, if all members of the population adopt it, then no mutant strategy could invade the population under the influence of natural selection." (John Maynard Smith, Evolution and the Theory of Games)
- An ESS is robust under evolution: it cannot be invaded by mutation.

Definition of ESS
A strategy x is an ESS if, for every strategy y ≠ x,
  u(x, (1−ε)x + εy) > u(y, (1−ε)x + εy)
holds for all sufficiently small positive ε.
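An equivalent, often more convenient form of the definition is Maynard Smith's pair of conditions, sketched here with E(a, b) denoting the payoff to strategy a played against strategy b:

```latex
\text{A strategy } x \text{ is an ESS if, for every } y \neq x:
\quad E(x,x) \ge E(y,x),
\quad\text{and}\quad
E(x,x) = E(y,x) \;\Longrightarrow\; E(x,y) > E(y,y).
```

The first condition says x is a best response to itself (a symmetric NE); the second says that any mutant that does equally well against x must do strictly worse against itself.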

ESS
- ESS is defined in a population with a large number of individuals.
- The individuals cannot control their strategies, and may not even be aware of the game they are playing.
- An ESS is the result of natural selection.
- Like NE, ESS only tells us that the state is robust under evolution; it cannot tell us how the population reaches such a state.

ESS in the IPD
- Tit For Tat cannot be invaded by wily strategies, such as Always Defect.
- But TFT can be invaded by "goodness" strategies, such as Always Cooperate, Tit For Two Tats, and Suspicious Tit For Tat; so Tit For Tat is not a strict ESS.
- "Always Cooperate" can be invaded by "Always Defect".
- "Always Defect" is an ESS.
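Maynard Smith's two conditions can be checked mechanically in the one-shot Prisoner's Dilemma. A minimal sketch (payoff values are the slide's; the helper name `is_ess` is illustrative):

```python
# One-shot Prisoner's Dilemma payoffs E(row, column) to the row strategy.
E = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def is_ess(x, strategies):
    """Maynard Smith's conditions: x must resist every mutant y != x."""
    for y in strategies:
        if y == x:
            continue
        if E[(y, x)] > E[(x, x)]:
            return False          # y invades outright
        if E[(y, x)] == E[(x, x)] and E[(y, y)] >= E[(x, y)]:
            return False          # y drifts in, then holds its own
    return True

print(is_ess('D', ['C', 'D']))  # True: Always Defect is an ESS
print(is_ess('C', ['C', 'D']))  # False: Always Cooperate is invaded by Defect
```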

References
- Drew Fudenberg and Jean Tirole, Game Theory, The MIT Press, 1991.
- Axelrod, R. 1987. The evolution of strategies in the iterated Prisoner's Dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Altos, CA.
- Richard Dawkins, The Selfish Gene, Oxford University Press.

Concluding Remarks
The tip of the iceberg of game theory:
- Basic concepts
- Nash equilibrium
- Iterated prisoner's dilemma
- Evolutionarily stable strategy

Concluding Remarks
Many interesting topics deserve to be studied and further investigated:
- Cooperative games
- Incomplete-information games
- Dynamic games
- Combinatorial games
- Learning in games
- …

Thank you!
