Mark Hasegawa-Johnson, 4/2021
CS 440/ECE 448 Lecture 21: Game Theory
CC-BY 4.0: you may remix or redistribute if you cite the source.
https://en.wikipedia.org/wiki/Prisoner’s_dilemma

Game theory
• Game theory deals with systems of interacting agents where the outcome for an agent depends on the actions of all the other agents
• Applied in sociology, politics, economics, biology, and, of course, AI
• Agent design: determining the best strategy for a rational agent in a given game
• Mechanism design: how to set the rules of the game to ensure a desirable outcome

http://www.economist.com/node/21527025

Games so far: Zero-sum two-player games…
[Figure: game tree. One node symbol = game state from which MAX can play; the other node symbol = game state from which MIN can play; number = value of that game state for MAX]

Characteristics of games we’ve seen so far
• Rational vs. Irrational opponent ✓
• Two-player vs. Multi-player ✓
• Zero-sum vs. Non-zero-sum ✓
• Deterministic vs. Stochastic ✓
• Sequential vs. Simultaneous moves X

Games with Simultaneous Moves
• Assume: two-player game, rational players, deterministic environment, but NOT necessarily zero-sum.
• These assumptions are not necessary, but they simplify the problem.
• Both players play at the same time.
• What is the rational thing to do:
1. If you know in advance what the other player will do?
2. If you can negotiate your move with the other player?
3. If you DON’T know in advance what the other player will do?
4. If it is rational to behave randomly?

Outline of today’s lecture
• Games with simultaneous moves: Notation
• Example: Stag Hunt (Coordination Games)
• Nash Equilibrium: Each player knows what the other will do, and responds rationally
• Example: Asymmetric Coordination Games
• Pareto Optimal outcome: No player can win more without some other player winning less
• Example: Prisoners’ Dilemma (Betrayal Games)
• Dominant Strategy: an action that is rational regardless of what the other player does
• Example: Chicken (Anti-Coordination Games)
• Factors external to the game: How well can you bluff?
• Rational action within the game: Mixed Nash Equilibrium
• Example: Generative Adversarial Networks (GAN)

Notation: sequential games
• Each terminal node is marked with the value for each player
• Each non-terminal node inherits its value from its minimax-optimal descendant
[Figure: game tree in which each player chooses L or R; each node is labeled with a value for each player]
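The backing-up rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the tree encoding (lists for non-terminal nodes, tuples for terminal nodes), the function name `backed_up_value`, and the leaf values are all assumptions made for the example. Each player is assumed to pick the child that is best for themselves, which reduces to ordinary minimax when the game is zero-sum.

```python
def backed_up_value(node, player=0):
    """Return the (value for player 0, value for player 1) pair that `node` inherits.

    A tuple is a terminal node and already carries both players' values.
    A list is a non-terminal node at which `player` chooses among the children.
    """
    if isinstance(node, tuple):
        return node
    # The other player moves at the next level of the tree.
    child_values = [backed_up_value(child, 1 - player) for child in node]
    # The player to move keeps the child whose value is best for that player.
    return max(child_values, key=lambda values: values[player])


# Hypothetical two-level tree: player 0 picks L or R, then player 1 picks L or R.
example_tree = [
    [(1, 5), (7, 2)],  # after player 0 plays L
    [(3, 4), (4, 6)],  # after player 0 plays R
]
root_value = backed_up_value(example_tree)
```

In this hypothetical tree, player 1 would answer L with the (1, 5) leaf and R with the (4, 6) leaf, so player 0 prefers R and the root inherits (4, 6).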

Notation: simultaneous games
The payoff matrix shows:
• Each column is a different move for player 1.
• Each row is a different move for player 2.
• Each square is labeled with the rewards earned by each player in that square.
[Figure: the sequential game tree (moves L and R) redrawn as a 2×2 payoff matrix]

Stag hunt
Apparently first described by Jean-Jacques Rousseau:
• If both hunters (Bob and Alice) cooperate in hunting for the stag → each gets to take home half a stag (100 lbs)
• If one hunts for the stag, while the other wanders off and bags a hare → the defector gets a hare (10 lbs), the cooperator gets nothing.
• If both hunters defect → each gets to take home a hare.

Payoff matrix (each cell: Alice’s reward, Bob’s reward, in lbs):
                    Bob: Defect    Bob: Cooperate
Alice: Defect         10, 10          10, 0
Alice: Cooperate       0, 10         100, 100

Photo by Scott Bauer, Public Domain, https://commons.wikimedia.org/w/index.php?curid=245466
By Ancheta Wis, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=68432449

Nash Equilibrium
A Nash Equilibrium is a game outcome such that each player, knowing the other player’s move in advance, responds rationally.

Nash Equilibrium
Example: (Defect, Defect) is a Nash equilibrium.
• Alice knows that Bob will defect, so she defects.
• Bob knows that Alice will defect, so he defects.
• Neither player can rationally change his or her move, unless the other player also changes.

Nash Equilibrium
(Cooperate, Cooperate) is also a Nash equilibrium!
• Alice knows that Bob will cooperate, so she cooperates!
• Bob knows that Alice will cooperate, so he cooperates!
• Neither player can rationally change his or her move, unless the other player also changes.
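The two equilibria above can be checked mechanically. The sketch below is an illustration rather than part of the lecture: the dictionary layout and the function name `pure_nash_equilibria` are assumptions, while the payoffs are the stag hunt values from the slides (100 lbs each for a shared stag, 10 lbs for a hare, nothing for a lone stag hunter). An outcome counts as a Nash equilibrium when neither player can do better by changing only their own move.

```python
from itertools import product

ACTIONS = ["Defect", "Cooperate"]

# Keyed by (Alice's move, Bob's move); each value is (Alice's reward, Bob's reward).
STAG_HUNT = {
    ("Defect", "Defect"): (10, 10),
    ("Defect", "Cooperate"): (10, 0),
    ("Cooperate", "Defect"): (0, 10),
    ("Cooperate", "Cooperate"): (100, 100),
}


def pure_nash_equilibria(payoffs, actions):
    """Return every outcome from which neither player gains by deviating alone."""
    equilibria = []
    for alice_move, bob_move in product(actions, repeat=2):
        alice_reward, bob_reward = payoffs[(alice_move, bob_move)]
        # Alice's alternatives, holding Bob's move fixed.
        alice_ok = all(payoffs[(alt, bob_move)][0] <= alice_reward for alt in actions)
        # Bob's alternatives, holding Alice's move fixed.
        bob_ok = all(payoffs[(alice_move, alt)][1] <= bob_reward for alt in actions)
        if alice_ok and bob_ok:
            equilibria.append((alice_move, bob_move))
    return equilibria


stag_hunt_equilibria = pure_nash_equilibria(STAG_HUNT, ACTIONS)
```

Under these payoffs the search returns exactly (Defect, Defect) and (Cooperate, Cooperate), matching the two slides above.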

Pareto-optimal outcome
What if the players talk to each other in advance, and make promises, and trust one another’s promises?
• Then they will both choose to cooperate.

Asymmetric Coordination Games
Alice prefers alligator. Bob prefers stag. If they don’t cooperate, they each get nothing.

Payoff matrix (each cell: Alice’s reward, Bob’s reward):
                     Bob: Stag    Bob: Alligator
Alice: Stag           10, 20          0, 0
Alice: Alligator       0, 0          20, 10

Asymmetric Coordination Games
The Nash equilibria are (Stag, Stag) and (Gator, Gator).
• If Bob knows that Alice will hunt gator, then it’s rational for him to do the same.
• If Alice knows that Bob will hunt stag, then it’s rational for her to do the same.

What happens if they trust one another?
What happens if they discuss their actions, and make promises, and trust one another? It depends: whose needs are considered more important?
• If Bob’s needs are more important, then they will hunt stag.
• If Alice’s needs are more important, then they will hunt alligator.

Pareto optimal outcome
An outcome is Pareto-optimal if the only way to increase value for one player is by decreasing value for the other.
• (Stag, Stag) is Pareto-optimal: one could increase Alice’s value, but only by decreasing Bob’s value.
• (Alligator, Alligator) is Pareto-optimal: one could increase Bob’s value, but only by decreasing Alice’s value.
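The definition above translates into a short dominance check. This is a sketch under stated assumptions: the function name `pareto_optimal` is hypothetical, and the payoffs assume that each hunter gets 20 for the animal they prefer, 10 for the other animal, and 0 when the two hunters choose differently. An outcome is kept when no other outcome makes one player better off without making the other worse off.

```python
# Keyed by (Alice's move, Bob's move); each value is (Alice's reward, Bob's reward).
# Assumed payoffs: 20 for the preferred animal, 10 for the other, 0 for a mismatch.
COORDINATION_GAME = {
    ("Stag", "Stag"): (10, 20),
    ("Stag", "Alligator"): (0, 0),
    ("Alligator", "Stag"): (0, 0),
    ("Alligator", "Alligator"): (20, 10),
}


def pareto_optimal(payoffs):
    """Return the outcomes that no other outcome improves on for both players."""
    optimal = []
    for outcome, (alice, bob) in payoffs.items():
        # Another outcome dominates this one if nobody is worse off
        # and at least one player is strictly better off.
        dominated = any(
            other_alice >= alice
            and other_bob >= bob
            and (other_alice > alice or other_bob > bob)
            for other_alice, other_bob in payoffs.values()
        )
        if not dominated:
            optimal.append(outcome)
    return optimal


pareto_outcomes = pareto_optimal(COORDINATION_GAME)
```

With these assumed payoffs, the check keeps (Stag, Stag) and (Alligator, Alligator) and rejects both mismatched outcomes, since either coordinated hunt leaves both players better off than getting nothing.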

Prisoner’s dilemma
• Two criminals have been arrested and the police visit them separately
• If one player testifies against the other and the other refuses, the one who testified goes free and the one who refused gets a 10-year sentence
• If both players testify against each other, they each get a 5-year sentence
• If both refuse to testify, they each get a 1-year sentence

Payoff matrix (each cell: Alice’s reward, Bob’s reward, in years):
                    Bob: Testify    Bob: Refuse
Alice: Testify        -5, -5          0, -10
Alice: Refuse        -10, 0          -1, -1

By Monogram Pictures, Public Domain, https://commons.wikimedia.org/w/index.php?curid=50338507

Questions that can be asked
• If you were permitted to discuss options with the other player, but one of you is more persuasive than the other, what are the different possible outcomes that might result from that discussion?
• If you knew in advance what your opponent was going to do, what would you do?
• If you didn’t know in advance what your opponent was going to do, what would you do?

Pareto optimality
If you were permitted to discuss options with the other player, what are the different possible outcomes that might result from that discussion? Outcomes are written as (Alice’s reward, Bob’s reward).
• If Bob’s needs are considered more important, the (-10, 0) outcome might result.
• If Alice’s needs are considered more important, the (0, -10) outcome might result.
• If their needs are equally important, the (-1, -1) outcome might result.
A Pareto optimal outcome is an outcome whose cost to player A can only be reduced by increasing the cost to player B.

Nash equilibrium
If you knew in advance what your opponent was going to do, what would you do?
• If Bob knew that Alice was going to refuse, then it would be rational for Bob to testify (he’d get 0 years, instead of 1).
• If Alice knew that Bob was going to testify, then it would be rational for her to testify (she’d get 5 years, instead of 10).
• If Bob knew that Alice was going to testify, then it would be rational for him to testify (he’d get 5 years, instead of 10).
A Nash equilibrium is an outcome such that foreknowledge of the other player’s action does not cause either player to change their action.

Dominant strategy
If you didn’t know in advance what your opponent was going to do, what would you do?
• If Bob knew that Alice was going to refuse, then it would be rational for Bob to testify (he’d get 0 years, instead of 1).
• If Bob knew that Alice was going to testify, then it would still be rational for him to testify (he’d get 5 years, instead of 10).
A dominant strategy is an action that minimizes cost, for one player, regardless of what the other player does.
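The reasoning above can be written as a small search. This is a minimal sketch, not the lecture's code: the function name `dominant_strategy` and the dictionary layout are assumptions, and the rewards are the sentences from the slides written as negative years. An action is treated as dominant when it is at least as good as every alternative against every possible move by the other player.

```python
ACTIONS = ["Testify", "Refuse"]

# Keyed by (Alice's move, Bob's move); each value is (Alice's reward, Bob's reward),
# where a reward is the prison sentence in years, written as a negative number.
PRISONERS_DILEMMA = {
    ("Testify", "Testify"): (-5, -5),
    ("Testify", "Refuse"): (0, -10),
    ("Refuse", "Testify"): (-10, 0),
    ("Refuse", "Refuse"): (-1, -1),
}


def dominant_strategy(payoffs, actions, player):
    """Return an action that is a best response to every opposing move, or None.

    `player` is 0 for Alice (first move in each key) and 1 for Bob (second move).
    """
    def reward(mine, theirs):
        key = (mine, theirs) if player == 0 else (theirs, mine)
        return payoffs[key][player]

    for candidate in actions:
        if all(
            reward(candidate, theirs) >= reward(alternative, theirs)
            for theirs in actions
            for alternative in actions
        ):
            return candidate
    return None


alice_choice = dominant_strategy(PRISONERS_DILEMMA, ACTIONS, player=0)
bob_choice = dominant_strategy(PRISONERS_DILEMMA, ACTIONS, player=1)
```

For these payoffs both searches return "Testify". The same function returns None for a game such as the stag hunt, where the best move depends on what the other player does.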

What makes it a Prisoner’s Dilemma?
We use that term to mean a game in which
• Defecting is the dominant strategy for each player, therefore
• (Defect, Defect) is the only Nash equilibrium, even though
• (Defect, Defect) is not a Pareto-optimal solution.

Qualitative payoff matrix (each cell: row player’s outcome, column player’s outcome):
                 Defect                Cooperate
Defect         Lose, Lose          Win Big, Lose Big
Cooperate    Lose Big, Win Big        Win, Win

http://en.wikipedia.org/wiki/Prisoner’s_dilemma

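Prisoner’s Dilemma vs. Stag Hunt

Prisoner’s Dilemma (each cell: row player’s outcome, column player’s outcome):
                 Defect                Cooperate
Defect         Lose, Lose          Win Big, Lose Big
Cooperate    Lose Big, Win Big        Win, Win
Players improve their winnings by defecting unilaterally.

Stag Hunt (each cell: row player’s outcome, column player’s outcome):
                 Defect                Cooperate
Defect          Win, Win              Win, Lose
Cooperate       Lose, Win          Win Big, Win Big
Players reduce their winnings by defecting unilaterally.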

Prisoner’s dilemma in real life
• Price war
• Arms race
• Steroid use
• Diner’s dilemma

Qualitative payoff matrix (each cell: row player’s outcome, column player’s outcome):
                 Defect              Cooperate
Defect         Lose, Lose          Win, Lose Big
Cooperate    Lose Big, Win          Draw, Draw

http://en.wikipedia.org/wiki/Prisoner’s_dilemma

Payoff matrices
• Working for RAND (a defense contractor) in 1950, Flood and Dresher formalized the “Prisoner’s Dilemma” (PD): a class of payoff matrices that encourages betrayal.
• Jean-Jacques Rousseau (Swiss philosopher, 1700s) invented the “Stag Hunt” (SH): a class of payoff matrices that reward cooperation, but don’t force it. It has been used as a model of climate-change treaties.
• Both PD and SH have stable Nash equilibria. The “Game of Chicken” is a popular subject in movies (Rebel Without a Cause, Footloose, Crazy Rich Asians) because of its inherent instability: the only way to win is by convincing your opponent to lose.

Game of Chicken
• Two players each bet $1000 that the other player will chicken out
• Outcomes:
• If one player chickens out, the other wins $1000
• If both players chicken out, neither wins anything
• If neither player chickens out, they both lose $10,000 (the cost of the car)

Payoff matrix (each cell: Alice’s reward, Bob’s reward, in thousands of dollars; Bob is Player 1, Alice is Player 2):
                    Bob: Straight    Bob: Chicken
Alice: Straight       -10, -10          1, -1
Alice: Chicken         -1, 1            0, 0

http://en.wikipedia.org/wiki/Game_of_chicken

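Prisoner’s Dilemma vs. Game of Chicken

Prisoner’s Dilemma (each cell: row player’s outcome, column player’s outcome):
                 Defect                Cooperate
Defect         Lose, Lose          Win Big, Lose Big
Cooperate    Lose Big, Win Big        Win, Win
Players cut their losses by defecting if the other player defects.

Game of Chicken (each cell: row player’s outcome, column player’s outcome):
                 Straight               Chicken
Straight     Lose Big, Lose Big      Win Big, Lose
Chicken        Lose, Win Big           Win, Win
Defecting, if the other player defects, is the worst thing you can do.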

Game of Chicken
• Is there a dominant strategy for either player?
• Is there a Nash equilibrium? (straight, chicken) or (chicken, straight)
• Anti-coordination game: it is mutually beneficial for the two players to choose different strategies
• Model of escalated conflict in humans and animals (hawk-dove game)
• How are the players to decide what to do?
• Bluff! You have to somehow convince your opponent that you will drive straight, no matter what happens, even if it’s irrational for you to do so.
• In that case, the rational thing for your opponent to do is to chicken out.
Seriously??!! Is there no way to win this game without convincing the other player that you are irrational??!!
http://en.wikipedia.org/wiki/Game_of_chicken

Irrational versus Random
The game of chicken has two different types of Nash equilibria:
• Bluff. One player convinces the other that he will behave irrationally. The other player concedes the game. Result: (straight, chicken) or (chicken, straight).
• Mixed Nash Equilibrium.
• Alice chooses a move at random, according to some probability distribution. She tells Bob, in advance, what probability distribution she will use.
• Bob responds rationally.
• One of Bob’s rational options is to choose his move, also, at random.

Game of Chicken
• Mixed strategy: a player chooses between the different possible actions according to a probability distribution.
• For example, suppose that each player chooses to go straight (S) with probability 1/10. Is that a Nash equilibrium?
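One way to explore the question above is to compare Bob's expected reward for each of his pure moves when Alice goes straight with probability 1/10. The sketch below is an illustration, not the lecture's worked answer: the names `BOB_REWARD` and `expected_reward` are assumptions, and the rewards are the Game of Chicken payoffs in thousands of dollars (both straight: -10, straight against chicken: +1, chicken against straight: -1, both chicken: 0). Exact fractions are used so the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction

# Bob's reward, in thousands of dollars, keyed by (Bob's move, Alice's move).
BOB_REWARD = {
    ("Straight", "Straight"): -10,
    ("Straight", "Chicken"): 1,
    ("Chicken", "Straight"): -1,
    ("Chicken", "Chicken"): 0,
}


def expected_reward(bob_move, p_alice_straight):
    """Bob's expected reward for a pure move when Alice goes straight with the given probability."""
    p = p_alice_straight
    return (
        p * BOB_REWARD[(bob_move, "Straight")]
        + (1 - p) * BOB_REWARD[(bob_move, "Chicken")]
    )


p = Fraction(1, 10)
straight_value = expected_reward("Straight", p)  # (1/10)(-10) + (9/10)(1)
chicken_value = expected_reward("Chicken", p)    # (1/10)(-1) + (9/10)(0)
bob_is_indifferent = straight_value == chicken_value
```

Both expected rewards come out to -1/10, so Bob cannot improve by committing to either pure move, and randomizing is one of his rational responses. The game is symmetric, so the same calculation applies to Alice when Bob goes straight with probability 1/10.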

Finding mixed strategy equilibria
Here’s the trick: for Bob, random selection is rational only if he can’t improve his winnings by definitively choosing one action or the other. So, for Bob to decide whether a mixed strategy is rational, he needs to know:
• His own reward for each possible outcome (w, x, y, and z), and …
• the probability (p) of Alice cooperating.
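The trick above can be turned into a formula by setting Bob's two expected rewards equal and solving for p. The sketch below makes an assumption that the slide text does not spell out, namely which outcome each letter labels: w is Bob's reward when both cooperate, x when Bob cooperates and Alice defects, y when Bob defects and Alice cooperates, and z when both defect. Under that labeling, indifference means p·w + (1 − p)·x = p·y + (1 − p)·z, so p = (z − x) / (w − x − y + z). The function name is hypothetical, and the rewards are assumed to be integers.

```python
from fractions import Fraction


def indifference_probability(w, x, y, z):
    """Probability p of Alice cooperating that leaves Bob indifferent, or None.

    Assumed labeling of Bob's integer rewards (not confirmed by the slides):
      w: both cooperate            x: Bob cooperates, Alice defects
      y: Bob defects, Alice cooperates    z: both defect
    Solves p*w + (1 - p)*x == p*y + (1 - p)*z for p.
    """
    denominator = w - x - y + z
    if denominator == 0:
        return None  # the two expected rewards never cross at a single p
    p = Fraction(z - x, denominator)
    # Only a value between 0 and 1 is a usable probability.
    return p if 0 <= p <= 1 else None


# Game of Chicken, treating "Chicken" as cooperating, rewards in thousands of dollars.
p_alice_chickens = indifference_probability(w=0, x=-1, y=1, z=-10)

# Prisoner's Dilemma, treating "Refuse" as cooperating, rewards in years.
p_alice_refuses = indifference_probability(w=-1, x=-10, y=0, z=-5)
```

For the Game of Chicken this gives p = 9/10 for Alice chickening out, which is the same as going straight with probability 1/10. For the Prisoner's Dilemma the formula falls outside the range 0 to 1, so the function returns None: Bob's dominant strategy means no mixing by Alice makes him indifferent.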

Does every game have a mixed-strategy equilibrium?

Existence of Nash equilibria
• Any game with a finite set of actions has at least one Nash equilibrium (which may be a mixed-strategy equilibrium).
• If a player has a dominant strategy, there exists a Nash equilibrium in which the player plays that strategy and the other plays the best response to that strategy.
• If both players have dominant strategies, there exists a Nash equilibrium in which they play those strategies.

Using a neural net to generate synthetic data

Generative Adversarial Network
[Figure: random selection from a database of natural examples of y]

Dominant strategy

Reminder: Existence of Nash equilibria
• Any game with a finite set of actions has at least one Nash equilibrium (which may be a mixed-strategy equilibrium).
• If a player has a dominant strategy, there exists a Nash equilibrium in which the player plays that strategy and the other plays the best response to that strategy.
• If both players have dominant strategies, there exists a Nash equilibrium in which they play those strategies.

Generator’s best response to the Discriminator’s dominant strategy

Nash equilibrium for the generative adversarial network

Example

Summary