
Chapter 11: Multi-Agent Interactions
• Each agent has preferences.
• Each agent gets utility depending on its own choices and the choices of the other agent.
• We can write this set of two utilities as a payoff matrix.

Payoff Matrices
- Business plan: success depends on the other's choices.
- For simplicity, consider two players.
- I make a choice, but the consequences of that choice depend on what you do.
- Let c(a1, a2) denote the consequence that results when I (agent 1) choose action a1 and you (agent 2) choose action a2.
- Let the utility of that consequence be u1[c(a1, a2)]; we often abbreviate this as u1(a1, a2).
- There is a sound mathematical theory, called game theory, for dealing with multi-agent choice problems when every agent knows its own utilities and the utilities of all other agents.
- The field of game theory came into being with the 1944 book Theory of Games and Economic Behavior by John von Neumann and Oskar Morgenstern, although some early work began around 1930.

Normal Form
• When outcomes depend only on a single choice by me and a single choice by you, the game is said to be in normal form. This terminology represents the idea that the game is in a normative form, meaning a canonical form.
• The payoff matrix expresses games in normal form, whereas the game tree expresses games in extensive form, representing turn taking. It will probably be helpful to associate the phrase "extensive form" with the notion of a tree, and the phrase "normal form" with the notion of a payoff matrix.

Strategy
• Game theorists can reason about strategies, based on contingencies, rather than discrete actions.
• In this context, a strategy is not an attitude like "play aggressively" or "defend the goal", but rather a complete expression of what to do in every contingency. As described by Poundstone:
  – [A strategy] is a complete description of a particular way to play a game, no matter what the other player(s) does and no matter how long the game lasts. A strategy must prescribe actions so thoroughly that you never have to make a decision in following it.
• A strategy is defined as a choice made in response to the previous choices made in the game.

Prisoners' dilemma – damaged property

                        Ned: Confess                   Ned: Don't Confess
Kelly: Confess          Both get large fines           Ned is suspended,
                                                       Kelly gets wrist slapped
Kelly: Don't Confess    Kelly is suspended,            Both get minimal fines
                        Ned gets wrist slapped

Prisoners' dilemma
Payoffs are listed as (Kelly, Ned).

                                  Ned: Confess (defect)   Ned: Don't Confess (cooperate)
Kelly: Confess (defect)           2, 2                    5, 0
Kelly: Don't Confess (cooperate)  0, 5                    3, 3

Solution Concepts
• Three different approaches:
  1. minimax, maximin
  2. Nash equilibria
  3. Pareto optimal
• Maximin: look at the worst-case scenario for each action, then pick the action that maximizes the worst case.

                             Results   Worst case
Confess (defect)             2 or 5    2
Not confess (cooperate)      0 or 3    0
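The maximin table above can be reproduced mechanically. The following is a minimal sketch, assuming Kelly's payoffs from the prisoner's dilemma matrix; the names KELLY_PAYOFF, worst_case, and maximin are illustrative and not part of the original slides.

```python
# Kelly's payoffs in the prisoner's dilemma, keyed by
# (Kelly's action, Ned's action). Values assumed from the slide's matrix.
KELLY_PAYOFF = {
    ("confess", "confess"): 2,
    ("confess", "dont_confess"): 5,
    ("dont_confess", "confess"): 0,
    ("dont_confess", "dont_confess"): 3,
}
ACTIONS = ["confess", "dont_confess"]


def worst_case(my_action, payoff, actions):
    """Lowest payoff I can receive if I commit to my_action."""
    return min(payoff[(my_action, other)] for other in actions)


def maximin(payoff, actions):
    """Return the (action, value) pair whose worst case is largest."""
    best = max(actions, key=lambda a: worst_case(a, payoff, actions))
    return best, worst_case(best, payoff, actions)


maximin_action, maximin_value = maximin(KELLY_PAYOFF, ACTIONS)
print(maximin_action, maximin_value)  # confess 2
```

Confessing guarantees at least 2, while not confessing can yield 0, so the maximin choice is to confess.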

Best Response
• Suppose for a moment that I am P1 and that I know what P2 will choose. Given that I know P2's choice, I can search through all of my choices and find the option that is the best response to that choice.
• For the prisoner's dilemma, if P2 chooses to confess, then the option that maximizes my payoff is to confess. Similarly, if P2 chooses not to confess, then the option that maximizes my payoff is still to confess.
• Best response: simply the best thing I can do GIVEN that I know what you will do. BUT, I don't know what you will do. It has to do with regret.
• The prisoner's dilemma makes this easy, as I pick the same thing either way.

Prisoners' dilemma – Best Response
Payoffs are listed as (Kelly, Ned).

                                  Ned: Confess (defect)   Ned: Don't Confess (cooperate)
Kelly: Confess (defect)           2, 2                    5, 0
Kelly: Don't Confess (cooperate)  0, 5                    3, 3

Best Response
• Suppose that instead of knowing exactly which choice P2 was going to make, I know only that P2 will play confess, say, 40% of the time and not confess 60% of the time.
• I can find the expected utility for confessing, 0.4(2) + 0.6(5) = 3.8, and the expected utility for not confessing, 0.4(0) + 0.6(3) = 1.8, which tells me that my best response to P2's strategy is to confess.
• The notion of a best response is very useful for understanding other solution concepts from multi-agent choice. For example, the maximin solution can be viewed as the best response (i.e., the maximal payoff) when I believe that the other player will always be able to choose the thing that hurts me the most.
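The expected-utility comparison above can be checked with a few lines of code. This is a sketch under the assumption that P1's payoffs are the row player's payoffs from the prisoner's dilemma matrix; the function and variable names are hypothetical.

```python
# P1's payoffs keyed by (P1's action, P2's action); values assumed from
# the prisoner's dilemma matrix used throughout the slides.
P1_PAYOFF = {
    ("confess", "confess"): 2,
    ("confess", "dont_confess"): 5,
    ("dont_confess", "confess"): 0,
    ("dont_confess", "dont_confess"): 3,
}

# P2's mixed strategy: confess 40% of the time, not confess 60%.
P2_MIX = {"confess": 0.4, "dont_confess": 0.6}


def expected_utility(my_action, opponent_mix, payoff):
    """Probability-weighted average of my payoff over the opponent's actions."""
    return sum(prob * payoff[(my_action, other)]
               for other, prob in opponent_mix.items())


def best_response(opponent_mix, payoff, actions):
    """Action with the highest expected utility against opponent_mix."""
    return max(actions, key=lambda a: expected_utility(a, opponent_mix, payoff))


for action in ("confess", "dont_confess"):
    print(action, round(expected_utility(action, P2_MIX, P1_PAYOFF), 2))
print("best response:", best_response(P2_MIX, P1_PAYOFF, ["confess", "dont_confess"]))
```

Because confessing is better against either pure action, it remains the best response for every mixture P2 could play, not only the 40/60 split.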

Prisoners' dilemma
• Note that no matter what Ned does, Kelly is better off if she confesses than if she does not confess.
• So "confess" is a dominant strategy from Kelly's perspective. We can predict that she will always confess.

Payoffs are listed as (Kelly, Ned).

                        Ned: Confess   Ned: Don't Confess
Kelly: Confess          2, 2           5, 0
Kelly: Don't Confess    0, 5           3, 3

Nash Equilibrium
• Suppose you decide to look at the best choice for both of you. Clearly, "not confess" is best for both.
• Assuming both of you pick that "best for both" solution:
  – You observe that if you confess, your utility improves.
  – You observe that if the other player confesses, HIS utility improves.
  – You notice that if you both confess, neither player has the motivation to change his mind.
• A Nash equilibrium is a set of choices in which every player's choice is a best response to every other player's choice. In other words, the equilibrium solution is made up of individual choices that form a joint action from which neither player benefits by deviating unilaterally.

Nash Equilibrium – (Beautiful Mind)
• In general, we will say that two strategies s1 and s2 are in Nash equilibrium if:
  1. under the assumption that agent i plays s1, agent j can do no better than play s2; and
  2. under the assumption that agent j plays s2, agent i can do no better than play s1.
• Neither agent has any incentive to deviate from a Nash equilibrium: it is stable.
• Unfortunately:
  1. Not every interaction scenario has a pure-strategy Nash equilibrium.
  2. Some interaction scenarios have more than one Nash equilibrium.
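The two conditions above translate directly into a brute-force check over every joint action. The sketch below assumes two-player games stored as dictionaries of (row payoff, column payoff); the function name and data layout are illustrative choices. It uses the prisoner's dilemma and matching pennies (H/T coin game where the column player wins on a match) as inputs.

```python
from itertools import product


def pure_nash_equilibria(game, actions):
    """Return every joint action from which neither player gains by
    unilaterally switching to another pure action."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = game[(a1, a2)]
        # Player 1 cannot do better against a2, and player 2 cannot do
        # better against a1.
        p1_content = all(game[(alt, a2)][0] <= u1 for alt in actions)
        p2_content = all(game[(a1, alt)][1] <= u2 for alt in actions)
        if p1_content and p2_content:
            equilibria.append((a1, a2))
    return equilibria


# Prisoner's dilemma, payoffs (row, column).
PD = {
    ("confess", "confess"): (2, 2),
    ("confess", "dont"): (5, 0),
    ("dont", "confess"): (0, 5),
    ("dont", "dont"): (3, 3),
}

# Matching pennies, payoffs (row, column); column wins on a match.
PENNIES = {
    ("H", "H"): (-1, 1),
    ("H", "T"): (1, -1),
    ("T", "H"): (1, -1),
    ("T", "T"): (-1, 1),
}

print(pure_nash_equilibria(PD, ["confess", "dont"]))  # [('confess', 'confess')]
print(pure_nash_equilibria(PENNIES, ["H", "T"]))      # []
```

The empty result for matching pennies illustrates a game with no equilibrium in pure strategies; its equilibrium requires randomizing.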

Definitions
• My strategy can be anything that gives a precise course of action:
  – Always confess
  – Always NOT confess
  – Confess with probability 0.25
  – For a repeated game: if the opponent confessed last time, confess this time.
• Constant sum: the payoffs in each outcome sum to the same total. For example:

    2, 2    4, 0
    0, 4    3, 1

Criteria for evaluating systems
• Social welfare: pick the outcome whose sum of utilities is highest.
• Surplus: social welfare of the outcome minus social welfare of the status quo.
  – Constant sum games have 0 surplus (as all options sum to the same total).
• Pareto efficiency: an outcome o is Pareto efficient if there exists no other outcome o' such that some agent has higher utility in o' than in o and no agent has lower utility.
  – If we maximize social welfare, the outcome is Pareto optimal, but not vice versa.
  – Not a very useful way of selecting strategies.
• Individual rationality: each agent does what is best for it.
• Stability: no agent can increase its utility by changing its strategy (given that everyone else keeps the same strategy). If I knew what my opponent would do, I would still be satisfied with my decision.
• Symmetry: I get the same utility as you if our roles are reversed.
• No dictator: no agent is inherently preferred.

Example: Prisoner's Dilemma
• Two people are arrested for a crime. If neither suspect confesses, both get a light sentence. If both confess, then they get sent to jail. If one confesses and the other does not, then the confessor gets no jail time and the other gets a heavy sentence.
• (Actual numbers vary in different versions of the problem, but relative values are the same.)

Payoffs are listed as (row player, column player).

                  Confess   Don't Confess
Confess           2, 2      5, 0
Don't Confess     0, 5      3, 3

• Both confess (2, 2): dominant strategy equilibrium, not Pareto optimal.
• Neither confesses (3, 3): Pareto optimal; maximizes social welfare.

Pareto Optimal
• Both maximin and Nash equilibrium solutions are pessimistic; they frame the problem as a competitive game between my interests and your interests.
• This competition is not healthy: for the prisoner's dilemma, both approaches produce the solution in which both confess, so each player receives the next-to-least favorite of the possible payoffs (5, 3, 2, 0). Plus, the group utility (= 4) is the lowest of any outcome.
• The idea behind Pareto optimality is that some joint solutions should obviously be avoided because they are bad for everyone.
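The claims about group utility and Pareto optimality can be verified by enumerating the four outcomes. The following is a minimal sketch using the prisoner's dilemma payoffs; the helper names dominates and pareto_optimal are assumptions made for illustration.

```python
# Prisoner's dilemma, payoffs (row, column), keyed by joint action.
PD = {
    ("confess", "confess"): (2, 2),
    ("confess", "dont"): (5, 0),
    ("dont", "confess"): (0, 5),
    ("dont", "dont"): (3, 3),
}


def dominates(u, v):
    """True if payoff vector u makes nobody worse off than v and
    somebody strictly better off."""
    return (all(x >= y for x, y in zip(u, v))
            and any(x > y for x, y in zip(u, v)))


def pareto_optimal(game):
    """Joint actions whose payoffs are not dominated by any other outcome."""
    return [profile for profile, u in game.items()
            if not any(dominates(v, u) for v in game.values())]


# Social welfare is the sum of the players' utilities in an outcome.
welfare = {profile: sum(u) for profile, u in game_items} if (game_items := PD.items()) else {}

print(pareto_optimal(PD))
print(min(welfare, key=welfare.get), max(welfare, key=welfare.get))
```

Only the both-confess outcome is dominated (by the both-cooperate outcome), and it also has the lowest social welfare, which matches the argument on the slide.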

Consider the values as ordinal (position) rather than cardinal (how much). We can change the actual values without changing the interpretation.

              Defect   Cooperate
Defect        2, 2     4, 1
Cooperate     1, 4     3, 3

Where is the Pareto frontier? [Figure: outcomes labeled A, B, C, D]

The term Pareto efficient…
• The term Pareto efficient is named after Vilfredo Pareto, an Italian economist who used the concept in his studies of economic efficiency and income distribution.
• He is also the one credited with the 80/20 rule to describe the unequal distribution of wealth in his country, observing that twenty percent of the people owned eighty percent of the wealth.
• If an economic system is not Pareto efficient, then some individual can be made better off without anyone being made worse off. It is commonly accepted that such inefficient outcomes are to be avoided, and therefore Pareto efficiency is an important criterion for evaluating economic systems and political policies.

Strategic Dominance
• One of my actions is better than my other choices regardless of what my opponent picks. (Note: the example is a non-symmetric game.)
• P1 is always better off by playing C.
• Is there a dominant strategy for P2? (No. Why?)
• A strategically dominant solution is a solution which is a best response for every possible choice made by the other players.
• What should P2 do?

Satisficing Equilibrium
• There are other, non-traditional solution concepts that are relevant for multi-agent games.
• One of these solution concepts is the notion of satisficing equilibrium.
• The word satisfice means to strive for something that is sufficient. A satisficing equilibrium occurs when agents have arrived at choices such that the consequence produced by these choices yields utilities that all agents are satisfied with.
• It is an equilibrium because a satisficing agent is content with a non-optimal solution if it is good enough. If all agents are content, no agent has an incentive to change its actions, which means that the solution is stable.
• In the real world, an agent may not know (or have the resources to evaluate) all options, but it can still make a decision knowing that it is "good enough".

At seats: Show in normal form: wrestling
• There is a widespread practice in high school wrestling where the participants intentionally lose unnaturally large amounts of weight so as to change weight class (and compete against lighter opponents).
• In doing so, the participants are clearly not at their top level of physical and athletic fitness, and yet they often end up competing against the same opponents anyway, who have also followed this practice (mutual defection).
• The result is a reduction in the level of competition.
• Come up with a normal form representation of the game using any values you like. We show what could happen with two players whose choices are to lose weight or not.
• I broke it down into two pieces: inconvenience of fasting and competitive advantage.

Utility: inconvenience (0, -1) + competitive advantage (-3, 3)

                             Losing to compete   Compete at normal weight
Losing to compete            -1, -1              2, -3
Compete at normal weight     -3, 2               0, 0

So what is the best option?

Game of Chicken
• Consider another type of encounter, the game of chicken, played by Ned and Kelly. Each driver chooses straight or swerve:
  – Both go straight: -10, -10
  – One goes straight and the other swerves: 10 for the driver who goes straight, 5 for the driver who swerves
  – Both swerve: 7, 7
• (Think of James Dean in Rebel Without a Cause.)
• Difference from the prisoner's dilemma: mutually going straight is the most feared outcome. (Whereas the sucker's payoff, not confessing when the opponent does, is most feared in the prisoner's dilemma.)

Game of Chicken
Payoffs: both straight -10, -10; one straight and one swerve 10 (straight), 5 (swerve); both swerve 7, 7.
• Is there a dominant strategy?
• Is there a Pareto optimal outcome (can't do better without making someone worse off)?
• Is there a Nash equilibrium: knowing what my opponent is going to do, would I be happy with my decision?

Try this one
coop defect 5, 5 0, 0 10, 10
• Is there a dominant strategy?
• Is there a Pareto optimal outcome (can't do better without making someone worse off)?
• Is there a Nash equilibrium: knowing what my opponent is going to do, would I be happy with my decision?

And this one

            coop    defect
coop        1, 0    3, 3
defect      4, 3    5, 2

• Is there a dominant strategy?
• Is there a Pareto optimal outcome (can't do better without making someone worse off)?
• Is there a Nash equilibrium: knowing what my opponent is going to do, would I be happy with my decision?

Free Rider
• Described by Poundstone:
  – It's late at night, and there's no one in the subway station. Why not just hop over the turnstiles and save yourself the fare? But remember, if everyone hopped the turnstiles, the subway system would go broke, and no one would be able to get anywhere. What's the chance that your lost fare will bankrupt the subway system? Virtually zero. The trains run whether the cars are empty or full. In no way does an extra passenger increase the system's operating expenses. But if everybody thinks this way…
• Try creating a normal form game. Who are the players?

Normal form game* (free rider)
Rows are Agent 1's actions, columns are the actions of all others, and each cell (outcome) lists the payoffs.

                     All others: Pay   All others: Not Pay
Agent 1: Pay         0, 0              0, 1
Agent 1: Not Pay     1, 0              -1, -1

*aka strategic form, matrix form

Normal form game* (matching pennies)
Rows are Agent 1's actions, columns are Agent 2's actions, and each cell (outcome) lists the payoffs (Agent 1, Agent 2).

               Agent 2: H   Agent 2: T
Agent 1: H     -1, 1        1, -1
Agent 1: T     1, -1        -1, 1

*aka strategic form, matrix form

Extensive form game (matching pennies)
Player 2 doesn't know what has been played, so he doesn't know which node he is at.

Player 1 (action)
  H -> Player 2
         H -> (-1, 1)
         T -> (1, -1)
  T -> Player 2
         H -> (1, -1)
         T -> (-1, 1)

Each terminal node (outcome) lists the payoffs (player 1, player 2).

Strategies
• Strategy:
  – A strategy, $s_j$, is a complete contingency plan; it defines the actions that agent $j$ should take for all possible states of the world. In these simple games, the state is always "the beginning". A strategy might look like: (1) always pick A, or (2) pick A or B with probability $p$ and $(1 - p)$.
• Strategy profile: $s = (s_1, \ldots, s_n)$
  – what each agent did (assuming $n$ players).
  – $s_{-i} = (s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_n)$: what everyone else did.
• Utility function: $u_i(s)$
  – Note that the utility of an agent depends on the strategy profile, not just its own strategy.
  – We assume agents are expected utility maximizers.

Normal form game* (matching pennies)

               Agent 2: H   Agent 2: T
Agent 1: H     -1, 1        1, -1
Agent 1: T     1, -1        -1, 1

*aka strategic form, matrix form
• Strategy for agent 1: H
• Strategy for agent 2: T
• Strategy profile: (H, T)
• U1((H, T)) = 1
• U2((H, T)) = -1

Suppose you knew the other person was switching strategies based on probability p
• For matching pennies, what would you do?

Here is an example of the best response function for probabilistic actions. Notice y's response isn't probabilistic.
• Given x, what should y do?
• First graph: the x coordinate indicates the fraction of the time the row player picks heads. The y coordinate indicates the fraction of the time the column player picks heads in response. The column (y) player wins with matching.
• If the row player picks heads less than 0.5 of the time, the column player should pick heads 0 percent of the time. This means that the majority of the time the outcome is T, T (giving the highest utility for the column player).
• When the row player is mixing choices equally, it doesn't matter what the column player does in response.
• When the row player picks heads more than half of the time, the column player should pick heads all of the time, so the outcome H, H would be most likely.

Here is an example of the best response function
• Second graph: this time we look at the row player's best response to what the column player does. The y coordinate indicates the fraction of the time the column player picks heads. The x coordinate indicates the fraction of the time the row player picks heads in response.
• If the column player picks heads less than 0.5 of the time, the row player should pick heads all of the time. This means that the majority of the time the outcome is H, T (giving the highest utility for the row player).
• When the column player is mixing choices equally, it doesn't matter what the row player does in response.
• When the column player picks heads more than half of the time, the row player should pick heads none of the time, so the outcome T, H would be most likely.

Here is an example of the best response function
• The third graph puts the other two together, indicating that the mixed strategy equilibrium is where both players are mixing optimally.
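The best response functions described for the graphs can be written out as a short sketch. It assumes the matching pennies payoffs used earlier (the column player wins on a match); the function names are hypothetical, and a return value of None is used here to mean "indifferent, any mix is a best response".

```python
# Matching pennies, payoffs (row, column); the column player wins on a match.
GAME = {
    ("H", "H"): (-1, 1),
    ("H", "T"): (1, -1),
    ("T", "H"): (1, -1),
    ("T", "T"): (-1, 1),
}


def column_payoff_for(action, x):
    """Column's expected payoff for a pure action when row plays H with probability x."""
    return x * GAME[("H", action)][1] + (1 - x) * GAME[("T", action)][1]


def row_payoff_for(action, y):
    """Row's expected payoff for a pure action when column plays H with probability y."""
    return y * GAME[(action, "H")][0] + (1 - y) * GAME[(action, "T")][0]


def column_best_response(x):
    """Fraction of heads the column player should play (first graph)."""
    heads, tails = column_payoff_for("H", x), column_payoff_for("T", x)
    if heads > tails:
        return 1.0
    if heads < tails:
        return 0.0
    return None  # indifferent


def row_best_response(y):
    """Fraction of heads the row player should play (second graph)."""
    heads, tails = row_payoff_for("H", y), row_payoff_for("T", y)
    if heads > tails:
        return 1.0
    if heads < tails:
        return 0.0
    return None  # indifferent


print(column_best_response(0.3), column_best_response(0.7))  # 0.0 1.0
print(row_best_response(0.3), row_best_response(0.7))        # 1.0 0.0
print(column_best_response(0.5), row_best_response(0.5))     # None None
```

Only at x = y = 0.5 is each player indifferent, so neither can gain by changing the mix; that crossing point is the mixed strategy equilibrium the third graph shows.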

Battle of the Sexes
Consider Battle of the Sexes. In the game, a husband and wife must independently decide on a date activity. The husband would prefer one form of entertainment, say fishing, and the wife would prefer another form of entertainment, say shopping. Although both have a most preferred activity, both prefer being together to being alone.

Payoffs are listed as (husband, wife).

                      Wife: Shopping   Wife: Fishing
Husband: Fishing      2, 2             4, 3
Husband: Shopping     3, 4             1, 1

Maximin
The wife looks at the worst outcome of each of her options and picks the option whose worst case is best. Her utilities are:

                      Wife: Shopping   Wife: Fishing
Husband: Fishing      2                3
Husband: Shopping     4                1
Worst case            2                1

Her maximin choice is shopping.

Reactions
• Maximin: both should be selfish (and do what they want).
• This is not a great solution: if either player "defects" from the selfish choice, both end up happier. (It is not a Nash equilibrium.)
• It is better than the worst case of (1, 1).
• In fact, either consequence that results when one is selfish and the other unselfish dominates (in the Pareto-optimal sense) the maximin solution, and both of these consequences are in equilibrium (since neither player benefits by unilaterally changing his/her mind). Unfortunately, with no way to communicate, the players are left with making independent choices to try to reach an equilibrium.

Reaction?
• If we had an external way of telling both what to do, that would be optimal.
• Could just pick a randomized strategy for each player.
  – Termed a mixed strategy. How do you think that would work?


Battle of the Sexes (cont)
• Using a mixed strategy (picking each option a fraction of the time) does not help their chances.
• In fact, the expected payoffs for two independent mixed strategies are pretty bad; if both choose uniformly at random, the expected payoff is only 2.5, which is not much better than the maximin value.
• For your information, the set of possible payoffs for all possible combinations of mixed strategies is illustrated below.

Shows how various combinations of mixed strategies interact. Randomly pick a mixed strategy for each player, then plot the result.

Battle of the Sexes
Suppose the husband picks fishing 2/3 of the time and the wife picks shopping 2/3 of the time. Each cell lists the payoffs (husband, wife) followed by the probability of reaching that cell.

                            Wife: Shopping (2/3)   Wife: Fishing (1/3)
Husband: Fishing (2/3)      2, 2  (4/9)            4, 3  (2/9)
Husband: Shopping (1/3)     3, 4  (2/9)            1, 1  (1/9)

Expected utility (the same for each player): 2(4/9) + 4(2/9) + 3(2/9) + 1(1/9) = 23/9 ≈ 2.56
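The expected utility above, and the 2.5 figure for uniform random play, can be recomputed with exact fractions. This sketch assumes the reading of the matrix in which payoffs are (husband, wife), the husband prefers fishing, and the wife prefers shopping; the function name expected_payoffs is illustrative.

```python
from fractions import Fraction

# Payoffs (husband, wife) keyed by (husband's choice, wife's choice).
# Assumed reading of the slide's matrix.
BOS = {
    ("fishing", "shopping"): (2, 2),
    ("fishing", "fishing"): (4, 3),
    ("shopping", "shopping"): (3, 4),
    ("shopping", "fishing"): (1, 1),
}


def expected_payoffs(p_husband_fishing, p_wife_shopping):
    """Expected (husband, wife) payoffs when each mixes independently."""
    husband_mix = {"fishing": p_husband_fishing,
                   "shopping": 1 - p_husband_fishing}
    wife_mix = {"shopping": p_wife_shopping,
                "fishing": 1 - p_wife_shopping}
    husband_total = Fraction(0)
    wife_total = Fraction(0)
    for (h, w), (u_h, u_w) in BOS.items():
        prob = husband_mix[h] * wife_mix[w]  # independent choices
        husband_total += prob * u_h
        wife_total += prob * u_w
    return husband_total, wife_total


print(expected_payoffs(Fraction(2, 3), Fraction(2, 3)))  # both 23/9
print(expected_payoffs(Fraction(1, 2), Fraction(1, 2)))  # both 5/2
```

Both values stay well below the 3 or 4 that either coordinated outcome would give, which is the sense in which independent mixing does not help.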

Sometimes we need to remove a choice

            coop    defect
coop        1, 0    3, 3
defect      4, 5    5, 2

• Is there an option that we can remove?

What if there are more than two choices?
A simple competition game
• Note: no player has a dominant strategy.
• But low is dominated for both players, so we can predict that neither will play low. Remove it.

Payoffs are listed as (Donna, Pierce).

                   Pierce: High   Pierce: Medium   Pierce: Low
Donna: High        60, 60         36, 70           36, 35
Donna: Medium      70, 36         50, 50           30, 35
Donna: Low         35, 36         35, 30           25, 25

Formally: Iterated Elimination of Dominated Strategies
• Let $R_i \subseteq S_i$ be the set of removed strategies for agent $i$.
• Initially $R_i = \emptyset$.
• Choose an agent $i$ and a strategy $s_i \in S_i \setminus R_i$ such that there exists $s_i' \in S_i \setminus R_i$ with $u_i(s_i', s_{-i}) > u_i(s_i, s_{-i})$ for all $s_{-i} \in S_{-i} \setminus R_{-i}$.
• Add $s_i$ to $R_i$; continue.
• Theorem: If a unique strategy profile, $s^*$, survives iterated elimination, then it is a Nash equilibrium.
• Theorem: If a profile, $s^*$, is a Nash equilibrium, then it must survive iterated elimination.

A simple competition game
Once we have removed low, medium is a dominant strategy for both. So we predict that both Pierce and Donna will play medium.

                   Pierce: High   Pierce: Medium
Donna: High        60, 60         36, 70
Donna: Medium      70, 36         50, 50
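The elimination procedure can be run on the competition game to confirm the prediction. This is a sketch only: it assumes payoffs are (Donna, Pierce) with Donna as the row player, removes strictly dominated pure strategies, and uses hypothetical helper names.

```python
ACTIONS = ["high", "medium", "low"]

# Payoffs (Donna, Pierce) keyed by (Donna's action, Pierce's action).
GAME = {
    ("high", "high"): (60, 60), ("high", "medium"): (36, 70), ("high", "low"): (36, 35),
    ("medium", "high"): (70, 36), ("medium", "medium"): (50, 50), ("medium", "low"): (30, 35),
    ("low", "high"): (35, 36), ("low", "medium"): (35, 30), ("low", "low"): (25, 25),
}


def payoff(player, own, other):
    """Payoff to player (0 = Donna, 1 = Pierce) for playing own against other."""
    profile = (own, other) if player == 0 else (other, own)
    return GAME[profile][player]


def strictly_dominated(player, action, own_left, other_left):
    """True if some remaining alternative beats action against every
    remaining action of the opponent."""
    return any(
        all(payoff(player, alt, o) > payoff(player, action, o) for o in other_left)
        for alt in own_left if alt != action
    )


def iterated_elimination():
    """Repeatedly remove strictly dominated strategies until none remain."""
    remaining = [list(ACTIONS), list(ACTIONS)]
    changed = True
    while changed:
        changed = False
        for player in (0, 1):
            own, other = remaining[player], remaining[1 - player]
            for action in list(own):  # iterate over a copy while removing
                if strictly_dominated(player, action, own, other):
                    own.remove(action)
                    changed = True
    return remaining


print(iterated_elimination())  # [['medium'], ['medium']]
```

A single profile survives, (medium, medium), which by the first theorem on the formal slide is a Nash equilibrium.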

Example – Zero Sum (most vicious)
(We divide the same cake. If I lose, you win.)
Bimatrix form (shows the utilities separately for each player)
• Cake slicing
• Two players
  – cutter
  – chooser

Cutter's utility
                     Choose bigger piece   Choose smaller piece
Cut cake evenly      ½ - a bit             ½ + a bit
Cut unevenly         Small piece           Big piece

Chooser's utility
                     Choose bigger piece   Choose smaller piece
Cut cake evenly      ½ + a bit             ½ - a bit
Cut unevenly         Big piece             Small piece

Zero Sum
• Scientists debate whether zero sum scenarios really exist.
• However, many TREAT situations as if they did.

Rationality
Payoffs are listed as (cutter, chooser).

                     Choose bigger piece   Choose smaller piece
Cut cake evenly      (-1, +1)              (+1, -1)
Cut unevenly         (-10, +10)            (+10, -10)

• Rationality
  – each player will take the highest utility option
  – taking into account the other player's likely behavior
• In the example
  – if the cutter cuts unevenly
    • he might like to end up in the lower right
    • but the other player would never do that
    • so he gets -10
  – if he cuts evenly
    • he will end up in the upper left, with -1
    • this is a stable outcome: neither player has an incentive to deviate

Other Symmetric 2 x 2 Games
• Given the 4 possible outcomes of (symmetric) cooperate/defect games, there are 24 possible orderings on outcomes (showing the preference of the first player; in each pair the first letter is my action and the second is the other player's). Here are a few of them:
  – CC > CD > DC > DD: Cooperation dominates
  – DC > DD > CC > CD: Deadlock. You will always do best by defecting.
  – DC > CC > DD > CD: Prisoner's dilemma
  – DC > CC > CD > DD: Chicken
  – CC > DC > DD > CD: Stag hunt