Social/Prisoner’s dilemma (Lecture 3)

  • Slides: 19

Payoff matrix:
- P: punishment for mutual defection
- R: reward for mutual cooperation
- T: temptation to defect
- S: sucker’s payoff

Payoff ranking: T > R > P > S. The Nash equilibrium (NE) is mutual defection, DD. For repeated games we also assume that 2R > T + S.

Real-life situations:
- disarmament negotiations
- choosing between fair and unfair behavior
- to work correctly or not
- to learn all the skills required to do my job perfectly or not
- to protect the environment or not
- to bring alcohol to a party or not
- to get vaccinated (against flu) or not
- to use the turn signal (in a car) or not
- to help tourists or foreigners or not
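The ranking T > R > P > S and the resulting equilibrium can be checked with a few lines of Python. This is only a sketch: the numerical values T=5, R=3, P=1, S=0 are an illustrative assumption (the matrix on the slide is not reproduced here), and the helper names are hypothetical.

```python
# Illustrative payoff values (assumed, not taken from the slide).
T, R, P, S = 5, 3, 1, 0

# payoff[(my_move, other_move)] is my payoff for that pair of moves.
payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def best_response(other_move):
    """Return the move that maximizes my payoff against a fixed opponent move."""
    return max("CD", key=lambda move: payoff[(move, other_move)])


def is_nash(a, b):
    """A pair of moves is a Nash equilibrium if each is a best response to the other."""
    return best_response(b) == a and best_response(a) == b


def nash_equilibria():
    """List all pure-strategy Nash equilibria of the symmetric 2 x 2 game."""
    return [(a, b) for a in "CD" for b in "CD" if is_nash(a, b)]


print("T > R > P > S:", T > R > P > S)
print("2R > T + S:", 2 * R > T + S)
print("Pure Nash equilibria:", nash_equilibria())
```

Because D is the best response to both C and D whenever T > R and P > S, the search can only return mutual defection, whatever concrete numbers are chosen.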

The curious feature of the Prisoner’s dilemma was discovered by Merrill Flood (1950). The name comes from Albert Tucker.

Many questions:
- Are we rational?
- Are we selfish?
- The dilemma challenges the basis of capitalism (the ‘invisible hand’ of Adam Smith).
- Can we apply game theory to describe economic behavior, human behavior, etc.?

Experiments (with Melvin Dresher at the RAND Corporation):
- Questionnaire/statistics on the price of cars: price = (buying price + selling price)/2
- An electronic instrument with 2 x 2 buttons for playing the PD; the reactions were noted

General belief (Folk Theorem): repetition opens further possibilities for resolving the dilemma.

At the end of the ’70s only a few researchers had an idea of how to handle these dilemmas.

Computer tournament conducted by Robert Axelrod (1979-1981)

N players play the repeated PD game against each other (round-robin). After M rounds the winner is the one with the highest average payoff (M→∞).

[Payoff matrix]

The players know all previous decisions.

Goal: to develop an algorithm that achieves the highest payoff against the opponents.

Axelrod nominated the random strategy, s^T = (0.5, 0.5), and declared that each player would also play a game against herself in each round.

The average payoff was determined by averaging over rounds 200 to 300, while the repeated game itself was longer (N = 600 rounds). In this way the organizer could avoid the undesired end-game effect of the shadow of the future: D is the best choice in the last round, which reduces the number of effective iterations to N-1, so D is also the best choice in round N-1, and so on. Following this argument, rational players would choose D mutually in every round, which preserves the dilemma.

Possible strategies for repeated games:
- AllD: always chooses D (unconditional defector, the bad guy, …)
- AllC: always chooses C („the good guy” or sucker)
- Random: chooses D or C with probabilities q or (1-q)
- TFT (Tit-for-tat): chooses C first, then repeats/reciprocates the previous choice of the co-player
- Suspicious TFT: first D, then reciprocates
- Generous TFT: TFT, but chooses C (instead of D) with a probability q
- WSLS (win-stay-lose-shift): first C or D, then changes her choice if her payoff is smaller than an aspiration level (U_x < a)
- Stochastic reactive strategies: choose C or D with probabilities that depend on the previous decision of the co-player
- Stochastic reactive strategies with longer memory
- Etc.

Homework 3.1: Determine the series of choices in a repeated PD game for the pairs AllD-AllC, AllD-TFT, TFT-WSLS.
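The deterministic strategies in the list above can be written as small functions of the two move histories. The following is a minimal sketch, not a reference implementation: the payoff values (T=5, R=3, P=1, S=0), the aspiration level a=2 for WSLS, the choice of C as the first WSLS move, and all function names are assumptions made for illustration.

```python
# Assumed payoff values with T > R > P > S and 2R > T + S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


# Each strategy receives its own history and the co-player's history (lists of "C"/"D").
def all_d(own, other):
    return "D"


def all_c(own, other):
    return "C"


def tft(own, other):
    # Cooperate first, then copy the co-player's previous move.
    return other[-1] if other else "C"


def suspicious_tft(own, other):
    # Defect first, then copy the co-player's previous move.
    return other[-1] if other else "D"


def wsls(own, other, aspiration=2):
    # Start with C (assumed); switch whenever the last payoff was below the aspiration level.
    if not own:
        return "C"
    last_payoff = PAYOFF[(own[-1], other[-1])]
    if last_payoff < aspiration:
        return "D" if own[-1] == "C" else "C"
    return own[-1]


def play(strategy_a, strategy_b, rounds):
    """Play a repeated game and return both move sequences as strings."""
    a_moves, b_moves = [], []
    for _ in range(rounds):
        a = strategy_a(a_moves, b_moves)
        b = strategy_b(b_moves, a_moves)
        a_moves.append(a)
        b_moves.append(b)
    return "".join(a_moves), "".join(b_moves)


def average_payoffs(a_moves, b_moves):
    """Average payoff per round for player A and player B."""
    n = len(a_moves)
    a_total = sum(PAYOFF[(a, b)] for a, b in zip(a_moves, b_moves))
    b_total = sum(PAYOFF[(b, a)] for a, b in zip(a_moves, b_moves))
    return a_total / n, b_total / n


for name, (sa, sb) in {"AllD-TFT": (all_d, tft), "TFT-WSLS": (tft, wsls)}.items():
    seq_a, seq_b = play(sa, sb, 8)
    print(name, seq_a, seq_b, average_payoffs(seq_a, seq_b))
```

With an aspiration level between P and R, WSLS keeps its move after receiving T or R and switches after P or S; a different aspiration level would give a different rule.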

Axelrod’s tournament:
- 14 strategies were nominated.
- The winner: TFT (the simplest/shortest algorithm), nominated by Anatol Rapoport.

Second tournament:
- with 50 additional new strategies
- most of the new strategies would have won the first tournament
- despite this, the winner was again TFT

Axelrod’s advice:
- Don’t be envious.
- Don’t be the first to defect.
- Reciprocate both cooperation and defection.
- Don’t be too clever.

The essence: the PD game should be transformed into a repeated PD game (with uncertainty about the end), and TFT should be used. This strategy was followed by Henry Kissinger.

Axelrod systematically investigated the competition between the nominated strategies.

Conclusions:
- Knowing the opponents’ strategies, we can develop a strategy that beats TFT; otherwise we should follow TFT.
- TFT plays a key role in the maintenance of cooperation if the players can adopt the strategy of a better player.

Hamilton suggested considering the following: after each round the worst player adopts the strategy of the winner. This is a simple realization of Darwinian selection.
- The winner of the evolutionary competition was TFT.
- In the absence of TFT, AllD was the winner.

Shortcomings:
- The strategy set was not complete. Consequently the validity of the conclusions is questionable.
- TFT failed in a noisy environment. What happens if mistakes are allowed with a low probability between TFT players? The series of choices:
  error ↓ ↓
  CCCCCCDCDCDDDDDCDCDCDCCCCCCCDCDCDDDDDCDCDCDCCCCC
  Due to the mistakes the average payoff decreases significantly.
- If 2R < T+S, then it is better to choose C and D alternately (in opposite phase).
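The effect of rare mistakes on two TFT players can be explored with a short simulation. This is a sketch under stated assumptions: the payoff values (T=5, R=3, P=1, S=0), the error model (each intended move is flipped independently with probability `error_rate`), and the function names are all illustrative choices, not part of the lecture.

```python
import random

# Assumed payoff values with T > R > P > S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def flip(move):
    """Return the opposite move."""
    return "D" if move == "C" else "C"


def noisy_tft_match(rounds, error_rate, seed=0):
    """Average payoff per player and round when two TFT players err with a given probability."""
    rng = random.Random(seed)
    a_prev = b_prev = None
    total = 0
    for _ in range(rounds):
        # TFT: cooperate in the first round, then copy the co-player's last realized move.
        a = "C" if b_prev is None else b_prev
        b = "C" if a_prev is None else a_prev
        # Each intended move is replaced by the opposite one with probability error_rate.
        if rng.random() < error_rate:
            a = flip(a)
        if rng.random() < error_rate:
            b = flip(b)
        total += PAYOFF[(a, b)] + PAYOFF[(b, a)]
        a_prev, b_prev = a, b
    return total / (2 * rounds)


for rate in (0.0, 0.01, 0.05):
    print(rate, noisy_tft_match(10_000, rate))
```

Without errors both players earn R in every round. A single error starts an alternating CD/DC phase worth (T+S)/2 per player on average, which is below R under the assumption 2R > T+S, and a second error can lock both players into mutual defection until another error releases them.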

25th anniversary of the Axelrod tournament

The computer tournament was repeated. TFT was beaten by a strategy that received support from a team. During an initial period the team members identified each other, and later they chose C against the „leader”, who exploited them by choosing D against the team members. Otherwise, the „leader” used the TFT strategy against all the other players. The average payoff of the team was lower than that received by TFT.

Real example: cycle racing, ...

The advantage of a team can be compensated by the participation of other teams.

Homework 3.2: Show that WSLS can win in a population consisting of WSLS, AllC and TFT players! What happens if AllD players are present, too?

Experiments

Games provide situations in which animal and human behavior can be investigated.

First experiment by Merrill Flood and Melvin Dresher (~1950 at the RAND Corporation); details in the books:
- Poundstone: Prisoner’s Dilemma (Anchor Books, 1992)
- Rapoport: Prisoner’s Dilemma (1965, 2009 Michigan Univ. Press)

1) Questions to new colleagues who bought a car from a departing one: How did you share the profit of the second-hand car dealer? (later called the „Ultimatum game”)
2) A repeated two-person 2 x 2 game was performed in which the players had to press one of two buttons; their reactions were recorded (see the book mentioned above)
   - mutual cooperation occurred with a frequency of 75 %
   - the game was complicated (asymmetries in the payoffs)
3) The experimental studies were continued at Chicago University (under military control)

Animal experiments, Milinski (Nature, 1987)

Observations:
- Two prey fish have to check (as a pair) whether the predator fish is hungry or not (if not, they can continue eating calmly).
- In the experiments a painted wooden fish was substituted for the predator fish.
- The co-player’s behavior was imitated by moving a mirror.

Conclusions:
- The prey fish apply a strategy resembling „Tit-for-tat”.
- Animals learn.

These experiments were repeated later (200 x) in a more realistic environment using painted/decorated fish that were observed by camera, and the results were analysed by image-processing algorithms.

Conclusions: fish recognize each other and form permanent partnerships for checking the predator.

Examples of cooperation among animals

Possibilities for collaboration:
- Feeding each other (bacteria, vampire bats, etc.)
- Musk oxen against wolves
- Warming each other (sheep, etc.)
- Ants form a bridge
- Guarding each other (fish schools, herding or flocking, etc.)
- Hunting (packs of wolves or lions)
- Raising offspring
- Monkeys
- Specialization, division of labor (ants, bees, multicellular biological systems)
- Symbiosis among plants and animals

Ultimatum game (Güth et al. 1982)

Two players have to share 100 euro. The first (proposer) suggests a division; the second (acceptor) accepts it or not. If the second player accepts the suggestion, the sum is divided as proposed; otherwise both receive nothing.

Advice of game theory: offer 1 euro to the other player, who should accept it because it is more than nothing.

Experimental results:
- A large portion of people (~50 %) suggest an equal division.
- Average sum proposed: ~30-40 %
- Low (<10 %) and high (>60 %) offers were rare.
- If the proposal was less than 20 %, then 80 % of the second players rejected it (punished the proposer).

This game was repeated worldwide (considering peoples with different customs, cultures, and standards of living) with the same results.

Explanation: this behavior is the result of a long evolutionary process and developed over a long time.
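The rules of the ultimatum game and the game-theoretic advice can be made concrete with a short sketch. The two responder models below are assumptions for illustration: `rational_responder` accepts any positive offer, and `threshold_responder` is a deliberately crude stand-in for the observed tendency to reject offers below 20 % (in the experiments only about 80 % of such offers were rejected). Offers are restricted to whole euros.

```python
def ultimatum(total, offer, accepts):
    """Return (proposer payoff, responder payoff) for one round of the ultimatum game."""
    if not 0 <= offer <= total:
        raise ValueError("offer must lie between 0 and the total sum")
    return (total - offer, offer) if accepts else (0, 0)


def rational_responder(offer):
    """Hypothetical payoff-maximizing responder: anything positive beats nothing."""
    return offer > 0


def threshold_responder(offer, total, min_percent=20):
    """Hypothetical fairness-minded responder: rejects offers below min_percent of the total."""
    return offer * 100 >= min_percent * total


def best_offer(total, responder):
    """Whole-euro offer that maximizes the proposer's payoff against a given responder."""
    return max(range(total + 1), key=lambda o: ultimatum(total, o, responder(o))[0])


total = 100
print("against a rational responder:", best_offer(total, rational_responder))
print("against a 20 % threshold:", best_offer(total, lambda o: threshold_responder(o, total)))
```

Against a responder who punishes low offers, the payoff-maximizing proposal moves up to the rejection threshold, which is one way to read the gap between the advice of game theory and the experimental results.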

Dictator game („similar” to the Ultimatum game)

The sum (100 euro) is divided according to the proposal of the first (dictator) player. It is not a real game, as the second player cannot affect the outcome.

Results:
- the average value of the proposal is about 20 %
- sometimes the proposer suggested equal sharing
- sometimes the proposer kept the whole sum

Trust game (Berg et al. 1995)

The first player decides what portion of 10 $ to invest; the other player receives the tripled sum. The second player decides what portion of her profit to pay back to the first player.

Results:
- the average investment is 5 $ (50 %)
- 5 of 32 players invested the whole sum, while half of the second players returned nothing
- the average investment exceeded the average payback
- large fluctuations in human behavior

Literature: Camerer, Behavioral Game Theory (Princeton Univ. Press, 2003)
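The payoffs of the trust game described above follow from simple arithmetic, sketched here in Python. The function name and its parameters are hypothetical, and the sketch assumes that the second player starts with nothing of her own and returns a fraction of the tripled investment.

```python
def trust_game(endowment, invested, returned_fraction, multiplier=3):
    """Return (first player's payoff, second player's payoff) for one trust game.

    Assumption: the second player has no own endowment and pays back
    a fraction of the multiplied investment she received.
    """
    if not 0 <= invested <= endowment:
        raise ValueError("investment must lie between 0 and the endowment")
    if not 0 <= returned_fraction <= 1:
        raise ValueError("returned_fraction must lie between 0 and 1")
    received = multiplier * invested           # the second player gets the tripled sum
    paid_back = returned_fraction * received   # the portion she returns
    first = endowment - invested + paid_back
    second = received - paid_back
    return first, second


# No trust, trust that is not rewarded, and trust rewarded with half of the tripled sum.
print(trust_game(10, 0, 0.0))
print(trust_game(10, 5, 0.0))
print(trust_game(10, 10, 0.5))
```

Investing pays off for the first player only if more than one third of the tripled sum comes back, while a purely self-interested second player would return nothing.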

Experiments by Berg et al. [Games Econ. Behav. 10 (1995) 122]

32 pairs of players, ordered according to the investments.

Public goods game:

Students played public goods games. The players were rearranged into five-person groups after each round. After the 6th round the possibility of punishment was announced. Accordingly, each cooperator could initiate a punishment (at a cost that decreased her net income), while the defector’s income was reduced by a fine (> cost). It is an altruistic punishment, as the players were rearranged after each round.

[Figure: time-dependence of the total investment (investments versus rounds), without punishment and with punishment]

Fehr and Gächter (1998): investment from 0 to 20 in a public goods game with or without punishment.

[Figure: quantitative comparison of the frequency of a given investment]

Being watched

Bateson et al. [Biology Letters 2 (2006) 412–414]

Students could drink milk in a college coffee room and paid their contributions into an honesty box decorated with different pictures. Their honesty was quantified by the average price paid for milk. Week by week the picture was changed, alternating between a pair of watching eyes and flowers.

Results: [figure]

Conclusion: Our cooperative behavior can be affected by weak cues.

Brain research

Using tomographic methods, the neurobiological activity of the players’ brains is investigated while they are playing (ultimatum) games.

First results:
- the brain reaction is fast (no thinking), comes from the ancient region, and is sometimes similar to anger/fury
- the brain rewards the punisher by producing hormones that yield a good feeling
- the brain reaction depends on the personal profile

Takahashi et al., PNAS 109 (March 2012) 4281.