Building Agents for the Lemonade Game Using a Cognitive Hierarchy Population Model
Michael Wunder, Michael Kaisers, Michael Littman, John Yaros
Overview of Method
• In the Lemonade-Stand Game (LG), players are rewarded for finding a partner quickly, to avoid becoming the odd man out
• As a result, complicated prediction-optimization learners are at a disadvantage
• Using heuristics, an agent can identify (and attract) potential partners
• Population-based models are useful to
Example: p-beauty contest
• Keynes proposed that the stock market is like a beauty contest in which judges try to guess the contestant (or stock, or strategy) that the others like
• n players each submit a number x between 0 and 100; the winner is the player closest to a fraction of the average guess, p * (Σ_i x_i) / n
• p is a fraction between 0 and 1, i.e.
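The winner rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the sample guesses, and the value p = 2/3 are all assumptions chosen for the example.

```python
def beauty_contest_winner(guesses, p):
    """Return (target, winners) for one round of a p-beauty contest.

    target  = p * (sum of guesses) / n
    winners = indices of the guesses closest to the target (ties allowed)
    """
    target = p * sum(guesses) / len(guesses)
    best_distance = min(abs(g - target) for g in guesses)
    winners = [i for i, g in enumerate(guesses) if abs(g - target) == best_distance]
    return target, winners


# Hypothetical round: four players, p = 2/3 (an illustrative value only).
guesses = [50, 33, 22, 0]
target, winners = beauty_contest_winner(guesses, p=2 / 3)
print(target, winners)
```

In this made-up round the average is 26.25, so the target is about 17.5 and the player who guessed 22 is closest.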
P-beauty game explained
• The Nash strategy is to play 0 because it cannot be outplayed
• However, first-time players do not reach this outcome... why?
(Figure from Behavioral Game Theory by Colin Camerer)
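One common way to read the gap between the Nash strategy and first-time play is through bounded steps of reasoning. The sketch below assumes a Level 0 player who guesses 50 (the mean of a uniformly random guess) and a Level k player who simply guesses p times the Level k-1 guess, ignoring its own effect on the average. These modeling choices and the value p = 2/3 are assumptions for illustration, not taken from the slides.

```python
def level_k_guess(k, p, level0_guess=50.0):
    """Guess of a Level k player who best-responds to a population of Level k-1.

    Assumption: the best response to an average guess m is approximately p * m.
    """
    guess = level0_guess
    for _ in range(k):
        guess = p * guess
    return guess


# Guesses shrink geometrically toward the Nash strategy of 0 as k grows.
for k in range(5):
    print(k, round(level_k_guess(k, p=2 / 3), 2))
```

Only a player reasoning through many levels ends up near 0; a player who stops after one or two steps guesses well above it.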
How a Cognitive Hierarchy Works
Level k: reacts to Level k-1
…
Level 1: reacts to the base strategy at Level 0
Level 0: no reasoning, only random action or simple rule
(Cartoon captions: “Good ol’ Rock. Nothing beats that.” / “Good ol’ Rock. Homer can’t beat that.” / “Poor predictable Bart, always picks Rock.”)
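The Rock example can be turned into a tiny hierarchy for rock-paper-scissors. This is a sketch under the assumption that Level 0 always plays one fixed action and every higher level plays the action that beats the level below it; the names `COUNTER` and `level_action` are hypothetical.

```python
# COUNTER[a] is the action that beats a.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def level_action(k, base_action="rock"):
    """Action of a Level k player when Level 0 always plays base_action."""
    action = base_action
    for _ in range(k):
        action = COUNTER[action]  # react to the level directly below
    return action


for k in range(4):
    print(f"Level {k}: {level_action(k)}")
```

Because the game has only three actions, the hierarchy cycles: Level 3 plays the same action as Level 0.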
Population-based Reasoning
• Steps of the CH technique:
1. Identify base strategies (random, static)
2. Derive processes for steps of reasoning
   ◦ A step of reasoning, in this case, is the strategy that can exploit the one before
3. Recursively apply steps to each level k
4. These levels form the “hierarchy” according to some distribution f(k)
5. Select a strategy that does well against the desired population
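The five steps can be sketched end to end on a toy game. The example below uses rock-paper-scissors rather than LG, and it assumes a truncated Poisson distribution for f(k) with a made-up parameter tau = 1.5; the slides only say "some distribution", so both choices are illustrative, as are all function names.

```python
import math

ACTIONS = ["rock", "paper", "scissors"]
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # COUNTER[a] beats a


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if COUNTER[theirs] == mine else -1


def build_hierarchy(base_action, max_level):
    """Steps 1-3: start from a base strategy and recursively exploit the level below."""
    levels = [base_action]
    for _ in range(max_level):
        levels.append(COUNTER[levels[-1]])
    return levels


def poisson_weights(max_level, tau):
    """Step 4: f(k), here a Poisson distribution truncated at max_level (an assumption)."""
    raw = [math.exp(-tau) * tau**k / math.factorial(k) for k in range(max_level + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def best_against_population(levels, weights):
    """Step 5: pick the action with the highest expected payoff against the mix."""
    def expected(action):
        return sum(w * payoff(action, s) for s, w in zip(levels, weights))
    return max(ACTIONS, key=expected)


levels = build_hierarchy("rock", max_level=3)
weights = poisson_weights(max_level=3, tau=1.5)
choice = best_against_population(levels, weights)
print(levels, [round(w, 3) for w in weights], choice)
```

The selected action depends on the whole distribution rather than on any single level, which is the point of reasoning about a population.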
Lemonade-Stand Game Levels
• LG yields elegant level heuristics
• L0-U: Uniformly random action
• L0-C: Constant action
• L0-X: Constant with probability X, otherwise choose randomly
• L1: Move Across from the most stable player (the one with the highest X). This is also optimal against L1, and the move is the cooperative equilibrium.
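A rough sketch of the L1 heuristic follows. It assumes the usual LG setup of 12 locations on a circle, and it estimates each opponent's X as the fraction of turns in which that opponent repeated its previous location. Both the constant `NUM_SPOTS` and the stability estimate are assumptions for illustration, not the authors' implementation.

```python
NUM_SPOTS = 12  # assumption: locations are numbered 0..11 around a circle


def stability(history):
    """Empirical proxy for X: fraction of turns where the player stayed put."""
    if len(history) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(history, history[1:]) if prev == cur)
    return repeats / (len(history) - 1)


def across(spot):
    """Location directly opposite spot on the circle."""
    return (spot + NUM_SPOTS // 2) % NUM_SPOTS


def level1_move(opponent_histories):
    """L1: move Across from the last location of the most stable opponent.

    Each history must be a non-empty list of past locations.
    """
    most_stable = max(opponent_histories, key=stability)
    return across(most_stable[-1])


# Hypothetical histories: one constant opponent at 3, one erratic opponent.
print(level1_move([[3, 3, 3, 3], [7, 1, 9, 4]]))
```

Here the constant opponent is the more attractive partner, so the L1 agent sets up opposite location 3.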
Lemonade Game Levels, Cont’d.
• L2: Stay Constant for at least one turn, in case the opponents are two L1s. If the current location is disadvantageous, move somewhere else, perhaps Across from a good partner.
• L3: With another L3, “Sandwich” a constant or L2 player, and become Across from each other if it moves.
• Can we classify contestants by level?
Actual Competition Results
• Using idealized agents from each of these levels, find the score of each contestant against populations of adjacent levels
Actual Competition Results
• The x-axis is a ratio of the nearby levels: Level 1.2 is a population of 80% L1 and 20% L2
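The fractional-level convention can be made concrete with a small helper that maps a value such as 1.2 to a mix of the two adjacent levels and then draws opponents from that mix. The function names and the sampling step are hypothetical; the slides do not say how opponents were drawn.

```python
import math
import random


def population_mix(level):
    """Map a fractional level to shares of the two adjacent integer levels.

    Example: 1.2 -> roughly {1: 0.8, 2: 0.2}
    """
    lower = math.floor(level)
    upper_share = level - lower
    if upper_share == 0:
        return {lower: 1.0}
    return {lower: 1.0 - upper_share, lower + 1: upper_share}


def sample_opponents(level, count, rng):
    """Draw opponent levels independently from the mix (an assumed procedure)."""
    mix = population_mix(level)
    return rng.choices(list(mix.keys()), weights=list(mix.values()), k=count)


mix = population_mix(1.2)
opponents = sample_opponents(1.2, count=2, rng=random.Random(0))
print(mix, opponents)
```

Sweeping the level from 1.0 to 2.0 in small steps would produce the kind of x-axis described above.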
Actual Competition Results
• This population construction method allows for clear distinctions between levels, but other possibilities exist
Mock Competition of Levels
Conclusion
• Our agent (RL3) contains elements of all three levels, which is not optimal against this population of competitors
• The model that emerges from LG does predict the outcome fairly well
• The model predicts that subsequent repetitions would generally move the population “up” the hierarchy
• CH has implications for larger games (e.g., TAC)