Poker for Fun and Profit (and intellectual challenge)
- Slides: 30
Poker for Fun and Profit (and intellectual challenge)
Robert Holte
Computing Science Dept., University of Alberta
Poker
World Series of Poker
Poker Research Group – core
• Darse Billings (Ph.D.)
• Aaron Davidson, M.Sc., Poki
• Neil Burch, P/A, PsOpti
• Terence Schauenberg (M.Sc.), Adapti
• Advisors: J. Schaeffer, D. Szafron
Poker Research Group – new arrivals
• Bret Hoehn (M.Sc.)
• Finnegan Southey (postdoc)
• Michael Bowling
• Dale Schuurmans
• Rich Sutton
• Robert Holte
Our Goal
PsOpti2 vs. “theCount”
Play Us Online
http://games.cs.ualberta.ca/poker/
Poki’s Poker Academy
http://poki-poker.com
Poker Variants
• Many different variants of poker
• Texas Hold’em the most skill-testing
• No-Limit Texas Hold’em used to determine the world champion
• Our research: Limit Texas Hold’em
• Current focus: 2-player (heads up)
2-player, limit, Texas Hold’em (game tree figure, O(10^18))
• Initial: 2 private cards to each player (1,624,350), then bet sequence (9 of 19)
• Flop: 3 community cards (17,296), then bet sequence (9 of 19)
• Turn: 1 community card (45), then bet sequence (9 of 19)
• River: 1 community card (44), then bet sequence (19)
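The chance-node counts in the figure can be checked with a few binomial coefficients. This is a minimal sketch that only counts card outcomes; the variable names are illustrative, and the bet sequences that bring the total to O(10^18) are not enumerated here.

```python
from math import comb

# Chance outcomes at each stage of 2-player Texas Hold'em.
deals = comb(52, 2) * comb(50, 2)  # two private cards to each of two players
flops = comb(48, 3)                # three community cards from the remaining 48
turns = 45                         # one community card from the remaining 45
rivers = 44                        # one community card from the remaining 44

# Card sequences only; betting between the chance events multiplies this further.
chance_sequences = deals * flops * turns * rivers

print(deals)   # 1624350
print(flops)   # 17296
print(f"{chance_sequences:.2e} card sequences before any betting")
```

The first two values match the 1,624,350 and 17,296 shown on the slide.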
Research Issues
1. Chance events
2. Imperfect information
3. Sheer size of the game tree
4. Opponent modelling is crucial
5. How best to use domain knowledge?
6. Experimental method
Variants have even more challenges:
– More than 2 players (up to 10)
– “No limit” (bid any amount)
Issues: Chance Events
• Utility of outcomes
  – currently just reason about expected payoff
  – short-term vs. long-term
• High variance
  – was the outcome due to luck or skill?
  – experiment design
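To see why high variance makes experiment design hard, a rough sample-size estimate helps. The sketch below uses the normal approximation for the mean of n independent hands. The function name `hands_needed` and the win rate and standard deviation passed to it are hypothetical illustrations, not figures from the talk.

```python
import math

def hands_needed(win_rate, std_dev, z=1.96):
    """Hands required before a true per-hand win rate stands out from zero.

    The mean of n hands has standard error std_dev / sqrt(n); we ask for
    win_rate >= z * standard error and solve for n.
    """
    return math.ceil((z * std_dev / win_rate) ** 2)

# Hypothetical figures, in small bets per hand (assumed for illustration only).
n = hands_needed(win_rate=0.05, std_dev=6.0)
print(n)
```

With these assumed inputs the answer is in the tens of thousands of hands, which is why a short match says little about luck versus skill.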
Issues: Imperfect Information
• Probabilistic strategies are essential
• Cannot construct your strategy in a bottom-up manner, as is done with perfect information games
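A toy game shows why probabilistic strategies are essential. Matching pennies is used here as a stand-in (it is not from the talk): any predictable strategy is punished by a best-responding opponent, and only the 50/50 mix guarantees the game value.

```python
# Row player's payoffs in matching pennies: +1 if the coins match, -1 otherwise.
PAYOFF = [[1, -1],
          [-1, 1]]

def worst_case_value(p_heads):
    """Row player's expected payoff against a best-responding column player."""
    row_mix = [p_heads, 1 - p_heads]
    column_payoffs = [
        sum(row_mix[i] * PAYOFF[i][j] for i in range(2)) for j in range(2)
    ]
    return min(column_payoffs)  # the opponent picks the column that hurts most

for p in (1.0, 0.75, 0.5):
    print(p, worst_case_value(p))
```

Always playing heads guarantees only -1, a 75/25 mix guarantees -0.5, and the even mix guarantees 0.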
Issues: Size of the game
• 2-player, Limit, Texas Hold’em game tree has about 10^18 states
• Linear Programming can solve games with 10^8 states
Issues: Opponent Modelling
• Nash equilibrium not good enough
  – Static
  – Defensive
• Even the best humans have weaknesses that should be exploited
• How to learn very quickly, with very noisy information?
  – Exploitation vs. exploration
• How not to be exploited yourself?
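The gap between a static equilibrium and an exploitive strategy can be illustrated with rock-paper-scissors, used here as a stand-in for poker. The biased opponent mix is a hypothetical example: the equilibrium strategy earns nothing against it, while a best response to the modelled bias earns a positive amount.

```python
ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[a][b]: our payoff when we play action a and the opponent plays b.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def expected_payoff(our_mix, opp_mix):
    """Expected payoff of one mixed strategy against another."""
    return sum(our_mix[a] * opp_mix[b] * PAYOFF[a][b]
               for a in range(3) for b in range(3))

def best_response(opp_mix):
    """Pure action with the highest expected payoff against opp_mix."""
    values = [sum(opp_mix[b] * PAYOFF[a][b] for b in range(3)) for a in range(3)]
    best = max(range(3), key=lambda a: values[a])
    return ACTIONS[best], values[best]

equilibrium = [1 / 3, 1 / 3, 1 / 3]
biased_opponent = [0.5, 0.25, 0.25]  # hypothetical opponent who overplays rock

print(expected_payoff(equilibrium, biased_opponent))  # approximately 0
print(best_response(biased_opponent))
```

The best response here is a pure strategy (always paper), which is itself easy to exploit once the opponent adapts; that is the tension behind "How not to be exploited yourself?".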
Issues: Using Expert Knowledge
• We are fortunate to have unlimited access to a poker-playing expert (Darse)
• How best to use his knowledge?
  – Expert system (explicitly encoded knowledge) was not effective
  – Used his knowledge to devise abstractions that reduced the game size with minimal impact on strategic aspects of the game
  – Use him to evaluate the system
Experimental Method
• High variance
• ’Bot play not the same as human play
• Very limited access to expert humans other than our own expert
Coping with very large games
• Full game tree T (too big to solve)
• Abstraction (lossy): T → abstract game tree T*
• Solve (LP): T* → strategy for T*
• Reverse mapping: strategy for T* → strategy for T
Abstraction
• Texas Hold'em 2-player game tree is too big for current LP solvers (1,179,000,604,565,715,751)
• Many ways of doing the abstractions
  – We require coarse-grained abstractions
  – Avoiding a severe loss of accuracy
• Abstract to a set of smaller problems: 10^8 states, 10^6 equations and unknowns
Alternate Game Structures
• Truncation of betting rounds
• Bypassing betting rounds
• Models with 3 rounds, 2 rounds, or 1 round
• Many-to-one mapping of game-tree nodes to single nodes in the abstract game tree
  – How you do the mapping determines the overall accuracy (few good and many bad mappings)
  – This is the limiting factor of the method
3-round Model (figure)
• Texas Hold'em, O(10^18): Initial (1,624,350) → bet sequence (9 of 19) → Flop (17,296) → bet sequence (9 of 19) → Turn (45) → bet sequence (9 of 19) → River (44) → bet sequence (19)
• 3-round model: expected value leaf nodes
1-round Preflop Model and 3-round Postflop Model (figure)
• Texas Hold'em, O(10^18): Initial (1,624,350) → bet sequence (9 of 19) → Flop (17,296) → bet sequence (9 of 19) → Turn (45) → bet sequence (9 of 19) → River (44) → bet sequence (19)
• 1-round preflop model
• 3-round postflop model (single flop)
Abstractions
• Board Q–7–2
• Compare
  1. A–3
  2. A–4
  3. A–K
  – Suit isomorphism (24×) (exact)
  – Rank near-equivalence (small error)
• Bucketing: hands are mapped to a small set of buckets depending on
  – Current hand strength
  – Potential for improvement in hand strength
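Suit isomorphism is easiest to see before the flop, where no suit is special. The sketch below assumes the slide's 24× refers to the 4! = 24 ways of relabelling the four suits; the card encoding and the helper `preflop_class` are illustrative names, not part of the talk. It collapses all two-card hands into classes that are identical up to suit relabelling.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def preflop_class(card1, card2):
    """Map a two-card hand to its class under suit relabelling, e.g. 'AKs'."""
    hi, lo = sorted((card1, card2), key=lambda c: RANKS.index(c[0]), reverse=True)
    if hi[0] == lo[0]:
        return hi[0] + lo[0]                  # pair: suits never matter
    suited = "s" if hi[1] == lo[1] else "o"   # only "same suit or not" matters
    return hi[0] + lo[0] + suited

hands = list(combinations(DECK, 2))
classes = {preflop_class(a, b) for a, b in hands}
print(len(hands), len(classes))  # 1326 169
```

This reduction loses nothing strategically, which is why the slide marks it as exact. Once board cards appear, the suits on the board matter and the saving is smaller.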
Bucketing
• Reduce branching factor at chance nodes
• Partition hands into six classes per player
• Overlaying strategically similar sub-trees
[Figure: original bucketing (1,1), (1,2), (1,3), …, (6,6), linked by transition probabilities to the next round bucketing (1,1), (1,2), (1,3), …, (6,6)]
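A minimal sketch of the bucketing idea follows, for a single player. It assumes each hand has already been scored with a strength value in [0, 1] and that the six classes are equal-width bins. Both are simplifying assumptions: the talk also uses potential for improvement, and the sample data below is hypothetical.

```python
from collections import Counter

NUM_BUCKETS = 6  # six classes per player, as on the slide

def bucket_of(strength):
    """Map a hand-strength score in [0, 1] to a bucket 1..6 (equal-width bins)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return min(int(strength * NUM_BUCKETS), NUM_BUCKETS - 1) + 1

def transition_probabilities(pairs):
    """Estimate P(next bucket | current bucket) from (before, after) strengths."""
    counts = Counter((bucket_of(a), bucket_of(b)) for a, b in pairs)
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}

# Hypothetical (strength this round, strength next round) samples.
samples = [(0.10, 0.05), (0.10, 0.40), (0.95, 0.90), (0.95, 0.97)]
table = transition_probabilities(samples)
print(table)
```

The chance node in the abstract game then branches over at most six buckets per player instead of every possible card, with the estimated table supplying the branch probabilities.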
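Abstract Models (figure)
• Texas Hold'em, O(10^18): Initial (1,624,350) → bet sequence (9 of 19) → Flop (17,296) → bet sequence (9 of 19) → Turn (45) → bet sequence (9 of 19) → River (44) → bet sequence (19)
• Abstract tree: Initial w^2 (36) → bet sequence (7 of 15) → Flop x^2 (36) → bet sequence (7 of 15) → Turn y^2 (36) → bet sequence (7 of 15) → River z^2 (36) → bet sequence (15)
• Abstract preflop model: O(10^7)
• Abstract postflop model: O(10^7)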
Reverse Mapping
• Bucket splitting
  – LP solution gives a strategy (recipe)
  – Each partition class split strong / weak
  – Split the randomized mixed strategy
  – {0, 0.2, 0.8} => {0, 0, 1.0} & {0, 0.4, 0.6}
• Better hand selection (with some risk)
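One rule that reproduces the slide's example is sketched below. It assumes the probabilities are listed from the most passive action to the most aggressive (for instance fold, call, raise) and that the strong half of a bucket takes the most aggressive half of the probability mass. Both points are assumptions inferred from the single example, and `split_bucket` is a hypothetical name.

```python
def split_bucket(strategy):
    """Split a mixed strategy into (strong_half, weak_half).

    `strategy` lists action probabilities from most passive to most aggressive.
    The weak half takes the most passive 50% of the probability mass and the
    strong half the most aggressive 50%; each half is rescaled to sum to 1.
    """
    weak, strong = [], []
    remaining = 0.5  # probability mass still owed to the weak half
    for p in strategy:
        to_weak = min(p, remaining)
        remaining -= to_weak
        weak.append(2 * to_weak)
        strong.append(2 * (p - to_weak))
    return strong, weak

strong, weak = split_bucket([0.0, 0.2, 0.8])
print(strong, weak)
```

Up to floating-point rounding this gives {0, 0, 1.0} for the strong half and {0, 0.4, 0.6} for the weak half, and averaging the two halves recovers the original {0, 0.2, 0.8}.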
Putting It All Together – PsOpti1
[Figure: Preflop: Selby preflop model. Flop, Turn, River: postflop models for 2, 4, 6, 8 bets.]
Putting It All Together – PsOpti2
[Figure: Preflop: 3-round preflop model. Flop, Turn, River: postflop models for 2, 4, 4, 6, 6, 8, 8 (bets + model).]
Conclusions
• Game Theory can be applied to large problems and practical systems
• Nash Equilibrium (minimax) too defensive, does not exploit the opponent’s weaknesses
• Current work involves opponent modelling
  – Preliminary results are very promising
• We hope to beat the best poker players in the world in the near future