Poker for Fun and Profit (and intellectual challenge)

  • Slides: 30

Poker for Fun and Profit (and intellectual challenge)
Robert Holte
Computing Science Dept., University of Alberta

Poker

World Series of Poker

Poker Research Group - core
• Darse Billings (Ph.D.)
• Aaron Davidson, M.Sc., Poki
• Neil Burch, P/A, PsOpti
• Terence Schauenberg (M.Sc.), Adapti
• Advisors: J. Schaeffer, D. Szafron

Poker Research Group – new arrivals
• Bret Hoehn (M.Sc.)
• Finnegan Southey (postdoc)
• Michael Bowling
• Dale Schuurmans
• Rich Sutton
• Robert Holte

Our Goal

PsOpti2 vs. “theCount”

Play Us Online
http://games.cs.ualberta.ca/poker/

Poki’s Poker Academy
http://poki-poker.com

Poker Variants
• Many different variants of poker
• Texas Hold’em is the most skill-testing
• No-Limit Texas Hold’em is used to determine the world champion
• Our research: Limit Texas Hold’em
• Current focus: 2-player (heads up)

2-player, limit, Texas Hold’em: game tree, O(10^18)
• Initial: 2 private cards to each player (1,624,350), then Bet Sequence (9 of 19)
• Flop: 3 community cards (17,296), then Bet Sequence (9 of 19)
• Turn: 1 community card (45), then Bet Sequence (9 of 19)
• River: 1 community card (44), then Bet Sequence (19)
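The chance-node counts on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it counts chance outcomes alone, so the betting sequences that bring the full tree to roughly 10^18 states are not modelled.

```python
from math import comb

# Ways to deal 2 private cards to each of two players from a 52-card deck.
initial_deals = comb(52, 2) * comb(50, 2)

# Ways to choose the 3 flop cards from the 48 cards that remain.
flops = comb(48, 3)

# One card for the turn (45 left) and one for the river (44 left).
turns = 45
rivers = 44

# Chance outcomes only; betting sequences multiply this further.
chance_outcomes = initial_deals * flops * turns * rivers

print(f"{initial_deals:,} initial deals, {flops:,} flops")
```

The first two values match the 1,624,350 and 17,296 shown in the tree.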

Research Issues
1. Chance events
2. Imperfect information
3. Sheer size of the game tree
4. Opponent modelling is crucial
5. How best to use domain knowledge?
6. Experimental method
Variants have even more challenges:
– More than 2 players (up to 10)
– “No limit” (bid any amount)

Issues: Chance Events
• Utility of outcomes
– currently just reason about expected payoff
– short-term vs. long-term
• High variance
– was the outcome due to luck or skill?
– experiment design
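To see why high variance drives experiment design, here is a rough sketch of how many hands are needed before a true edge stands out from luck. The function name and the numbers passed to it (a per-hand standard deviation of 6 small bets and an edge of 0.05 small bets per hand) are hypothetical illustration values, not figures from the slides; the formula is the usual normal approximation for the mean of independent hands.

```python
import math

def hands_needed(edge, sigma, z=1.96):
    """Hands needed before a true per-hand edge exceeds z standard errors.

    edge  : assumed true average winnings per hand (hypothetical units)
    sigma : assumed standard deviation of winnings per hand
    z     : normal quantile, 1.96 for roughly 95% confidence
    """
    # Standard error after n hands is sigma / sqrt(n); solve z * SE <= edge.
    return math.ceil((z * sigma / edge) ** 2)

# Hypothetical values, for illustration only.
n = hands_needed(edge=0.05, sigma=6.0)
print(n)
```

Halving the edge quadruples the number of hands, which is why short matches say little about luck versus skill.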

Issues: Imperfect Information
• Probabilistic strategies are essential
• Cannot construct your strategy in a bottom-up manner, as is done with perfect information games
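A small illustration of why probabilistic strategies are essential, using rock-paper-scissors as a stand-in for poker (an assumption made for brevity; it is not a game discussed in the slides). Any deterministic choice can be exploited by an opponent who knows it, while the uniform mixed strategy cannot.

```python
# Payoff to the row player; rows and columns are rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def worst_case(strategy):
    """Expected payoff of a mixed strategy against the opponent's best reply."""
    return min(
        sum(p * PAYOFF[i][j] for i, p in enumerate(strategy))
        for j in range(3)
    )

pure_rock = [1.0, 0.0, 0.0]   # deterministic: always rock
uniform = [1 / 3, 1 / 3, 1 / 3]  # randomized: each action equally likely

print(worst_case(pure_rock), worst_case(uniform))
```

The deterministic strategy loses every round to its best reply, while the randomized one breaks even in the worst case.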

Issues: Size of the game
• 2-player, Limit, Texas Hold’em game tree has about 10^18 states
• Linear Programming can solve games with 10^8 states

Issues: Opponent Modelling
• Nash equilibrium not good enough
– Static
– Defensive
• Even the best humans have weaknesses that should be exploited
• How to learn very quickly, with very noisy information?
– Exploitation vs. exploration
• How not to be exploited yourself?
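The gap between a static equilibrium and an exploitive strategy can be sketched with a frequency-count opponent model. The game (rock-paper-scissors), the function names and the observed history are all assumptions chosen for illustration; they do not describe the group's own opponent-modelling method.

```python
from collections import Counter

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value

def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def best_response(observed):
    """Best action, and its expected payoff, against observed action frequencies."""
    counts = Counter(observed)
    total = len(observed)
    values = {
        mine: sum(payoff(mine, theirs) * counts[theirs] / total for theirs in ACTIONS)
        for mine in ACTIONS
    }
    best = max(values, key=values.get)
    return best, values[best]

# Hypothetical history of an opponent who plays rock too often.
history = ["rock"] * 6 + ["paper"] * 2 + ["scissors"] * 2
action, value = best_response(history)
print(action, round(value, 2))
```

The equilibrium strategy earns 0 on average against this opponent, while the modelled best response earns more. With only a handful of noisy observations the model could just as easily be wrong, which is the exploitation vs. exploration problem named above.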

Issues: Using Expert Knowledge
• We are fortunate to have unlimited access to a poker-playing expert (Darse)
• How best to use his knowledge?
– Expert system (explicitly encoded knowledge) was not effective
– Used his knowledge to devise abstractions that reduced the game size with minimal impact on strategic aspects of the game
– Use him to evaluate the system

Experimental Method
• High variance
• ’Bot play is not the same as human play
• Very limited access to expert humans other than our own expert

Coping with very large games
• Full game tree T: too big to solve
• Abstraction (lossy): T => abstract game tree T*
• Solve (LP): T* => strategy for T*
• Reverse mapping: strategy for T* => strategy for T

Abstraction
• Texas Hold'em 2-player game tree is too big for current LP solvers (1,179,000,604,565,715,751)
• Many ways of doing the abstractions
– We require coarse-grained abstractions
– Avoiding a severe loss of accuracy
• Abstract to a set of smaller problems: 10^8 states, 10^6 equations and unknowns

Alternate Game Structures
• Truncation of betting rounds
• Bypassing betting rounds
• Models with 3 rounds, 2 rounds, or 1 round
• Many-to-one mapping of game-tree nodes to single nodes in the abstract game tree
– How you do the mapping determines the overall accuracy (few good and many bad mappings)
– This is the limiting factor of the method

[Figure: Texas Hold'em game tree, O(10^18): Initial (1,624,350), Flop (17,296), Turn (45), River (44), each followed by a Bet Sequence (9 of 19; 19 on the River). A 3-round Model is marked on the tree, ending in expected value leaf nodes.]

[Figure: the same Texas Hold'em game tree, O(10^18), with a 1-round Preflop Model and a 3-round Postflop Model (single flop) marked on it.]

Abstractions
• Board Q – 7 – 2
• Compare
1. A – 3
2. A – 4
3. A – K
– Suit isomorphism (24X) (exact)
– Rank near-equivalence (small error)
• Bucketing: hands are mapped to a small set of buckets depending on
– Current hand strength
– Potential for improvement in hand strength
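Suit isomorphism can be sketched as choosing one canonical relabelling among the 4! = 24 suit permutations, so that hands differing only in suit names collapse to a single representative. The card encoding and the `canonical` helper below are assumptions for illustration. The sketch also treats the cards as one unordered set, whereas a real Hold'em implementation would need to keep private cards and board cards distinct.

```python
from itertools import permutations

SUITS = "cdhs"

def canonical(cards):
    """Smallest relabelling of a set of (rank, suit) cards over all suit permutations."""
    best = None
    for perm in permutations(SUITS):
        relabel = dict(zip(SUITS, perm))
        candidate = tuple(sorted((rank, relabel[suit]) for rank, suit in cards))
        if best is None or candidate < best:
            best = candidate
    return best

suited_hearts = [("A", "h"), ("3", "h")]
suited_spades = [("A", "s"), ("3", "s")]
offsuit = [("A", "h"), ("3", "s")]

# Suited hands share one representative; the offsuit hand gets another.
print(canonical(suited_hearts) == canonical(suited_spades))
print(canonical(suited_hearts) == canonical(offsuit))
```

Because no information relevant to play is lost, this reduction is exact, unlike rank near-equivalence or bucketing.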

Bucketing
• Reduce branching factor at chance nodes
• Partition hands into six classes per player
• Overlaying strategically similar sub-trees
[Figure: Original Bucketing (bucket pairs 1,1 1,2 1,3 … 6,6) linked by Transition Probabilities to Next Round Bucketing (bucket pairs 1,1 1,2 1,3 … 6,6).]
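A minimal sketch of bucketing and of estimating transition probabilities between rounds. It assumes each hand is summarised by a single strength value in [0, 1] and uses equal-width bins; the slides state only that buckets depend on current strength and potential, so the binning rule, the function names and the sample data are all hypothetical.

```python
def bucket(strength, n_buckets=6):
    """Map a hand strength in [0, 1] to a bucket index 1..n_buckets (equal-width bins)."""
    return min(int(strength * n_buckets), n_buckets - 1) + 1

def transition_probabilities(pairs, n_buckets=6):
    """Estimate P(next-round bucket | current bucket) from (current, next) strengths."""
    counts = [[0] * n_buckets for _ in range(n_buckets)]
    for now, later in pairs:
        counts[bucket(now, n_buckets) - 1][bucket(later, n_buckets) - 1] += 1
    table = []
    for row in counts:
        total = sum(row)
        # Rows with no samples are left as all zeros.
        table.append([c / total if total else 0.0 for c in row])
    return table

# Hypothetical samples: strength on one round and on the next.
samples = [(0.1, 0.1), (0.1, 0.9), (0.5, 0.6), (0.95, 0.97)]
table = transition_probabilities(samples)
print(table[0])
```

With six buckets per player, each chance node branches into 6 x 6 = 36 bucket pairs rather than into every possible card.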

[Figure: the Texas Hold'em game tree, O(10^18), beside its abstraction. Chance nodes (1,624,350; 17,296; 45; 44) become bucket-pair nodes w^2, x^2, y^2, z^2 (36 each) at Initial, Flop, Turn and River; Bet Sequences shrink from 9 of 19 to 7 of 15 (19 to 15 on the River). Abstract Preflop Model: O(10^7). Abstract Postflop Model: O(10^7).]

Reverse Mapping
• Bucket splitting
– LP solution gives a strategy (recipe)
– Each partition class split strong / weak
– Split the randomized mixed strategy
– {0, 0.2, 0.8} => {0, 0, 1.0} & {0, 0.4, 0.6}
• Better hand selection (with some risk)
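One way to read the bucket-splitting example is that the stronger half of a bucket takes the most aggressive probability mass and the weaker half takes what is left, so that the two halves average back to the LP solution. The rule below is an assumption that reproduces the numbers on the slide, not a documented description of the system. The action ordering (least to most aggressive) and the function name are also assumed.

```python
def split_bucket(strategy, strong_share=0.5):
    """Split a mixed strategy into (weak, strong) halves.

    `strategy` lists action probabilities from least to most aggressive.
    The strong part takes mass from the aggressive end first; the weighted
    average of the two results equals the original strategy.
    """
    remaining = strong_share
    strong = [0.0] * len(strategy)
    for i in reversed(range(len(strategy))):
        take = min(strategy[i], remaining)
        strong[i] = take
        remaining -= take
    weak = [(p - s) / (1 - strong_share) for p, s in zip(strategy, strong)]
    strong = [s / strong_share for s in strong]
    return weak, strong

weak, strong = split_bucket([0.0, 0.2, 0.8])
print([round(p, 2) for p in strong], [round(p, 2) for p in weak])
```

Up to floating-point rounding, the strong half plays {0, 0, 1.0} and the weak half {0, 0.4, 0.6}, as on the slide.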

Putting It All Together – PsOpti1
[Figure: Preflop handled by the Selby preflop model; Flop, Turn and River handled by postflop models labelled 2, 4, 6, 8 (bets).]

Putting It All Together – PsOpti2
[Figure: Preflop handled by a 3-round preflop model; Flop, Turn and River handled by postflop models labelled 2, 4, 4, 6, 6, 8, 8 (bets + model).]

Conclusions
• Game Theory can be applied to large problems and practical systems
• Nash Equilibrium (minimax) is too defensive and does not exploit the opponent’s weaknesses
• Current work involves opponent modelling
– Preliminary results are very promising
• We hope to beat the best poker players in the world in the near future