Nesterovs excessive gap technique and poker Andrew Gilpin

  • Slides: 31
Download presentation
Nesterov’s excessive gap technique and poker Andrew Gilpin CMU Theory Lunch Feb 28, 2007

Nesterov’s excessive gap technique and poker Andrew Gilpin CMU Theory Lunch Feb 28, 2007 Joint work with: Samid Hoda, Javier Peña, Troels Sørensen, Tuomas Sandholm

Outline • • • Two-person zero-sum sequential games First-order methods for convex optimization Nesterov’s

Outline • • • Two-person zero-sum sequential games First-order methods for convex optimization Nesterov’s excessive gap technique (EGT) EGT for sequential games Heuristics for EGT Application to Texas Hold’em poker

We want to solve: If Q 1 and Q 2 are simplices, complexes, this

We want to solve: If Q 1 and Q 2 are simplices, complexes, this is the Nash equilibrium problem for two-person zero-sum matrix games sequential games

What’s a complex? It’s just like a simplex, but more complex. Each player’s complex

What’s a complex? It’s just like a simplex, but more complex. Each player’s complex encodes her set of realization plans in the game In particular, player 1’s complex is where E and e depend on the game…

A B C D E F G H

A B C D E F G H

Recall our problem: where Q 1 and Q 2 are complexes Since Q 1

Recall our problem: where Q 1 and Q 2 are complexes Since Q 1 and Q 2 have a linear description, this problem can be solved as an LP. However, current LP solution methods do not scale

(Un)scalability of LP solvers • Rhode Island poker [Shi & Littman 01] – LP

(Un)scalability of LP solvers • Rhode Island poker [Shi & Littman 01] – LP has 91 million rows and columns – Applying Game. Shrink automated abstraction algorithm yields an LP with only 1. 2 million rows and columns, and 50 million nonzeros [G. & Sandholm, 06 a] – Solution requires 25 GB RAM and over a week of CPU time • Texas Hold’em poker – ~1018 nodes in game tree – Lossy abstractions need to be performed – Limitations of current solver technology primary limitation to achieving expert-level strategies [G. & Sandholm 06 b, 07 a] • Instead of standard LP solvers, what about a first-order method?

Convex optimization Suppose we want to solve where f is convex. Note that this

Convex optimization Suppose we want to solve where f is convex. Note that this formulation captures ALL convex optimization problems (can model feasible space using an indicator function) For general f, convergence Analysis based requires on black-box O(1/ε 2) iterations oracle (e. g. , for subgradient accessmethods) model. Can we do better by looking inside the box? For smooth, strongly convex f with Lipschitzcontinuous gradient, can be done in O(1/ε½) iterations

Strong convexity A function if there exists for all is strongly convex such that

Strong convexity A function if there exists for all is strongly convex such that and all is the strong convexity parameter of d

Recall our problem: where Q 1 and Q 2 are complexes Equivalently: where and

Recall our problem: where Q 1 and Q 2 are complexes Equivalently: where and

, , Unfortunately, Φ and f are non-smooth Fortunately, they have a special structure

, , Unfortunately, Φ and f are non-smooth Fortunately, they have a special structure Let d 1, d 2 be smooth and strongly convex on Q 1, Q 2 These are called prox-functions Now let μ > 0 and consider: These are well-defined smooth functions

Excessive gap condition From weak duality, we have that f(y) ≤ Φ(x) The excessive

Excessive gap condition From weak duality, we have that f(y) ≤ Φ(x) The excessive gap condition requires that fμ(y) ≤ Φμ(x) (EGC) The algorithm maintains (EGC), and gradually decreases μ As μ decreases, the smoothed functions approach the non-smooth functions, and thus iterates satisfying (EGC) converge to optimal solutions

Nesterov’s main theorem Theorem [Nesterov 05] There exists an algorithm such that after at

Nesterov’s main theorem Theorem [Nesterov 05] There exists an algorithm such that after at most N iterations, the iterates have duality gap at most Furthermore, each iteration only requires solving three problems of the form and performing three matrix-vector product operations on A.

Nice prox functions A prox function d for Q is nice if it is:

Nice prox functions A prox function d for Q is nice if it is: 1. Strongly convex continuous everywhere in Q, and differentiable in the relative interior of Q 2. The min of d over Q is 0 3. The following maps are easily computable:

Nice simplex prox function 1: Entropy

Nice simplex prox function 1: Entropy

Nice simplex prox function 2: Euclidean sargmax can be computed in O(n log n)

Nice simplex prox function 2: Euclidean sargmax can be computed in O(n log n) time

From the simplex to the complex Theorem [Hoda, G. , Peña 06] A nice

From the simplex to the complex Theorem [Hoda, G. , Peña 06] A nice prox function can be constructed for the complex via a recursive application of any nice prox function for the simplex

Prox function example Let be any nice simplex prox function. The prox function for

Prox function example Let be any nice simplex prox function. The prox function for this matrix is:

Solving

Solving

(similar to b(i-vii))

(similar to b(i-vii))

Heuristics [G. , Hoda, Peña, Sandholm 07] • Heuristic 1: Aggressive μ reduction –

Heuristics [G. , Hoda, Peña, Sandholm 07] • Heuristic 1: Aggressive μ reduction – The μ given in the previous algorithm is a conservative choice guaranteeing convergence – In practice, we can do much better by aggressively pushing μ, while checking that the excessive gap condition is satisfied • Heuristic 2: Balanced μ reduction – To prevent one μ from dominating the other, we also perform periodic adjustments to keep them within a small factor of one another

Matrix-vector multiplication in poker [G. , Hoda, Peña, Sandholm 07] • The main time

Matrix-vector multiplication in poker [G. , Hoda, Peña, Sandholm 07] • The main time and space bottleneck of the algorithm is the matrix-vector product on A • Instead of storing the entire matrix, we can represent it as a composition of Kronecker products • We can also effectively take advantage of parallelization in the matrix-vector product to achieve near-linear speedup

Memory usage comparison Instance CPLEX IPM CPLEX Simplex EGT 10 k 0. 082 GB

Memory usage comparison Instance CPLEX IPM CPLEX Simplex EGT 10 k 0. 082 GB >0. 051 GB 0. 012 GB 160 k 2. 25 GB >0. 664 GB 0. 035 GB RI 25. 2 GB >3. 45 GB 0. 15 GB Texas >458 GB 2. 49 GB

Poker • Poker is a recognized challenge problem in AI because (among other reasons)

Poker • Poker is a recognized challenge problem in AI because (among other reasons) – the other players’ cards are hidden; – bluffing and other deceptive strategies are needed in a good player; – there is uncertainty about future events. • Texas Hold’em: most popular variant of poker • Two-player game tree has ~1018 nodes

Potential-aware automated abstraction [G. , Sandholm, Sørensen 07] • Most prior automated abstraction algorithms

Potential-aware automated abstraction [G. , Sandholm, Sørensen 07] • Most prior automated abstraction algorithms employ a myopic expected value computation as a similarity metric – This ignores hands like flush draws where although the probability of winning is small, the payoff could be high • Our newest algorithm considers higher-dimensional spaces consisting of histograms over abstracted classes of states from later stages of the game • This enables our bottom-up abstraction algorithm to automatically take into account positive and negative potential

Solving the four-round model • Computed abstraction with – 20 first-round buckets – 800

Solving the four-round model • Computed abstraction with – 20 first-round buckets – 800 second-round buckets – 4800 third-round buckets – 28800 fourth-round buckets • Algorithm using 30 GB RAM – Simply representing as an LP requires 32 TB – Outputs new, improved solution every 2. 5 days

[G. , Sandholm, Sørensen 07]

[G. , Sandholm, Sørensen 07]

[G. , Sandholm, Sørensen 07]

[G. , Sandholm, Sørensen 07]

[G. , Sandholm, Sørensen 07]

[G. , Sandholm, Sørensen 07]

Future research • Customizing second-order (e. g. interiorpoint methods) for the equilibrium problem •

Future research • Customizing second-order (e. g. interiorpoint methods) for the equilibrium problem • Additional heuristics for improving practical performance of EGT algorithm • Techniques for finding an optimal solution from an ε-solution

Thank you ☺

Thank you ☺