Nesterov’s excessive gap technique and poker. Andrew Gilpin, CMU Theory Lunch, Feb 28, 2007. Joint work with: Samid Hoda, Javier Peña, Troels Sørensen, Tuomas Sandholm

Outline • Two-person zero-sum sequential games • First-order methods for convex optimization • Nesterov’s excessive gap technique (EGT) • EGT for sequential games • Heuristics for EGT • Application to Texas Hold’em poker

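We want to solve: $\min_{x \in Q_1} \max_{y \in Q_2} x^T A y$. If $Q_1$ and $Q_2$ are simplices, this is the Nash equilibrium problem for two-person zero-sum matrix games; if they are complexes, it is the Nash equilibrium problem for two-person zero-sum sequential games.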

What’s a complex? It’s just like a simplex, but more complex. Each player’s complex encodes her set of realization plans in the game. In particular, player 1’s complex is $Q_1 = \{x : Ex = e,\ x \ge 0\}$, where E and e depend on the game…
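To make the idea of a realization plan concrete, here is a minimal sketch that checks membership in a complex of the form {x : Ex = e, x ≥ 0}. The tiny game, the matrix `E`, the vector `e`, and the helper `in_complex` are all hypothetical and not taken from the talk: the player first chooses between actions a and b, and after a chooses between c and d.

```python
# Hypothetical sequence-form constraints for one player (illustration only).
# Sequence order: 0 = empty, 1 = a, 2 = b, 3 = a then c, 4 = a then d.
E = [
    [1, 0, 0, 0, 0],    # x_empty = 1
    [-1, 1, 1, 0, 0],   # x_a + x_b = x_empty
    [0, -1, 0, 1, 1],   # x_ac + x_ad = x_a
]
e = [1, 0, 0]


def in_complex(x, E, e, tol=1e-9):
    """Return True if x >= 0 and E x = e, up to the tolerance tol."""
    if any(xi < -tol for xi in x):
        return False
    for row, rhs in zip(E, e):
        if abs(sum(r * xi for r, xi in zip(row, x)) - rhs) > tol:
            return False
    return True


# Play a with probability 0.6, then c with probability 0.5.
valid_plan = [1.0, 0.6, 0.4, 0.3, 0.3]
# Violates the last constraint: x_ac + x_ad = 1.0, but x_a = 0.6.
invalid_plan = [1.0, 0.6, 0.4, 0.5, 0.5]

print(in_complex(valid_plan, E, e))
print(in_complex(invalid_plan, E, e))
```

Each entry of a realization plan is the product of the player's own action probabilities along a sequence, which is why the constraints are linear even though behavioral strategies multiply.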


Recall our problem: $\min_{x \in Q_1} \max_{y \in Q_2} x^T A y$, where $Q_1$ and $Q_2$ are complexes. Since $Q_1$ and $Q_2$ have a linear description, this problem can be solved as an LP. However, current LP solution methods do not scale.

(Un)scalability of LP solvers • Rhode Island poker [Shi & Littman 01] – LP has 91 million rows and columns – Applying the GameShrink automated abstraction algorithm yields an LP with only 1.2 million rows and columns, and 50 million nonzeros [G. & Sandholm 06a] – Solution requires 25 GB RAM and over a week of CPU time • Texas Hold’em poker – ~10^18 nodes in game tree – Lossy abstractions need to be performed – Limitations of current solver technology are the primary obstacle to achieving expert-level strategies [G. & Sandholm 06b, 07a] • Instead of standard LP solvers, what about a first-order method?

Convex optimization Suppose we want to solve $\min_{x \in \mathbb{R}^n} f(x)$, where f is convex. Note that this formulation captures ALL convex optimization problems (can model feasible space using an indicator function). For general f, convergence requires O(1/ε²) iterations (e.g., for subgradient methods). This analysis is based on a black-box oracle access model. Can we do better by looking inside the box? For smooth, strongly convex f with Lipschitz-continuous gradient, it can be done in O(1/ε^½) iterations.

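Strong convexity A function d is strongly convex if there exists σ > 0 such that $d(\alpha x + (1-\alpha) y) \le \alpha d(x) + (1-\alpha) d(y) - \tfrac{1}{2}\sigma\alpha(1-\alpha)\|x - y\|^2$ for all α ∈ [0, 1] and all x, y in the domain of d. σ is the strong convexity parameter of d.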

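Recall our problem: $\min_{x \in Q_1} \max_{y \in Q_2} x^T A y$, where $Q_1$ and $Q_2$ are complexes. Equivalently: $\min_{x \in Q_1} \Phi(x) = \max_{y \in Q_2} f(y)$, where $\Phi(x) = \max_{y \in Q_2} x^T A y$ and $f(y) = \min_{x \in Q_1} x^T A y$.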

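Unfortunately, Φ and f are non-smooth. Fortunately, they have a special structure. Let $d_1$, $d_2$ be smooth and strongly convex on $Q_1$, $Q_2$. These are called prox-functions. Now let μ > 0 and consider: $\Phi_\mu(x) = \max_{y \in Q_2} \{x^T A y - \mu d_2(y)\}$ and $f_\mu(y) = \min_{x \in Q_1} \{x^T A y + \mu d_1(x)\}$. These are well-defined smooth functions.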

Excessive gap condition From weak duality, we have that f(y) ≤ Φ(x). The excessive gap condition requires the reverse inequality for the smoothed functions: $f_\mu(y) \ge \Phi_\mu(x)$ (EGC). The algorithm maintains (EGC), and gradually decreases μ. As μ decreases, the smoothed functions approach the non-smooth functions, and thus iterates satisfying (EGC) converge to optimal solutions.
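The claim that the smoothed functions approach the non-smooth ones as μ decreases can be checked numerically in a simple special case. The sketch below assumes (this is not spelled out in the transcript) that the feasible set is a simplex, that the smoothed function has the usual form max over y of g·y − μ d(y), and that d is the entropy prox function d(y) = ln n + Σ y_i ln y_i. Under those assumptions the maximum has the closed form μ ln Σ exp(g_i/μ) − μ ln n, and it lies within μ ln n of max_i g_i. The name `smoothed_max` and the vector `g` are hypothetical.

```python
import math


def smoothed_max(g, mu):
    """Entropy-smoothed max over the simplex:
    mu * log(sum_i exp(g_i / mu)) - mu * log(n), computed stably."""
    m = max(g)
    n = len(g)
    log_sum_exp = m + mu * math.log(sum(math.exp((gi - m) / mu) for gi in g))
    return log_sum_exp - mu * math.log(n)


g = [1.0, 3.0, 2.0]  # stands in for the vector that multiplies y
for mu in (1.0, 0.1, 0.01):
    gap = max(g) - smoothed_max(g, mu)
    # The gap is nonnegative and never exceeds mu * ln(n).
    print(mu, smoothed_max(g, mu), gap, mu * math.log(len(g)))
```

The smoothed value is differentiable in g for every μ > 0, while the plain max is not; that trade of accuracy for smoothness is what the excessive gap technique manages as it drives μ toward zero.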

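Nesterov’s main theorem Theorem [Nesterov 05] There exists an algorithm such that after at most N iterations, the iterates have duality gap at most $\frac{4\|A\|}{N+1}\sqrt{\frac{D_1 D_2}{\sigma_1 \sigma_2}}$, where $D_i = \max_{z \in Q_i} d_i(z)$ and $\sigma_i$ is the strong convexity parameter of $d_i$. Furthermore, each iteration only requires solving three problems of the form $\max_{x \in Q_i} \{g^T x - d_i(x)\}$ and performing three matrix-vector product operations on A.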

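Nice prox functions A prox function d for Q is nice if: 1. It is strongly convex and continuous everywhere in Q, and differentiable in the relative interior of Q 2. The min of d over Q is 0 3. The following maps are easily computable: $\mathrm{smax}(d, g) = \max_{x \in Q} \{g^T x - d(x)\}$ and $\mathrm{sargmax}(d, g) = \arg\max_{x \in Q} \{g^T x - d(x)\}$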

Nice simplex prox function 1: Entropy
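The formula on this slide did not survive in the transcript. A minimal sketch, assuming the standard entropy prox function on the simplex, d(x) = ln n + Σ x_i ln x_i, and assuming sargmax means the maximizer of g·x − d(x) over the simplex: in that case the maximizer is the softmax of g and the optimal value is log-sum-exp(g) − ln n. The function names below are hypothetical.

```python
import math


def entropy_prox(x):
    """d(x) = ln n + sum_i x_i ln x_i, with the convention 0 ln 0 = 0."""
    return math.log(len(x)) + sum(xi * math.log(xi) for xi in x if xi > 0)


def entropy_sargmax(g):
    """Maximizer of g.x - d(x) over the simplex: the softmax of g."""
    m = max(g)
    w = [math.exp(gi - m) for gi in g]  # shift by max(g) for stability
    s = sum(w)
    return [wi / s for wi in w]


def entropy_smax(g):
    """Optimal value of g.x - d(x): log-sum-exp(g) - ln n."""
    m = max(g)
    return m + math.log(sum(math.exp(gi - m) for gi in g)) - math.log(len(g))


g = [0.5, -1.0, 2.0]
x_star = entropy_sargmax(g)
objective = sum(gi * xi for gi, xi in zip(g, x_star)) - entropy_prox(x_star)
print(x_star, objective, entropy_smax(g))
```

Both maps cost O(n) time, which is what makes this prox function "nice" in the sense of condition 3.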

Nice simplex prox function 2: Euclidean. sargmax can be computed in O(n log n) time.
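The O(n log n) bound is consistent with a sort-based Euclidean projection onto the simplex. The sketch below assumes the prox function d(x) = ½ Σ (x_i − 1/n)² (the formula itself is not preserved in the transcript); under that assumption, maximizing g·x − d(x) over the simplex is the same as projecting the point g + (1/n)·1 onto the simplex. The function names are hypothetical, and the projection routine is the well-known sort-and-threshold method rather than anything specific to the talk.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.
    The sort dominates the cost, giving O(n log n) time."""
    u = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for k, uk in enumerate(u, start=1):
        running_sum += uk
        t = (running_sum - 1.0) / k
        if uk - t > 0:          # holds exactly for the first rho entries
            theta = t           # so the last update is the final threshold
    return [max(vi - theta, 0.0) for vi in v]


def euclidean_sargmax(g):
    """Maximizer of g.x - 0.5 * sum((x_i - 1/n)^2) over the simplex."""
    n = len(g)
    return project_to_simplex([gi + 1.0 / n for gi in g])


print(euclidean_sargmax([0.0, 0.0, 0.0]))  # the uniform point
print(euclidean_sargmax([2.0, 0.0, 0.0]))  # a vertex of the simplex
print(euclidean_sargmax([0.3, 0.0, 0.0]))  # an interior point
```

Unlike the entropy case, the maximizer here can land on the boundary of the simplex, which is why a thresholding step is needed.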

From the simplex to the complex Theorem [Hoda, G., Peña 06] A nice prox function can be constructed for the complex via a recursive application of any nice prox function for the simplex.

Prox function example Let be any nice simplex prox function. The prox function for this matrix is:

Solving

(similar to b(i-vii))

Heuristics [G., Hoda, Peña, Sandholm 07] • Heuristic 1: Aggressive μ reduction – The μ given in the previous algorithm is a conservative choice guaranteeing convergence – In practice, we can do much better by aggressively pushing μ, while checking that the excessive gap condition is satisfied • Heuristic 2: Balanced μ reduction – To prevent one μ from dominating the other, we also perform periodic adjustments to keep them within a small factor of one another

Matrix-vector multiplication in poker [G., Hoda, Peña, Sandholm 07] • The main time and space bottleneck of the algorithm is the matrix-vector product on A • Instead of storing the entire matrix, we can represent it as a composition of Kronecker products • We can also effectively take advantage of parallelization in the matrix-vector product to achieve near-linear speedup
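The memory saving from a Kronecker representation rests on a standard identity: the product (B ⊗ C) x can be computed from the small factors B and C alone, without ever forming B ⊗ C. The sketch below illustrates only that generic identity on small hypothetical matrices; it is not the actual structure of the poker payoff matrix, and the function names are made up for the example.

```python
def kron_matvec(B, C, x):
    """Compute (B kron C) x using only B and C.
    With X = x reshaped row-major to (cols of B) x (cols of C),
    the result is B X C^T flattened row-major."""
    n_b_cols, n_c_cols = len(B[0]), len(C[0])
    X = [x[j * n_c_cols:(j + 1) * n_c_cols] for j in range(n_b_cols)]
    # T = X C^T: apply C to every row of X.
    T = [[sum(c * xv for c, xv in zip(c_row, x_row)) for c_row in C]
         for x_row in X]
    # y = B T, flattened row-major.
    y = []
    for b_row in B:
        for k in range(len(C)):
            y.append(sum(b * T[j][k] for j, b in enumerate(b_row)))
    return y


def kron_dense(B, C):
    """Explicit Kronecker product, used here only to check the result."""
    return [[b * c for b in b_row for c in c_row]
            for b_row in B for c_row in C]


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


B = [[1, 2], [3, 4]]
C = [[0, 1, 2], [1, 0, 1]]
x = [1, 2, 3, 4, 5, 6]
print(kron_matvec(B, C, x))
print(matvec(kron_dense(B, C), x))
```

Storing B and C takes memory proportional to their own sizes, while the explicit product takes memory proportional to the product of their sizes, which is the kind of gap the memory comparison in the talk reflects.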

Memory usage comparison
Instance   CPLEX IPM   CPLEX Simplex   EGT
10k        0.082 GB    >0.051 GB       0.012 GB
160k       2.25 GB     >0.664 GB       0.035 GB
RI         25.2 GB     >3.45 GB        0.15 GB
Texas      >458 GB                     2.49 GB

Poker • Poker is a recognized challenge problem in AI because (among other reasons) – the other players’ cards are hidden; – bluffing and other deceptive strategies are needed in a good player; – there is uncertainty about future events. • Texas Hold’em: most popular variant of poker • Two-player game tree has ~10^18 nodes

Potential-aware automated abstraction [G., Sandholm, Sørensen 07] • Most prior automated abstraction algorithms employ a myopic expected value computation as a similarity metric – This ignores hands like flush draws where, although the probability of winning is small, the payoff could be high • Our newest algorithm considers higher-dimensional spaces consisting of histograms over abstracted classes of states from later stages of the game • This enables our bottom-up abstraction algorithm to automatically take into account positive and negative potential

Solving the four-round model • Computed abstraction with – 20 first-round buckets – 800 second-round buckets – 4800 third-round buckets – 28800 fourth-round buckets • Algorithm using 30 GB RAM – Simply representing as an LP requires 32 TB – Outputs new, improved solution every 2.5 days

[G., Sandholm, Sørensen 07]

Future research • Customizing second-order methods (e.g., interior-point methods) for the equilibrium problem • Additional heuristics for improving practical performance of EGT algorithm • Techniques for finding an optimal solution from an ε-solution

Thank you ☺