Connections between Learning Theory, Game Theory, and Optimization

Lecture 14, October 7th, 2010
Maria Florina (Nina) Balcan

Improved Equilibria via Public Service Advertising

Good equilibria, Bad equilibria
Many games have both bad and good equilibria.
• In some places, everyone drives their own car. In others, everybody uses and pays for good public transit.

Fair cost-sharing: n players in a weighted directed graph G. Player i wants to get from $s_i$ to $t_i$, and players share the cost of the edges they use with others.

Fair cost-sharing
• n players in a directed graph G; each edge e costs $c_e$.
• Player i wants to get from $s_i$ to $t_i$.
• All players share the cost of the edges they use with others.
• Each player wants to minimize his own cost.
[Figure: s and t joined by two parallel edges, one of cost 1 and one of cost n.]
• Good equilibrium: all use the edge of cost 1 (paying 1/n each).
• Bad equilibrium: all use the edge of cost n (paying 1 each).
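The two equilibria in this example can be checked mechanically. The sketch below is only an illustration: the function names and the choice n = 5 are assumptions, not part of the lecture. Note that the bad equilibrium is a weak one: a player leaving the cost-n edge would pay exactly 1 on the cost-1 edge, so he does not strictly improve.

```python
from fractions import Fraction

def is_nash(n, on_cheap, cheap=1, costly=None):
    """Check whether a split of n players over two parallel s-t edges is a pure Nash.

    on_cheap players use the edge of cost `cheap`; the rest use the edge of
    cost `costly` (defaults to n, as in the two-edge example).
    """
    costly = n if costly is None else costly
    on_costly = n - on_cheap
    # A player on the cheap edge compares his share with joining the costly edge.
    if on_cheap > 0:
        if Fraction(costly, on_costly + 1) < Fraction(cheap, on_cheap):
            return False
    # A player on the costly edge compares his share with joining the cheap edge.
    if on_costly > 0:
        if Fraction(cheap, on_cheap + 1) < Fraction(costly, on_costly):
            return False
    return True

def social_cost(n, on_cheap, cheap=1, costly=None):
    """Total cost of the edges that are actually used."""
    costly = n if costly is None else costly
    return (cheap if on_cheap > 0 else 0) + (costly if on_cheap < n else 0)

n = 5  # hypothetical instance size
print(is_nash(n, n), social_cost(n, n))   # everyone on the cost-1 edge
print(is_nash(n, 0), social_cost(n, 0))   # everyone on the cost-n edge
print(is_nash(n, 3))                      # a mixed split is not stable
```

The ratio between the two stable outcomes is n, which is the gap the Price of Anarchy measures here.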

Inefficiency of equilibria: PoA and PoS
Price of Anarchy (PoA): ratio of the cost of the worst Nash equilibrium to OPT. [Koutsoupias-Papadimitriou '99]
Price of Stability (PoS): ratio of the cost of the best Nash equilibrium to OPT. [Anshelevich et al., 2004]
E.g., for fair cost-sharing, PoS is log(n), whereas PoA is n.
Significant effort has been spent on understanding these in CS. See "Algorithmic Game Theory", Nisan, Roughgarden, Tardos, Vazirani.

Fair Cost Sharing
PoA is n; PoS is log(n).
• PoA is O(n): in any Nash, no player pays more than OPT, since a player can always switch to his path in OPT and pay at most its full cost.
• PoA is $\Omega(n)$: [Figure: s and t joined by two parallel edges of cost n and cost 1; all n players using the cost-n edge is a Nash.]

Fair Cost Sharing
PoS is $\Omega(\log n)$:
[Figure: sources $s_1, \dots, s_n$ and a common sink t. Source $s_i$ has a private edge to t of cost 1/i (costs 1, 1/2, ..., 1/(n-1), 1/n) and a cost-0 edge to a shared node, which is joined to t by an edge of cost $1+\epsilon$.]
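A minimal sketch of why this instance forces a log(n) gap, assuming the standard lower-bound construction: player i can pay 1/i alone on a private edge or split a shared edge of cost 1 + eps. The function names, n = 10, and eps = 0.01 are illustrative assumptions. Starting from OPT (everyone on the shared edge), best-response dynamics unravels one player at a time until everyone is on a private edge.

```python
def best_response_from_opt(n, eps=0.01):
    """Run best-response dynamics from OPT; return the players left on the shared edge."""
    shared = set(range(1, n + 1))  # OPT: everyone splits the edge of cost 1 + eps
    changed = True
    while changed:
        changed = False
        for i in range(1, n + 1):
            private = 1.0 / i
            if i in shared and private < (1 + eps) / len(shared):
                shared.remove(i)   # the private edge is strictly cheaper
                changed = True
            elif i not in shared and (1 + eps) / (len(shared) + 1) < private:
                shared.add(i)      # rejoining the shared edge is strictly cheaper
                changed = True
    return shared

def total_cost(n, shared, eps=0.01):
    """Social cost: private edges in use plus the shared edge if anyone uses it."""
    private = sum(1.0 / i for i in range(1, n + 1) if i not in shared)
    return private + ((1 + eps) if shared else 0.0)

n = 10
final = best_response_from_opt(n)
harmonic = sum(1.0 / i for i in range(1, n + 1))
print(final, total_cost(n, final), harmonic)
```

The equilibrium reached costs H(n) = 1 + 1/2 + ... + 1/n, while OPT costs 1 + eps.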

Fair Cost Sharing
PoS is O(log(n)): potential function argument.

Fair Cost Sharing
Potential function: $\Phi(S) = \sum_{e} \sum_{k=1}^{n_e} \frac{c_e}{k}$, where $n_e$ is the number of players using edge e in S.
Social cost of S: $\mathrm{cost}(S) = \sum_{e : n_e \ge 1} c_e$, so $\mathrm{cost}(S) \le \Phi(S) \le H(n) \cdot \mathrm{cost}(S)$.

Fair Cost Sharing
Claim: when a player moves, the change in that player's cost equals the change in potential.
Proof: player i moves, taking the state from S to S'. Let A be the edges on i's path in S but not in S', and B the edges on i's path in S' but not in S; let $n_e$ count the players on e in S.
Change in i's cost: $\sum_{e \in B} \frac{c_e}{n_e + 1} - \sum_{e \in A} \frac{c_e}{n_e}$.
Change in the potential: the same quantity, since each edge in B gains the term $c_e/(n_e + 1)$ and each edge in A loses the term $c_e/n_e$.
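The claim can be sanity-checked on a toy instance. This is a sketch under stated assumptions: the potential is taken to be the standard one for fair cost sharing (each edge contributes c_e(1 + 1/2 + ... + 1/n_e)), and the edge names, costs, and paths below are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def edge_loads(paths):
    """Number of players on each edge."""
    return Counter(e for path in paths for e in path)

def player_cost(i, paths, cost):
    """Player i's fair share of every edge on his path."""
    loads = edge_loads(paths)
    return sum(Fraction(cost[e], loads[e]) for e in paths[i])

def potential(paths, cost):
    """Sum over edges of c_e * (1 + 1/2 + ... + 1/n_e)."""
    loads = edge_loads(paths)
    return sum(Fraction(cost[e], k)
               for e, n_e in loads.items()
               for k in range(1, n_e + 1))

cost = {"a": 6, "b": 3, "c": 4}          # hypothetical edge costs
before = [{"a", "b"}, {"a"}, {"c"}]      # paths given as sets of edge names
after = [{"c"}, {"a"}, {"c"}]            # player 0 switches to edge c

delta_cost = player_cost(0, after, cost) - player_cost(0, before, cost)
delta_potential = potential(after, cost) - potential(before, cost)
print(delta_cost, delta_potential)
```

Exact rational arithmetic is used so the two differences can be compared with equality rather than a tolerance.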


Fair Cost Sharing
PoS is O(log(n)): potential function argument
• Iterate best-response dynamics starting from an optimal solution [i.e., while there is a player that can improve, pick an arbitrary such player and let him play a best response].
• The potential always decreases and there are finitely many states, so we reach a pure Nash.
• The potential never increases along the way, so the pure Nash reached has cost $\le H(n) \cdot$ OPT.
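The bound in the last bullet unpacks into a three-step chain. The notation is an assumption for illustration: $S^*$ is the optimal state the dynamics starts from, $S$ is the Nash it reaches, and $\Phi$ is the standard fair cost-sharing potential. The first inequality holds because every used edge contributes at least $c_e$ to $\Phi$, the second because best responses never increase $\Phi$, and the third because an edge with at most n users contributes at most $H(n) \cdot c_e$.

```latex
\begin{align*}
\mathrm{cost}(S) \;\le\; \Phi(S) \;\le\; \Phi(S^*) \;\le\; H(n)\cdot\mathrm{cost}(S^*) \;=\; H(n)\cdot\mathrm{OPT},
\qquad \text{where } \Phi(S) = \sum_{e}\sum_{k=1}^{n_e(S)} \frac{c_e}{k}.
\end{align*}
```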

Congestion games more generally
Game defined by n players and m resources.
• Cost of a resource j is a function $f_j(n_j)$ of the number $n_j$ of players using it.
• Each player i chooses a set of resources (e.g., a path) from a collection $S_i$ of allowable sets of resources (e.g., paths from $s_i$ to $t_i$).
• Cost incurred by player i is the sum, over all resources being used, of the cost of the resource.
• Generic potential function: $\Phi = \sum_{j} \sum_{k=1}^{n_j} f_j(k)$.
Best-response dynamics always gives an equilibrium.
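As a rough illustration of the definitions above, here is a small best-response loop for a general congestion game that records the potential after every move. It assumes the standard (Rosenthal) potential, the sum over resources j of f_j(1) + ... + f_j(n_j); the two resources "x" and "y" and their cost functions are hypothetical.

```python
from collections import Counter

def loads(profile):
    """n_j: number of players using each resource."""
    return Counter(r for choice in profile for r in choice)

def player_cost(i, profile, f):
    """Sum of f_j(n_j) over the resources player i uses."""
    n = loads(profile)
    return sum(f[r](n[r]) for r in profile[i])

def potential(profile, f):
    """Sum over resources j of f_j(1) + ... + f_j(n_j)."""
    n = loads(profile)
    return sum(f[r](k) for r in n for k in range(1, n[r] + 1))

def best_response_dynamics(profile, options, f):
    """Let players switch to a best response until nobody can strictly improve."""
    profile = list(profile)
    trace = [potential(profile, f)]
    improved = True
    while improved:
        improved = False
        for i, choices in enumerate(options):
            def cost_if(choice):
                trial = profile[:i] + [choice] + profile[i + 1:]
                return player_cost(i, trial, f)
            best = min(choices, key=cost_if)
            if cost_if(best) < player_cost(i, profile, f):
                profile[i] = best
                trace.append(potential(profile, f))
                improved = True
    return profile, trace

# Hypothetical instance: "x" has linear delay, "y" has constant delay 2.
f = {"x": lambda k: k, "y": lambda k: 2}
options = [[("x",), ("y",)]] * 3
final, trace = best_response_dynamics([("x",)] * 3, options, f)
print(final, trace)
```

In this run one player leaves the crowded resource, lowering his cost from 3 to 2, and the potential falls by the same amount.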

Congestion games more generally
• Nice general class of games with many players.
• Always have a pure-strategy equilibrium.
• Have a potential function such that whenever a player switches, the potential drops by exactly that player's improvement.
  - Best-response dynamics always gives an equilibrium.
• But there may be a large gap between the quality of the best and the worst equilibrium.
• Lots of work on understanding properties of these games and the quality of their equilibria.


Guiding from Bad to Good
Can a helpful authority encourage (guide) behavior to move from a bad state to a good state?
Standard motivation for PoS: if a central authority could suggest a low-cost Nash (ride public transit), and everyone followed the suggestion, then this would be stable.
Price of Anarchy (PoA): ratio of worst Nash equilibrium to OPT.
Price of Stability (PoS): ratio of best Nash equilibrium to OPT.


Guiding from Bad to Good
What if only some fraction $\alpha$ will pay attention? [Balcan-Blum-Mansour, SODA 2009]
[Figure: advertisement reading "Ride public transit".]
Fundamental Questions
• Can the authority guide behavior to a good state?
• Will it just snap back?
How does this depend on $\alpha$?



Main Model
0. n players initially playing some arbitrary equilibrium.
1. Authority launches advertising, proposing joint action $s^{ad}$. Each player i follows with probability $\alpha$. Call the players that follow receptive players.
2. Remaining (non-receptive) players fall to some arbitrary equilibrium for themselves, given the play of the receptive players.
3. All players follow best-response dynamics to an overall Nash equilibrium.
[Figure: sources $s_1, \dots, s_n$, each with a private edge of cost 1 and a cost-0 edge to a shared node with an outgoing edge of cost k; the advertisement reads "Ride public transit".]
Notes: potential games, pure Nash equilibria, social cost.
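The steps of the model can be sketched as a small simulation. The instance is an assumption for illustration: every player either pays 1 on a private edge or splits a shared edge of cost k, the initial equilibrium has everyone on private edges, and the advertised action is "use the shared edge". The helper names `settle` and `simulate` are hypothetical, and step 2 picks one particular equilibrium for the non-receptive players (the one reached by best responses from their initial paths).

```python
import random

def settle(on_shared, k, movable):
    """Let the players in `movable` best-respond until none of them can improve."""
    changed = True
    while changed:
        changed = False
        for i in movable:
            users = sum(on_shared)
            if on_shared[i] and 1 < k / users:
                on_shared[i] = False   # the private edge (cost 1) beats the current share
                changed = True
            elif not on_shared[i] and k / (users + 1) < 1:
                on_shared[i] = True    # joining the shared edge beats paying 1
                changed = True

def simulate(n, k, alpha, seed=0):
    """Return the social cost of the final Nash equilibrium."""
    rng = random.Random(seed)
    # Step 0: bad equilibrium, everyone on a private edge (stable when k >= 1).
    # Step 1: each player follows the advertised action with probability alpha.
    receptive = [rng.random() < alpha for _ in range(n)]
    on_shared = list(receptive)
    # Step 2: non-receptive players settle while receptive players stay put.
    settle(on_shared, k, [i for i in range(n) if not receptive[i]])
    # Step 3: everyone follows best-response dynamics.
    settle(on_shared, k, list(range(n)))
    users = sum(on_shared)
    return (k if users else 0) + (n - users)

print(simulate(100, 20, 1.0))   # everyone receptive
print(simulate(100, 20, 0.0))   # nobody receptive
print(simulate(100, 20, 0.5))   # half receptive, on average
```

In this toy instance the final state is always one of two equilibria (all shared, cost k, or all private, cost n), so the simulation shows directly whether the advertising "sticks" or snaps back.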

Main Results
Cost sharing: (PoS = log(n), PoA = n)
• If only a constant fraction $\alpha$ of the players follow the advice, then we can still get within O(1/$\alpha$) of the PoS.
• Extends to cost-sharing + linear delays.
Party Affiliation: (PoS = 1, PoA = $\Omega(n^2)$)
• Threshold behavior: for $\alpha$ > ½, can get ratio O(1), but for $\alpha$ < ½, ratio stays $\Omega(n^2)$ (assume degrees $\omega(\log n)$).

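Fair Cost Sharing
Cost sharing: (PoS = log(n), PoA = n)
If only a constant fraction $\alpha$ of the players follow the advice, then we get within O(1/$\alpha$) of the PoS.
Note: this is the best you can hope for. E.g., k = 2$\alpha$n.
[Figure: sources $s_1, \dots, s_n$ and sink t; each $s_i$ has a private edge of cost 1 to t and a cost-0 edge to a shared node joined to t by an edge of cost k.]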
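A worked version of why 1/$\alpha$ cannot be beaten, under two assumptions that are not spelled out on the slide: the shared edge costs $k = 2\alpha n$ with $\alpha < 1/2$, and the number $X$ of receptive players is close to its mean $\alpha n$ (in particular $X + 1 \le k$, which holds with high probability). Receptive players then pay more than 1 on the shared edge, non-receptive players have no reason to join, and in step 3 the receptive players return to their private edges.

```latex
\begin{align*}
\mathrm{OPT} &= k = 2\alpha n && \text{(everyone shares the cost-$k$ edge, since } k < n\text{)} \\
\text{share of a receptive player after step 1} &= \frac{k}{X} \approx \frac{2\alpha n}{\alpha n} = 2 > 1 \\
\text{cost for a non-receptive player to join} &= \frac{k}{X+1} \ge 1 \\
\text{final cost} &= n && \text{(all players back on private edges)} \\
\frac{\text{final cost}}{\mathrm{OPT}} &= \frac{n}{2\alpha n} = \frac{1}{2\alpha}
\end{align*}
```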

Fair Cost Sharing
Proof Idea: the advertiser proposes OPT (any approximation also works).
Phase 1: random variables

Fair Cost Sharing
Proof Idea: cost of non-receptive players at the end of Phase 2
- In any NE, a non-receptive player i can't improve by switching to his path $P_i^{OPT}$ in OPT.
- Moreover, this option is guaranteed to be at least as good as if the other non-receptive (NR) players didn't exist.


Fair Cost Sharing
Proof Idea: cost of non-receptive players at the end of Phase 2
- In any NE, a non-receptive player i can't improve by switching to his path $P_i^{OPT}$ in OPT.
- Calculate the total cost of these guaranteed options. Rearrange the sum...


Fair Cost Sharing
Proof Idea:
- Cost of non-receptive players at the end of Phase 2
- Cost of receptive players at the end of Phase 2
Use: X ~ Bi(n, p)
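One standard fact about X ~ Bi(n, p) that fits this kind of calculation is E[1/(1+X)] = (1 - (1-p)^(n+1)) / ((n+1)p), which is at most 1/(p(n+1)). Whether this is the exact property the proof uses is an assumption; it is shown here because a player who shares an edge with X receptive players pays a 1/(1+X) share of it. The sketch below checks the identity numerically; the function names and the values n = 20, p = 0.3 are illustrative.

```python
from math import comb

def expected_inverse_share(n, p):
    """E[1/(1+X)] for X ~ Bi(n, p), by direct summation over the pmf."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) / (1 + x)
               for x in range(n + 1))

def closed_form(n, p):
    """(1 - (1-p)^(n+1)) / ((n+1) p)."""
    return (1 - (1 - p)**(n + 1)) / ((n + 1) * p)

n, p = 20, 0.3
print(expected_inverse_share(n, p), closed_form(n, p), 1 / (p * (n + 1)))
```

With p = $\alpha$ this gives an expected share of roughly 1/($\alpha$n) of an edge's cost, which is where a 1/$\alpha$ loss relative to full cooperation comes from.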

Fair Cost Sharing
Proof Idea:
- Expected total cost at the end of Phase 2: O(OPT/$\alpha$).
- In Phase 3, a potential argument shows behavior cannot get worse by more than an additional log(n) factor.

Cost Sharing, Extension
Cost sharing + linear delays:
- Still get the same guarantee, but the proof is trickier.
Problem: can't argue as if the remaining NR players didn't exist, since they add to delays.

Cost Sharing, Extension
Cost sharing + linear delays:
- Still get the same guarantee, but the proof is trickier.
Proof Idea:
- Shadow game with respect to non-receptive players: pure linear latency functions. Offset defined by the equilibrium at the end of Phase 2 (the number of users on e at the end of Phase 2).
- This game has good PoA (5/2).

Cost Sharing, Extension
Cost sharing + linear delays:
- Still get the same guarantee, but the proof is trickier.
Proof Idea:
- Shadow game: pure linear latency functions.
- Behavior of NR players at the end of Phase 2 is an equilibrium for this game too.
- Show that the cost of the non-receptive players at the end of step 2 is O(OPT/$\alpha$).

Cost Sharing, Extension
Cost sharing + linear delays:
- Still get the same guarantee, but the proof is trickier.
Proof Idea: the cost of the non-receptive players at the end of step 2 is O(OPT/$\alpha$).
Still need to argue about the cost of the receptive players. Edge-by-edge charging:
- more receptive players: lose a factor of two compared to OPT
- more non-receptive players: already paid for, lose a factor of two

Party affiliation games
• Given graph G, each edge labeled + or -.
• Vertices have two actions: RED or BLUE.
• Pay 1 for each + edge with endpoints of different color, and each - edge with endpoints of the same color.
• Special cases:
  • All + edges is the consensus game.
  • All - edges is the cut game.
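The cost rule above translates directly into code. The sketch below is illustrative only: the triangle instance, the dictionary-based edge encoding, and the function names are assumptions. It counts a vertex's unhappy edges and checks whether a coloring is a pure Nash (no vertex can strictly lower its cost by flipping).

```python
def vertex_cost(v, colors, edges):
    """Number of unhappy edges at v: '+' edges to a different color, '-' edges to the same color."""
    cost = 0
    for (a, b), sign in edges.items():
        if v not in (a, b):
            continue
        same = colors[a] == colors[b]
        if (sign == "+" and not same) or (sign == "-" and same):
            cost += 1
    return cost

def is_nash(colors, edges):
    """True if no single vertex strictly improves by switching color."""
    for v in colors:
        flipped = dict(colors)
        flipped[v] = "BLUE" if colors[v] == "RED" else "RED"
        if vertex_cost(v, flipped, edges) < vertex_cost(v, colors, edges):
            return False
    return True

# Hypothetical triangle with two '+' edges and one '-' edge.
edges = {(0, 1): "+", (1, 2): "-", (0, 2): "+"}
colors = {0: "RED", 1: "RED", 2: "BLUE"}
print([vertex_cost(v, colors, edges) for v in colors], is_nash(colors, edges))
```

In this triangle one edge must always be unhappy, so even the best coloring leaves two vertices paying 1.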

Party affiliation games
OPT is an equilibrium, so PoS = 1. But even for consensus games, PoA = $\Omega(n^2)$.
Example: a clique with a perfect matching removed, all edges labeled plus.
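A minimal sketch of the bad equilibrium in this example, assuming the removed matching pairs each vertex with one on the opposite side of a half-RED, half-BLUE split (the vertex numbering, n = 8, and function names are illustrative). Every vertex then has equally many neighbors of each color, so nobody can strictly improve, yet about n²/4 edges are unhappy.

```python
from itertools import combinations

def consensus_instance(n):
    """Clique on n vertices (n even) minus the perfect matching {i, i + n/2}; all edges are '+'."""
    half = n // 2
    return [(a, b) for a, b in combinations(range(n), 2) if b - a != half]

def cost(v, colors, edges):
    """Consensus game: v pays 1 per incident edge whose endpoints differ in color."""
    return sum(1 for a, b in edges if v in (a, b) and colors[a] != colors[b])

def is_nash(colors, edges):
    """True if no single vertex strictly improves by switching color."""
    for v in range(len(colors)):
        flipped = list(colors)
        flipped[v] = 1 - colors[v]
        if cost(v, flipped, edges) < cost(v, colors, edges):
            return False
    return True

def social_cost(colors, edges):
    """Number of unhappy edges."""
    return sum(1 for a, b in edges if colors[a] != colors[b])

n = 8
edges = consensus_instance(n)
split = [0] * (n // 2) + [1] * (n // 2)   # half RED (0), half BLUE (1)
print(is_nash(split, edges), social_cost(split, edges))
print(is_nash([0] * n, edges), social_cost([0] * n, edges))
```

The split coloring has n²/4 - n/2 unhappy edges, while the all-one-color coloring has none.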


Party affiliation games
Party Affiliation: (PoS = 1, PoA = $\Omega(n^2)$)
- Threshold behavior: for $\alpha$ > ½, can get ratio O(1), but for $\alpha$ < ½, ratio stays $\Omega(n^2)$ (assume degrees $\omega(\log n)$).
Proof: (lower bound)
- Same example as for consensus PoA, but sparser across the cut: degree $\gamma n/8$ across the cut, where $\gamma = 1/2 - \alpha$.
• For large n, whp all nodes have at most a $1/2 - \gamma/2$ fraction of their neighbors in R.
• Initially, each node has a $\gamma/4$ fraction of its neighbors of the other color.
• So, players are "locked" into place.

Party affiliation games
Party Affiliation: (PoS = 1, PoA = $\Omega(n^2)$)
- Threshold behavior: for $\alpha$ > ½, can get ratio O(1), but for $\alpha$ < ½, ratio stays $\Omega(n^2)$ (assume degrees $\omega(\log n)$).
Proof: (upper bound, consensus games)
- Advertising strategy = follow OPT, e.g., all red.
- By Hoeffding, all nodes with degree at least $\log n/(\alpha - 1/2)^2$ have more than half of their neighbors in the set R, with prob. 1 - 1/n.
- At the end of step two, all nodes are red.
Note: for general cut games, OPT might not have zero cost for each player.
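The Hoeffding step can be written out as follows. This is a sketch under stated assumptions: $Y_v$ denotes the number of receptive neighbors of a node $v$ of degree $d$, so $Y_v \sim \mathrm{Bi}(d, \alpha)$ with $\alpha > 1/2$, the logarithm is natural, and $d \ge \ln n / (\alpha - 1/2)^2$. The symbols $Y_v$ and $d$ are introduced here for illustration.

```latex
\begin{align*}
\Pr\!\left[Y_v \le \tfrac{d}{2}\right]
  &= \Pr\!\left[Y_v - \alpha d \le -\left(\alpha - \tfrac12\right) d\right]
   \le \exp\!\left(-2\left(\alpha - \tfrac12\right)^2 d\right)
   \le \exp(-2\ln n) = n^{-2}, \\
\Pr\!\left[\exists\, v :\; Y_v \le \tfrac{d_v}{2}\right]
  &\le n \cdot n^{-2} = \frac{1}{n} \qquad \text{(union bound over the $n$ nodes).}
\end{align*}
```

So with probability at least 1 - 1/n every node sees a strict majority of receptive (red) neighbors, which makes red its best response in a consensus game.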

Party affiliation games
Party Affiliation: (PoS = 1, PoA = $\Omega(n^2)$)
- Threshold behavior: for $\alpha$ > ½, can get ratio O(1), but for $\alpha$ < ½, ratio stays $\Omega(n^2)$ (assume degrees $\omega(\log n)$).
Proof: (upper bound, general party affiliation games)
- Advertising strategy = follow OPT.
- Split nodes into those incurring low cost vs. those incurring high cost under OPT.
- Show that low-cost nodes will switch to their behavior in OPT. For high-cost nodes, don't care.
- Cost only improves in the final best-response process.

Party affiliation games
Party Affiliation: (PoS = 1, PoA = $\Omega(n^2)$)
- Threshold behavior: for $\alpha$ > ½, can get ratio O(1), but for $\alpha$ < ½, ratio stays $\Omega(n^2)$ (assume degrees $\omega(\log n)$).
Proof: (upper bound, general party affiliation games)
• S is $\beta$-dominating if every vertex not in S has more than a ½+$\beta$ fraction of its neighbors in S.
• If $\alpha$ > ½+2$\beta$, then the set R of receptive players is $\beta$-dominating whp.
• Split nodes into those incurring low cost (less than a $\beta$-fraction of incident edges incur a cost in OPT) vs. those incurring high cost under OPT.
• Low-cost nodes will switch to their behavior in OPT. High-cost nodes can incur a cost of at most 1/$\beta$ times their cost in OPT.
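The $\beta$-dominating condition is easy to test directly on a given graph. The checker below is a small sketch: the adjacency-dictionary encoding, the function name, and the complete graph on 6 vertices are assumptions for illustration, and isolated vertices are treated as satisfying the condition vacuously.

```python
def is_beta_dominating(adj, S, beta):
    """True if every vertex outside S has more than a (1/2 + beta) fraction of its neighbors in S."""
    for v, nbrs in adj.items():
        if v in S or not nbrs:
            continue  # vertices in S, and isolated vertices, impose no constraint
        inside = sum(1 for u in nbrs if u in S)
        if inside <= (0.5 + beta) * len(nbrs):
            return False
    return True

# Hypothetical example: complete graph on 6 vertices.
adj = {v: [u for u in range(6) if u != v] for v in range(6)}
print(is_beta_dominating(adj, {0, 1, 2}, 0.05))   # 3 of 5 neighbors in S vs. threshold 0.55
print(is_beta_dominating(adj, {0, 1, 2}, 0.2))    # 3 of 5 neighbors in S vs. threshold 0.7
```

The margin $\beta$ is what lets a low-cost node tolerate a few unhappy edges in OPT and still prefer its OPT color once the receptive set plays OPT.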

Summary
Analyze the ability of a central authority to guide behavior to a good equilibrium even if only an $\alpha$ fraction of players are paying attention.

Influencing Dynamics
A more adaptive model [Balcan-Blum-Mansour, ICS 2010]
Each player has a few abstract actions and uses a learning, experts-based algorithm to decide which one to use.
• Expert 1: play best response.
• Expert 2: play the advertised behavior.
[No rigid separation between receptive and non-receptive players.]
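To make "experts-based algorithm" concrete, here is a sketch of one common choice, a multiplicative-weights (randomized weighted majority) update over the two experts. The slide does not say which learning algorithm is used, so the update rule, the learning rate `eta`, and the cost sequence below are all assumptions for illustration.

```python
def update(weights, costs, eta=0.5):
    """One multiplicative-weights step: expert j's weight shrinks by (1 - eta) ** costs[j].

    costs[j] is the cost in [0, 1] the player would have paid this round by
    following expert j.
    """
    return [w * (1 - eta) ** c for w, c in zip(weights, costs)]

def play_probabilities(weights):
    """Probability of following each expert in the next round."""
    total = sum(weights)
    return [w / total for w in weights]

# Expert 0: play best response. Expert 1: play the advertised behavior.
weights = [1.0, 1.0]
for _ in range(10):
    # Hypothetical rounds in which the advertised behavior is the cheaper option.
    weights = update(weights, costs=[1.0, 0.0])
print(play_probabilities(weights))
```

After a run of rounds in which one expert is consistently cheaper, nearly all of the probability mass sits on that expert, so a player drifts toward or away from the advertised behavior gradually rather than being fixed as receptive or non-receptive.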

Open Questions
Can we get around the problem of natural dynamics converging to a poor equilibrium, without a central authority, by giving players more information about the game?