Game Playing (Chapter 6)


Game Playing (Chapter 6)
• Game Playing and AI
• Game Playing as Search
• Greedy Search Game Playing
• Minimax
• Alpha-Beta Pruning
2/25/2021 © 2001-2004 James D. Skrentny, from notes by C. Dyer et al.

Game Playing and AI
Why would game playing be a good problem for AI research?
• Game playing is non-trivial:
  – players need "human-like" intelligence
  – games can be very complex (e.g. chess, go)
  – it requires decision making within limited time
• Games often are:
  – well-defined and repeatable
  – easy to represent
  – fully observable and limited environments
• We can directly compare humans and computers.

Game Playing and AI

                                        Deterministic           Chance
  perfect info (fully observable)       checkers, go, others?   backgammon, monopoly, others?
  imperfect info (partially observable) any?                    dominoes, bridge, others?

Game Playing as Search
• Consider two-player, turn-taking board games
  – e.g. tic-tac-toe, checkers, chess
  – adversarial, zero-sum
  – board configs: unique arrangements of pieces
• Representing these as a search problem:
  – states: board configurations
  – edges: legal moves
  – initial state: start board configuration
  – goal state: winning/terminal board configuration

Game Playing as Search: Game Tree
What's the new aspect to the search problem?
There's an opponent that we cannot control!
(figure: partial tic-tac-toe game tree)
How can this be handled?

Game Playing as Search: Complexity
• Assume the opponent's moves can be predicted given the computer's moves.
• How complex would search be in this case?
  – worst case: O(b^d) with branching factor b and depth d
  – Tic-Tac-Toe: ~5 legal moves, 9 moves max per game ⇒ 5^9 = 1,953,125 states
  – Chess: ~35 legal moves, ~100 moves per game ⇒ 35^100 ≈ 10^154 states, but only ~10^40 legal states
• Common games produce enormous search trees.
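The branching-factor arithmetic above can be checked directly; a quick sketch:

```python
# Worst-case game-tree sizes O(b^d) for the two examples above.
tic_tac_toe = 5 ** 9            # ~5 legal moves, 9 moves max
chess = 35 ** 100               # ~35 legal moves, ~100 moves per game

print(tic_tac_toe)              # 1953125
print(len(str(chess)) - 1)      # order of magnitude of 35^100: 154
```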

Greedy Search Game Playing
• A utility function maps each terminal state of the board to a numeric value corresponding to the value of that state to the computer:
  – positive for winning; larger positive means better for the computer
  – negative for losing; larger negative magnitude means better for the opponent
  – zero for a draw
  – typical value ranges (lose to win): -infinity to +infinity, or -1.0 to +1.0

Greedy Search Game Playing
• Expand each branch to the terminal states
• Evaluate the utility of each terminal state
• Choose the move that results in the board configuration with the maximum value
(figure: the computer's possible moves from A are B, C, D, E with board evaluations B = -5, C = 9, D = 2, E = 3, from the computer's perspective; the opponent's possible moves lead to the terminal states F through O)
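Greedy selection is just an argmax over the one-ply successors. A minimal sketch using the figure's values (B = -5, C = 9, D = 2, E = 3):

```python
# One-ply greedy move choice: pick the child with the highest board evaluation.
children = {"B": -5, "C": 9, "D": 2, "E": 3}
best_move = max(children, key=children.get)
print(best_move)  # C
```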

Greedy Search Game Playing
Assuming a reasonable search space, what's the problem with greedy search?
It ignores what the opponent might do!
e.g. the computer chooses C; the opponent then chooses J (value -6) and defeats the computer.
(figure: the same game tree as the previous slide)

Minimax: Idea
• Assume the worst (i.e. the opponent plays optimally), given there are two plays until the terminal states:
  – If high utility numbers favor the computer, the computer should choose maximizing moves.
  – If low utility numbers favor the opponent, a smart opponent chooses minimizing moves.

Minimax: Idea
• The computer assumes that after it moves, the opponent will choose the minimizing move.
• It chooses its best move considering both its own move and the opponent's best move.
(figure: the same tree with backed-up values B = -7, C = -6, D = 0, E = 1; the computer chooses E, so A = 1)
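The two-ply reasoning can be sketched directly. The grouping of terminal leaves under B through E below is reconstructed from the figure and may not match the original exactly, but it reproduces the backed-up values shown:

```python
# Two-ply minimax: the opponent minimizes at each child, the computer
# maximizes at the root.
subtrees = {
    "B": [-7, -5],     # terminals F, G
    "C": [3, 9, -6],   # terminals H, I, J
    "D": [0, 2],       # terminals K, L
    "E": [1, 3, 2],    # terminals M, N, O
}
backed_up = {move: min(leaves) for move, leaves in subtrees.items()}
best = max(backed_up, key=backed_up.get)
print(backed_up)  # {'B': -7, 'C': -6, 'D': 0, 'E': 1}
print(best)       # E
```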

Minimax: Passing Values up the Game Tree
• Explore the game tree to the terminal states
• Evaluate the utility of the terminal states
• The computer chooses the move that puts the board in the best configuration for it, assuming the opponent makes the best moves on her turns:
  – start at the leaves
  – assign a value to the parent node as follows:
    • use the minimum of the children on the opponent's moves
    • use the maximum of the children on the computer's moves

Deeper Game Trees
• Minimax can be generalized to more than 2 moves
• Values are backed up in the minimax way
(figure: a four-ply game tree, levels alternating computer max / opponent min down to the terminal states; the backed-up root value is A = 3)

Minimax: Direct Algorithm
For each move by the computer:
1. Perform depth-first search to a terminal state
2. Evaluate each terminal state
3. Propagate the minimax values upwards:
   – on the opponent's move, back up the minimum value of the children
   – on the computer's move, back up the maximum value of the children
4. Choose the move with the maximum of the minimax values of the children
Note:
• minimax values gradually propagate upwards as DFS proceeds, i.e. in a "left-to-right" fashion
• minimax values for a sub-tree are backed up "as we go", so only O(bd) nodes need to be kept in memory at any time

Minimax: Algorithm Complexity
Assume all terminal states are at depth d.
• Space complexity? Depth-first search, so O(bd).
• Time complexity? Given branching factor b, O(b^d).
⇒ Time complexity is a major problem! The computer typically has only a finite amount of time to make a move.

Minimax: Algorithm Complexity
• The direct minimax algorithm is impractical in practice
  – instead, do a depth-limited search to ply (depth) m
• What's the problem with stopping at an arbitrary ply?
  – evaluation is defined only for terminal states
  – we need to know the value of non-terminal states
⇒ A static board evaluator (SBE) function uses heuristics to estimate the value of non-terminal states.

Minimax: Static Board Evaluator (SBE)
• A static board evaluation function estimates how good a board configuration is for the computer:
  – it reflects the computer's chances of winning from that state
  – it must be easy to calculate from the board configuration
• For example, chess:
  SBE = α * materialBalance + β * centerControl + γ * …
  material balance = value of white pieces - value of black pieces
  (pawn = 1, rook = 5, queen = 9, etc.)
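The material-balance feature can be sketched as below. The pawn, rook, and queen values are from the slide; the knight and bishop values are conventional additions, not from the slide:

```python
# Material balance: value of white pieces minus value of black pieces.
VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_balance(white, black):
    """white and black are dicts mapping piece letter -> count."""
    total = lambda side: sum(VALUES[p] * n for p, n in side.items())
    return total(white) - total(black)

# White has an extra rook, black an extra pawn: balance is 18 - 14 = 4.
print(material_balance({"R": 2, "P": 8}, {"R": 1, "P": 9}))  # 4
```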

Minimax: Static Board Evaluator (SBE)
• Typically, one subtracts how good the board configuration is for the opponent from how good it is for the computer.
• The SBE should be symmetric: if the SBE gives X for one player, it should give -X for the opponent.
• The SBE must be consistent: it must agree with the utility function when calculated at terminal nodes.

Minimax: Algorithm with SBE

int minimax(Node s, int depth, int limit) {
    if (isTerminal(s) || depth == limit)    // base case
        return staticEvaluation(s);
    else {
        Vector<Integer> v = new Vector<>();
        // do minimax on successors of s and save their values
        while (s.hasMoreSuccessors())
            v.addElement(minimax(s.getNextSuccessor(), depth + 1, limit));
        if (isComputersTurn(s))
            return maxOf(v);    // computer's move returns max of kids
        else
            return minOf(v);    // opponent's move returns min of kids
    }
}

Minimax: Algorithm with SBE
• The same as direct minimax, except:
  – it only goes to depth m
  – it estimates non-terminal states using the SBE function
• How would this algorithm perform at chess?
  – looking ahead ~4 pairs of moves (i.e. 8 ply), it would be consistently beaten by average players
  – looking ahead ~8 pairs, as done on a typical PC, it is as good as a human master

Recap
• We can't minimax search to the end of the game.
  – if we could, then choosing a move would be easy
• The SBE isn't perfect at estimating.
  – if it were, we could just choose the best move without searching
• Since neither is feasible for interesting games, combine the minimax and SBE concepts:
  – minimax to depth m
  – use the SBE to estimate board configurations

Alpha-Beta Pruning Idea
• Some branches of the game tree won't be taken if playing against a smart opponent. Use pruning to ignore those branches.
• While doing DFS of the game tree, keep track of:
  – alpha at maximizing levels (computer's move):
    • highest SBE value seen so far (initialized to -infinity)
    • a lower bound on the state's evaluation
  – beta at minimizing levels (opponent's move):
    • lowest SBE value seen so far (initialized to +infinity)
    • an upper bound on the state's evaluation

Alpha-Beta Pruning Idea
• Beta cutoff: pruning occurs when maximizing if the child's alpha >= the parent's beta.
  Why stop expanding children? The opponent won't allow the computer to take this move.
• Alpha cutoff: pruning occurs when minimizing if the parent's alpha >= the child's beta.
  Why stop expanding children? The computer has a better move than this.
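Both cutoffs can be seen in a short implementation. The tree below is a made-up example; alpha-beta returns the same value as plain minimax while skipping pruned branches:

```python
# Alpha-beta pruning over an explicit tree (terminals are numbers,
# internal nodes are lists of subtrees).
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:      # beta cutoff: opponent won't allow this
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:      # alpha cutoff: computer has a better move
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # 3
```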

Alpha-Beta Search Example
minimax(A, 0, 4): alpha initialized to -infinity.
Expand A? Yes: there are successors, and there is no cutoff test at the root.
(call stack: A)
(figure: the deeper game tree from earlier, searched with depth limit 4)

minimax(B, 1, 4): beta initialized to +infinity.
Expand B? Yes: A's alpha >= B's beta is false, so no alpha cutoff.
(call stack: B, A)

minimax(F, 2, 4): alpha initialized to -infinity.
Expand F? Yes: F's alpha >= B's beta is false, so no beta cutoff.
(call stack: F, B, A)

minimax(N, 3, 4): N is a terminal state; evaluate and return its SBE value (N = 4).
(call stack: N, F, B, A)

Back to minimax(F, 2, 4): alpha = 4, since 4 >= -infinity (maximizing).
Keep expanding F? Yes: F's alpha >= B's beta is false, so no beta cutoff.

minimax(O, 3, 4): beta initialized to +infinity.
Expand O? Yes: F's alpha >= O's beta is false, so no alpha cutoff.
(call stack: O, F, B, A)

minimax(W, 4, 4): the depth limit is reached; W is a non-terminal state, so evaluate and return its SBE value (W = -3).
(call stack: W, O, F, B, A)

Back to minimax(O, 3, 4): beta = -3, since -3 <= +infinity (minimizing).
Keep expanding O? No: F's alpha (4) >= O's beta (-3) is true: alpha cutoff.

Why? A smart opponent will choose W or worse, so O's upper bound is -3. The computer already has the better move at N. (O's remaining child X is pruned.)

Back to minimax(F, 2, 4): alpha doesn't change, since -3 < 4 (maximizing).
Keep expanding F? No: F has no more successors.

Back to minimax(B, 1, 4): beta = 4, since 4 <= +infinity (minimizing).
Keep expanding B? Yes: A's alpha >= B's beta is false, so no alpha cutoff.

minimax(G, 2, 4): G is a terminal state; evaluate and return its SBE value (G = -5).
(call stack: G, B, A)

Back to minimax(B, 1, 4): beta = -5, since -5 <= 4 (minimizing).
Keep expanding B? No: B has no more successors.

Back to minimax(A, 0, 4): alpha = -5, since -5 >= -infinity (maximizing).
Keep expanding A? Yes: there are more successors, and no cutoff test at the root.

minimax(C, 1, 4): beta initialized to +infinity.
Expand C? Yes: A's alpha >= C's beta is false, so no alpha cutoff.
(call stack: C, A)

minimax(H, 2, 4): H is a terminal state; evaluate and return its SBE value (H = 3).
(call stack: H, C, A)

Back to minimax(C, 1, 4): beta = 3, since 3 <= +infinity (minimizing).
Keep expanding C? Yes: A's alpha >= C's beta is false, so no alpha cutoff.

minimax(I, 2, 4): I is a terminal state; evaluate and return its SBE value (I = 8).
(call stack: I, C, A)

Back to minimax(C, 1, 4): beta doesn't change, since 8 > 3 (minimizing).
Keep expanding C? Yes: A's alpha >= C's beta is false, so no alpha cutoff.

minimax(J, 2, 4): alpha initialized to -infinity.
Expand J? Yes: J's alpha >= C's beta is false, so no beta cutoff.
(call stack: J, C, A)

minimax(P, 3, 4): P is a terminal state; evaluate and return its SBE value (P = 9).
(call stack: P, J, C, A)

Back to minimax(J, 2, 4): alpha = 9, since 9 >= -infinity (maximizing).
Keep expanding J? No: J's alpha (9) >= C's beta (3) is true: beta cutoff. (J's remaining children Q and R are pruned.)

Why? The computer would choose P or better, so J's lower bound is 9. But a smart opponent won't let the computer take the move to J, since the opponent already has the better move at H.

Back to minimax(C, 1, 4): beta doesn't change, since 9 > 3 (minimizing).
Keep expanding C? No: C has no more successors.

Back to minimax(A, 0, 4): alpha = 3, since 3 >= -5 (maximizing).
Keep expanding A? Yes: there are more successors, no cutoff test at the root.

minimax(D, 1, 4): D is a terminal state; evaluate and return its SBE value (D = 0).
(call stack: D, A)

Back to minimax(A, 0, 4): alpha doesn't change, since 0 < 3 (maximizing).
Keep expanding A? Yes: there are more successors, no cutoff test at the root.

How does the algorithm finish searching the tree?

While expanding E, stop once A's alpha (3) >= E's beta (2) is true: alpha cutoff.
Why? A smart opponent will choose L or worse, so E's upper bound is 2. The computer already has the better move at C.

Result: the computer chooses the move to C.
(figure legend: green = terminal states, red = pruned states, blue = non-terminal state at the depth limit)

Game Playing (Chapter 6)
• Alpha-Beta Effectiveness
• Other Issues
• Linear Evaluation Functions
• Non-Deterministic Games
• Case Studies

Alpha-Beta Effectiveness
• Effectiveness depends on the order in which successors are examined. What ordering gives more effective pruning? Pruning is more effective if the best successors are examined first.
• Best case: each player's best move is left-most
• Worst case: ordered so that no pruning occurs
  – no improvement over exhaustive search
• In practice, performance is closer to the best case than the worst case.

Alpha-Beta Effectiveness
If the opponent's best move were examined first, more pruning would result:
(figure: the E subtree from the example, expanded in two different child orders; putting the better child first causes more of the other subtree to be pruned)

Alpha-Beta Effectiveness
• In practice we often get O(b^(d/2)) rather than O(b^d)
  – the same as having a branching factor of sqrt(b); recall (sqrt(b))^d = b^(d/2)
• For example, chess:
  – b goes from ~35 to ~6
  – permits a much deeper search in the same time
  – makes computer chess competitive with humans
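The effect of move ordering can be seen by counting leaf evaluations on a small made-up tree: with the best move searched first, fewer leaves are examined.

```python
# Count terminal evaluations done by alpha-beta under different child orders.
def alphabeta(node, maximizing, alpha, beta, counter):
    if isinstance(node, (int, float)):
        counter[0] += 1                  # one leaf evaluation
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        v = alphabeta(child, not maximizing, alpha, beta, counter)
        if maximizing:
            best = max(best, v); alpha = max(alpha, v)
        else:
            best = min(best, v); beta = min(beta, v)
        if alpha >= beta:                # cutoff
            break
    return best

good = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # best move examined first
bad = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]    # best move examined last
counts = []
for tree in (good, bad):
    n = [0]
    alphabeta(tree, True, float("-inf"), float("inf"), n)
    counts.append(n[0])
print(counts)  # [7, 9]: fewer leaves evaluated with good ordering
```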

Other Issues: Dealing with Limited Time
• In real games, there is usually a time limit T on making a move. How do we take this into account?
  – we cannot stop alpha-beta midway and expect to use the results with any confidence
  – so we could set a conservative depth limit that guarantees we will find a move in time < T
  – but then the search may finish early, and the opportunity to search deeper is wasted

Other Issues: Dealing with Limited Time
• In practice, iterative deepening is used:
  – run alpha-beta search with an increasing depth limit
  – when the clock runs out, use the solution found by the last completed alpha-beta search (i.e. the deepest search that was completed)
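An iterative-deepening driver can be sketched as below. `depth_limited_best` is an assumed stand-in for a full depth-limited alpha-beta search; real engines also abort mid-search when time expires, while this sketch only checks the clock between depths:

```python
import time

def iterative_deepening(state, depth_limited_best, time_limit):
    """Run deeper and deeper searches; keep the last *completed* result."""
    deadline = time.monotonic() + time_limit
    best_move, depth = None, 1
    while time.monotonic() < deadline:
        best_move = depth_limited_best(state, depth)
        depth += 1
    return best_move

# Toy stand-in "search" that just returns the depth it was asked to search.
move = iterative_deepening("start", lambda s, d: d, time_limit=0.05)
print(move is not None)  # True: at least the depth-1 search completed
```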

Other Issues: The Horizon Effect
• Sometimes disaster lurks just beyond the search depth
  – e.g. the computer captures a queen, but a few moves later the opponent checkmates
• The computer has a limited horizon: it cannot see that this significant event could happen
• How do you avoid catastrophic losses due to "short-sightedness"?
  – quiescence search
  – secondary search

Other Issues: The Horizon Effect
• Quiescence search:
  – when the SBE value is changing frequently, look deeper than the depth limit
  – look for a point where the game "quiets down"
• Secondary search:
  1. find the best move looking to depth d
  2. look k steps beyond to verify that it still looks good
  3. if it doesn't, repeat step 2 for the next best move

Other Issues: Book Moves
• Build a database of opening moves, end games, and studied configurations
• If the current state is in the database, use the database:
  – to determine the next move
  – to evaluate the board
• Otherwise do alpha-beta search

Linear Evaluation Functions
• The static board evaluation function estimates how good the current board configuration is for the computer.
  – it is a heuristic function of the board's features, i.e. function(f1, f2, f3, …, fn)
  – the features are numeric characteristics:
    • feature 1, f1, is the number of white pieces
    • feature 2, f2, is the number of black pieces
    • feature 3, f3, is f1/f2
    • feature 4, f4, is an estimate of the "threat" to the white king
    • etc.

Linear Evaluation Functions
• A linear evaluation function of the features is a weighted sum of f1, f2, f3, …:
  w1*f1 + w2*f2 + w3*f3 + … + wn*fn
  – where f1, f2, …, fn are the features and w1, w2, …, wn are the weights
• More important features get more weight.
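The weighted sum is a one-liner; the feature values and weights below are illustrative, not from the slides:

```python
# Linear SBE: weighted sum of board features.
def linear_sbe(features, weights):
    return sum(w * f for w, f in zip(weights, features))

features = [8, 7, 8 / 7]     # f1 = white pieces, f2 = black pieces, f3 = f1/f2
weights = [1.0, -1.0, 0.5]   # more important features get more weight
print(round(linear_sbe(features, weights), 3))  # 1.571
```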

Linear Evaluation Functions
• The quality of play depends directly on the quality of the evaluation function.
• To build an evaluation function we have to:
  1. construct good features, using expert domain knowledge
  2. pick or learn good weights

Linear Evaluation Functions
How could we learn these weights?
Basic idea: play lots of games against an opponent:
• for every move (or game), look at the error = true outcome - evaluation function
• if the error is positive (underestimating), adjust the weights to increase the evaluation function
• if the error is zero, do nothing
• if the error is negative (overestimating), adjust the weights to decrease the evaluation function
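The update rule above can be sketched in the style of the delta rule; the learning rate and data here are illustrative assumptions, not from the slides:

```python
# Error-driven weight update for a linear evaluation function.
def update_weights(weights, features, outcome, rate=0.01):
    prediction = sum(w * f for w, f in zip(weights, features))
    error = outcome - prediction    # positive => underestimating
    # Moving each weight by rate * error * feature nudges the evaluation
    # toward the true outcome (up if underestimating, down if over).
    return [w + rate * error * f for w, f in zip(weights, features)]

weights, features = [0.0, 0.0], [2.0, 1.0]
before = abs(1.0 - sum(w * f for w, f in zip(weights, features)))
weights = update_weights(weights, features, outcome=1.0)
after = abs(1.0 - sum(w * f for w, f in zip(weights, features)))
print(after < before)  # True: the error shrank after one update
```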

Non-Deterministic Games
• How do some games involve chance?
  – roll of dice
  – spin of a game wheel
  – deal of cards from a shuffled deck
• How can we handle games of chance? The game tree representation is extended to include chance nodes:
  1. computer moves
  2. chance nodes
  3. opponent moves

Non-Deterministic Games
• e.g. extended game tree representation:
(figure: a max node A over two 50/50 chance nodes; each chance node leads to min nodes over terminal values)


Non-Deterministic Games
• Weight each score by the probability that the move occurs
• Use the expected value for a move: the probability-weighted sum of the possible random outcomes
  [Figure: same tree; left chance node value .5×2 + .5×6 = 4, right chance node value .5×0 + .5×(−4) = −2]


Non-Deterministic Games
• Choose the move with the highest expected value
  [Figure: same tree; A chooses the left chance node, so α = 4]
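The chance-node computation can be sketched as a small expectiminimax over the slides' example tree (the tuple encoding of the tree is an assumption made for this sketch):

```python
# A node is either a number (leaf), ("max"|"min", [children]),
# or ("chance", [(probability, child), ...]).

def expectiminimax(node):
    if isinstance(node, (int, float)):                   # leaf: static evaluation
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted sum, i.e. the expected value
    return sum(p * expectiminimax(c) for p, c in children)

# The tree from the slides: max node A over two 50/50 chance nodes,
# which sit over min nodes B (7, 2), C (9, 6), D (5, 0), E (8, -4).
tree = ("max", [
    ("chance", [(0.5, ("min", [7, 2])), (0.5, ("min", [9, 6]))]),
    ("chance", [(0.5, ("min", [5, 0])), (0.5, ("min", [8, -4]))]),
])
```

Running `expectiminimax(tree)` reproduces the slide's result: the left chance node is worth .5×2 + .5×6 = 4, the right is worth −2, and A chooses 4.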


Non-Deterministic Games
• Non-determinism increases the branching factor
  – 21 possible rolls with 2 dice
• Value of look ahead diminishes:
  – as depth increases, the probability of reaching a given node decreases
  – alpha-beta pruning is less effective
• TD-Gammon:
  – depth-2 search
  – very good heuristic
  – plays at world champion level


Case Studies: Learned to Play Well
Checkers: A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers,” IBM Journal of Research and Development, 3(3): 210–229, 1959
• Learned by playing thousands of times against a copy of itself
• Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
• Successful enough to compete well at human tournaments


Case Studies: Learned to Play Well
Backgammon: G. Tesauro and T. J. Sejnowski, “A Parallel Network that Learns to Play Backgammon,” Artificial Intelligence, 39(3): 357–390, 1989
• Also learns by playing against copies of itself
• Uses a non-linear evaluation function: a neural network
• Rated among the top three players in the world


Case Studies: Playing Grandmaster Chess
“Deep Blue” (IBM)
• Parallel processor, 32 nodes
• Each node has 8 dedicated VLSI “chess chips”
• Can search 200 million configurations/second
• Uses minimax, alpha-beta, sophisticated heuristics
• Searched to 14 ply (i.e. 7 pairs of moves)
• Can avoid the horizon effect by searching as deep as 40 ply
• Uses book moves


Case Studies: Playing Grandmaster Chess
Kasparov vs. Deep Blue, May 1997
• 6-game full-regulation match sponsored by ACM
• Kasparov lost the match: Deep Blue won 2 games to Kasparov’s 1, with 3 draws
• This was a historic achievement for computer chess: the first time a computer became the best chess player on the planet.
• Note that Deep Blue plays by “brute force” (i.e. raw power from computer speed and memory). It uses relatively little that is similar to human intuition and cleverness.


Case Studies: Playing Grandmaster Chess
[Figure: chess ratings from 1200 to 3000 plotted over 1966–1997, comparing Deep Thought and Deep Blue against Garry Kasparov (current World Champion)]


Case Studies: Other Deterministic Games
• Checkers/Draughts
  – world champion is Chinook
  – beats any human (beat Tinsley in 1994)
  – uses alpha-beta search, book moves, and an endgame database (>443 billion positions)
• Othello
  – computers easily beat world experts
• Go
  – branching factor b ~ 360, very large!
  – $2 million prize for any system that can beat a world expert


Summary
• Game playing is best modeled as a search problem.
• Search trees for games represent alternating computer/opponent moves.
• Evaluation functions estimate the quality of a given board configuration for each player:
  – negative: good for opponent
  – zero: neutral
  – positive: good for computer


Summary
• Minimax is a procedure that chooses moves by assuming that the opponent always chooses their best move.
• Alpha-beta pruning is a procedure that can eliminate large parts of the search tree, thus enabling the search to go deeper.
• For many well-known games, computer algorithms using heuristic search can match or outperform human world experts.
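A compact sketch of minimax with alpha-beta pruning, as summarized above (the nested-list tree encoding is an illustrative assumption: a leaf is a number, an internal node a list of children):

```python
INF = float("inf")

def alphabeta(node, alpha=-INF, beta=INF, maximizing=True):
    """Minimax value of a nested-list game tree, with alpha-beta cutoffs."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = -INF
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # the minimizer will never allow this branch
                break           # prune the remaining children
        return value
    value = INF
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:       # the maximizer already has something better
            break
    return value

# Classic 3-way example: the minimax value is 3; the second min node is
# cut off after its first leaf (2 <= alpha = 3), which is exactly the
# pruning the summary describes.
best = alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]])
```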


Conclusion
• Game playing was initially thought to be a good area for AI research.
• But brute force has proven to be better than a lot of knowledge engineering.
  – more high-speed hardware issues than AI issues
  – simplifying the AI part enabled scaling up of hardware
• It is a good test-bed for machine learning.
• Perhaps machines don't have to think like us?