Pac-Man Will Britt and Bryan Silinski
Pac-Man Background Information • In Pac-Man, the agent has to decide between at most 5 moves (North, South, East, West, and Stop). • Ghosts move randomly around the stage. • Goal is to eat all the dots while avoiding the ghosts. • Score Manipulators: • Eat Dot: +10 • Win: +500 • Eat Ghost: +200 • Eaten by Ghost: -500 • Move (per time step): -1
Formal Statement • Given a set N of possible moves, our agent should choose the move that best maximizes utility. • Utility is determined by a performance evaluation function: an objective criterion for the success of an agent's behavior. • For the set of moves N: choose the move achieving max(U(Ni)), where Ni is a move from the set and U() is the utility evaluation function.
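As a toy illustration of the formal statement, the sketch below picks the move Ni with the highest U(Ni); the utility values are invented for the example, not taken from the game.

```python
# Toy illustration of max(U(Ni)): pick the move with the highest utility.
# These utility values are made up for the example only.
U = {"North": 10, "South": -5, "East": 7, "West": 0, "Stop": -1}

def best_move(moves, utility):
    # Returns the move Ni that maximizes U(Ni).
    return max(moves, key=utility.get)

print(best_move(["North", "South", "East", "West", "Stop"], U))  # North
```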
Utility • Utility represents the motivation of an agent. In our game, the motivations are things such as eating dots, eating power pellets, and avoiding ghosts. A utility function assigns a score to every possible outcome; a higher score represents a stronger preference for that particular outcome. • Our utility function is ordinal, which means that decisions are based on the relative ordering of possible outcomes, and the degree of difference does not matter.
Informal Statement • We aim to navigate the Pac-Man agent to best avoid ghosts and eat the pellets. • Given context from the environment (proximity of ghosts, dots, etc.), we want Pac-Man to make the most rational choice of movement, in hopes that this leads to the agent performing best at the game. • A rational agent is one that maximizes utility based on current knowledge.
Algorithms
Algorithms Chosen • Reflex Agent • Minimax • Depth 2 • Depth 3 • Depth 4 • Expectimax • Depth 2 • Depth 3 • Depth 4 • Q-Learning • 50 Training Episodes • 100 Training Episodes • 500 Training Episodes • 1000 Training Episodes
Reflex Agent • A reflex agent looks only at the current state and a potential move on the game board in order to choose its next move. • It does not consider the consequences of the chosen move in terms of what happens afterward. • "I am here; which move appears to have the best utility?"
Reflex Agent (continued) • In Pac-Man, the agent has to decide between at most 5 moves (North, South, East, West, and Stop). • In order for a reflex agent to be used, we needed to implement a function to evaluate how "good" each move is (i.e., calculate its utility). • This performance evaluation utility function looks at things such as: will the next move bring the agent closer to food? Closer to a ghost? Will it obtain a power pellet? • Each possible move is run through the evaluation function, and the move with the best score is chosen (ordinal utility function).
Reflex Agent (continued) • Potential move scores are calculated very quickly. • O(n), where n represents the number of possible moves evaluated by the utility function. • For our example, Pac-Man has 3-5 possible moves at any given state, each of which is run through the utility function in order to score it. • One disadvantage is that the agent does not look far enough ahead to consider the consequences of its actions.
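A minimal sketch of such a reflex agent, assuming a toy interface where each legal move maps to a (food distance, ghost distance) pair; the real project's evaluation function and weights differ:

```python
# Minimal reflex-agent sketch (hypothetical interface, not the actual
# project code): score each legal move by looking only at the successor
# position, then return the highest-scoring move.

def reflex_utility(successor):
    """Toy evaluation: reward closeness to food, punish closeness to ghosts.
    `successor` is a (food_distance, ghost_distance) pair."""
    food_dist, ghost_dist = successor
    return 1.0 / (1 + food_dist) - 10.0 / (1 + ghost_dist)

def choose_move(successors):
    # O(n): exactly one utility evaluation per legal move (n is 3-5 here).
    return max(successors, key=lambda move: reflex_utility(successors[move]))

# East leads toward food and away from ghosts, so it wins this example.
moves = {"North": (3, 1), "East": (1, 5), "Stop": (2, 2)}
print(choose_move(moves))  # East
```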
Minimax • Often implemented in two-player "full information" games. • Full information games are games in which each player knows all possible moves of the adversary. Ex: chess, tic-tac-toe, etc. • One player (i.e., Pac-Man) tries to maximize its score while the adversary (i.e., the ghosts) tries to minimize the opponent's score. • Minimax takes into account future moves by both the player and the opponent in order to choose the best decision. • Minimax also operates under the assumption that the adversary will make the optimal choice.
Minimax Implementation • If a game-over state is reached, return the score from the player's point of view. • Else, get the game states for every possible move of whichever player's turn it is. • Create a list of scores for those states using a performance evaluation function (utility function). • If the turn is the opponent's, return the minimum score from the score list. • If the turn is the player's, return the maximum score from the score list.
Minimax • The time complexity of the minimax algorithm is O(b^n), where b^n represents the number of game states sent to the utility function. • b represents the number of game states per depth; in Pac-Man this is 3-5 (Pac-Man successor states) multiplied by 4-16 (ghost successor game states). • n represents the depth.
Expectimax • Expectimax is similar to minimax but does not assume an optimal adversary. • It takes into account the probabilities of outcomes by adding chance nodes to the minimax tree. • Expectimax makes decisions based on expected utilities.
Expectimax • The time complexity of O(b^n) is the same as minimax, where b^n represents the number of game states evaluated by the utility function. • Once again, b represents the number of game states per depth (in Pac-Man, 3-5 Pac-Man successor states multiplied by 4-16 ghost successor game states). • n represents the depth.
Minimax vs. Expectimax
Q-Learning • State-action based machine learning algorithm. • Good for room traversal or mapping. • Not equipped for larger problems such as moving ghosts. • We implemented an approximate Q-learning algorithm, in that it attempts to find similarities between states while training. • Uses "features" to identify important or unimportant information about the game board, updating the weight of each feature in order to converge on the best weights. • Provided the update works, it should not matter if you duplicate features, because training will alter the weights to adjust for the error. • O(1) decision time, since decisions are made from a lookup table.
Q-Learning Update
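The update rule on this slide did not survive the export; the standard approximate Q-learning update it presumably depicted is:

```latex
Q(s,a) = \sum_i w_i \, f_i(s,a) \quad \text{(Q-value as a weighted feature sum)}

\delta = \Bigl(r + \gamma \max_{a'} Q(s',a')\Bigr) - Q(s,a) \quad \text{(temporal-difference error)}

w_i \leftarrow w_i + \alpha \, \delta \, f_i(s,a) \quad \text{(per-feature weight update)}
```

Here α is the learning rate, γ the discount factor, r the reward, and f_i the features listed on the next slide.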
Q-Learning Features • Bias: Way to minimize error in machine learning algorithms • (State, Action): Navigate the map more efficiently • Ghosts one step away: Avoid the ghosts • Eats Food: Eating food is crucial to win the game
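A small sketch of one approximate Q-learning step over features like those above; the learning rate, discount, reward, and feature values are illustrative assumptions, not the trained parameters:

```python
# Sketch of one approximate Q-learning update: Q(s, a) is a weighted sum
# of features, and each weight moves toward reducing the TD error. The
# feature names stand in for those listed above.

ALPHA, GAMMA = 0.2, 0.9   # learning rate and discount (illustrative values)

def q_value(weights, features):
    # Q(s, a) = sum_i w_i * f_i(s, a)
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def update(weights, features, reward, max_next_q):
    # TD error: (r + gamma * max_a' Q(s', a')) - Q(s, a), computed once.
    diff = (reward + GAMMA * max_next_q) - q_value(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + ALPHA * diff * value
    return weights

# Starting from zero weights, a +10 reward pushes both active features up.
w = update({}, {"bias": 1.0, "eats-food": 1.0}, reward=10.0, max_next_q=0.0)
```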
Algorithm Results
| Algorithm | Avg Move Time | Avg # Moves | Avg Score | Win % | Move STD |
|---|---|---|---|---|---|
| Reflex | 0.001919824 | 68.144 | -445.024 | 0 | 52.22509669 |
| Minimax 2 | 0.004229125 | 189.043 | 975.877 | 0.672 | 81.1781562 |
| Minimax 3 | 0.012895651 | 322.443 | 801.657 | 0.648 | 151.2438741 |
| Minimax 4 | 0.061221232 | 411.292 | 818.098 | 0.682 | 193.9309773 |
| ExpectiMax 2 | 0.016378297 | 168.175 | 1223.215 | 0.836 | 60.59668583 |
| ExpectiMax 3 | 0.048551765 | 192.308 | 1283.672 | 0.915 | 54.26267201 |
| ExpectiMax 4 | 0.234599232 | 200.735 | 1305.865 | 0.927 | 52.37050166 |
| Qlearn 50 | 0.000527238 | 132.411 | 1191.929 | 0.9 | 29.7713354 |
| Qlearn 100 | 0.000494471 | 129.223 | 1214.287 | 0.91 | 25.68413802 |
| Qlearn 500 | 0.000504648 | 130.534 | 1230.316 | 0.921 | 24.23758041 |
| Qlearn 1000 | 0.000488254 | 130.6677632 | 1205.631579 | 0.90296053 | 25.28541812 |
| Qlearn 50 w/ Train* | 0.001037794 | — | 1191.929 | 0.9 | 0 |
| Qlearn 100 w/ Train* | 0.000993991 | — | 1214.287 | 0.91 | 0 |
| Qlearn 500 w/ Train* | 0.000958897 | — | 1230.316 | 0.921 | 0 |
| Qlearn 1000 w/ Train* | 0.000942771 | — | 1205.631579 | 0.90296053 | 0 |
Issues and Future Considerations • The utility function is subjective. • We assigned weights to what we thought was important (avoiding ghosts, eating dots). • These weights may not have been the best choices possible. • It might have been useful to develop an algorithm to come up with the weights. • Alpha-beta pruning • Reflex Agent issues
Questions • What is utility? • What is a rational agent? • Why does a reflex agent have a time complexity of O(n)? • When would it be more beneficial to use expectimax instead of minimax?
Questions • What is utility? - Utility represents the motivation of an agent, or the usefulness of the consequences of a particular action. • What is a rational agent? - A rational agent is one that maximizes utility based on current knowledge. • Why does a reflex agent have a time complexity of O(n)? - A reflex agent runs its evaluation once per available choice when making a decision; n represents the number of these choices. • When would it be more beneficial to use expectimax instead of minimax? - When there are probabilities involved and it would be more favorable to calculate expected utilities.
Questions?