Reinforcement Learning: Developing a self-learning snake game using Reinforcement Learning and pygame.

About me
- Student, pursuing my Bachelor’s in Software Engineering
- Freelance Software Developer
- A FOSS enthusiast, currently contributing to coala
- Pythonista, loves to develop automation projects and Machine Learning projects, and occasionally writes blog posts about Python.
Github: https://github.com/satwikkansal
Linkedin: https://linkedin.com/in/satwikkansal
Website: http://www.satwikkansal.xyz
Blog: https://satwikkansal.wordpress.com

Do you remember these?

Contents
1. Quick Intro to Game Development: Common concepts
2. Designing the gameplay
3. Events and control, Implementing game logic
4. Some RL concepts: Agent, State, Reward, Policy, MDP and a few more
5. Q-Learning to the Rescue
6. Other Reinforcement Learning Techniques
The code for the workshop is available at https://github.com/satwikkansal/snakepy

Some Game Development concepts
- Coordinates: The screen is a 2D grid plane with (0, 0) in the top left
- Colors: RGB and alpha values
- Drawing: Plotting pixels, Surface object, blitting
- Rendering: Animation, frame/refresh rate
- The game loop
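A minimal sketch of the coordinate convention and the game loop, written in plain Python so it runs without a display. It is not the workshop's code: `DIRECTIONS`, `move` and `run_game_loop` are hypothetical names, and "rendering" is reduced to recording each frame. In pygame the same three phases would be reading events, updating state and redrawing the screen.

```python
# Screen coordinates: (0, 0) is the top-left corner, x grows right, y grows down.
DIRECTIONS = {
    "UP": (0, -1),      # moving up decreases y
    "DOWN": (0, 1),
    "LEFT": (-1, 0),
    "RIGHT": (1, 0),
}


def move(position, direction):
    """Return the grid cell reached by taking one step in `direction`."""
    dx, dy = DIRECTIONS[direction]
    return (position[0] + dx, position[1] + dy)


def run_game_loop(inputs, start=(5, 5)):
    """Hypothetical headless game loop: handle input, update, render, repeat."""
    position = start
    frames = []
    for key in inputs:                    # 1. handle events (one per frame here)
        if key == "QUIT":
            break
        position = move(position, key)    # 2. update the game state
        frames.append(position)           # 3. "render" (here: record the frame)
    return frames


frames = run_game_loop(["UP", "UP", "RIGHT", "QUIT", "DOWN"])
print(frames)
```

Anything after the `QUIT` input is ignored, which mirrors how a real loop exits when the window is closed.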

Designing the Gameplay
Objects: A snake, Apples, Walls
- Snake eats the apples, grows 1 unit longer.
- Snake dies when it hits the wall or runs over itself.
Objective: Eat as many apples as possible without dying.
- What happens when the snake gets killed?
- How to start the game?
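One possible way to model "eats an apple, grows 1 unit longer" is to keep the snake as a queue of grid cells. This is a sketch under that assumption; `step_snake` is a hypothetical helper, not a function from the workshop repository.

```python
from collections import deque


def step_snake(snake, direction, apple):
    """Advance the snake one cell. `snake` is a deque of (x, y) cells, head first.

    Returns True if the apple was eaten on this step.
    """
    dx, dy = direction
    head_x, head_y = snake[0]
    new_head = (head_x + dx, head_y + dy)
    snake.appendleft(new_head)   # the head always advances
    if new_head == apple:
        return True              # keep the tail: the snake grows by 1 unit
    snake.pop()                  # otherwise drop the tail: length is unchanged
    return False


snake = deque([(2, 0), (1, 0), (0, 0)])
ate = step_snake(snake, (1, 0), apple=(3, 0))   # moves right onto the apple
print(ate, list(snake))
```

Death on hitting a wall or the body is deliberately left out here; it is a separate check on the new head position.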

Code Implementation: Drawing, Displaying and Moving the game objects.

User Interaction & Game Logic
- Arrow keys to move the head.
- Do we want our snake to keep moving?
- Detecting overlaps and collisions of the snake head with other objects: boundaries, apples and its body.
- Scoring
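The collision checks can be sketched as a single function on grid cells. The function name, the return labels and the grid size below are assumptions made for illustration; a pixel-based game would compare rectangles instead of cells.

```python
def check_collision(head, body, apple, width, height):
    """Classify where the snake's head ended up on a `width` x `height` grid.

    `body` holds the snake's cells excluding the head.
    """
    x, y = head
    if not (0 <= x < width and 0 <= y < height):
        return "wall"      # left the playing field
    if head in body:
        return "self"      # ran over its own body
    if head == apple:
        return "apple"
    return "empty"


# Scoring: one point per apple reached along a short path of head positions.
score = 0
for head in [(1, 1), (2, 1), (3, 1)]:
    result = check_collision(head, body=[(0, 1)], apple=(2, 1), width=10, height=10)
    if result == "apple":
        score += 1
print(score)
```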

Code Implementation: Adding the controls and the score to make a fully functional snake game.

Okay, let’s make our dumb computer control the snake.

Code Implementation: Wait, let’s add some intelligence to our agent. (Provide vision to the CPU, i.e. the game rules)
Next Section: Or better, let’s make the CPU discover knowledge. (Make our snake learn from experience)
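A rough sketch of what "providing the game rules" could look like: a hand-written agent that skips moves it knows are fatal and otherwise heads for the apple. The greedy Manhattan-distance rule and the name `rule_based_action` are assumptions for illustration, not the workshop's implementation.

```python
ACTIONS = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def rule_based_action(head, body, apple, width, height):
    """Pick the safe move that brings the head closest to the apple."""
    best_action, best_distance = None, None
    for name, (dx, dy) in ACTIONS.items():
        nxt = (head[0] + dx, head[1] + dy)
        inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
        if not inside or nxt in body:
            continue   # the rules tell us this move is fatal
        distance = abs(nxt[0] - apple[0]) + abs(nxt[1] - apple[1])
        if best_distance is None or distance < best_distance:
            best_action, best_distance = name, distance
    return best_action   # None means every move is fatal


print(rule_based_action(head=(0, 0), body=[], apple=(3, 0), width=5, height=5))
```

Every rule here was typed in by hand, which is exactly what the learning agent in the next section is meant to avoid.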

Time to introduce Reinforcement Learning!

A few things to know
- State, History and Episode
- Action
- Reward
- Policy, value function, and model
- Environment
- Agent
- Markov states and MDP
Long story short: The environment is everything that surrounds the agent. A state represents the situation of the agent at a particular time in the environment. The agent performs an action to transition from one state to another and may receive a reward in return. The policy is the strategy for choosing an action given a state, and the agent tries to choose a policy that maximizes the expected cumulative reward.
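To make these terms concrete, here is a small sketch of one episode in a made-up two-step environment. The `transitions` table, the state names and the discount factor `gamma` are all assumptions for illustration; discounting is one common way to define cumulative reward, not something this deck specifies.

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative reward of one episode, with the reward at step t weighted by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def run_episode(policy, transitions, start_state, max_steps=10):
    """Roll out `policy` in a tiny hand-written environment.

    `transitions[(state, action)]` gives (next_state, reward, done).
    The returned history is the list of (state, action, reward) steps.
    """
    state, history = start_state, []
    for _ in range(max_steps):
        action = policy(state)                        # policy: state -> action
        next_state, reward, done = transitions[(state, action)]
        history.append((state, action, reward))
        state = next_state
        if done:                                      # the episode ends here
            break
    return history


transitions = {
    ("far", "forward"): ("near", 0, False),
    ("near", "forward"): ("apple", 1, True),
}
history = run_episode(lambda state: "forward", transitions, "far")
episode_return = discounted_return([reward for _, _, reward in history])
print(history, episode_return)
```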

Implementation: Refactoring the game’s code

Q-learning to the rescue!
● Popular, simple, model-free RL technique (the environment’s model is not required)
● Can find an optimal action-selection policy for any finite MDP.
● Learns the action-value function
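A minimal tabular sketch of the standard Q-learning update, Q(s, a) += alpha * (r + gamma * max Q(s', a') - Q(s, a)), with epsilon-greedy action selection. The class name, the hyperparameter values and the string-valued states are assumptions for illustration and are not taken from the workshop code.

```python
import random
from collections import defaultdict


class QLearner:
    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)   # Q[(state, action)], 0.0 for unseen pairs
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose_action(self, state):
        """Epsilon-greedy: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done=False):
        """Move Q(state, action) towards reward + gamma * max over next actions."""
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# epsilon=0.0 switches exploration off so this tiny demo is deterministic.
learner = QLearner(["LEFT", "RIGHT"], epsilon=0.0)
learner.update("s0", "RIGHT", reward=1.0, next_state="s1", done=True)
learner.update("start", "RIGHT", reward=0.0, next_state="s0")
print(learner.choose_action("s0"))
```

No model of the environment appears anywhere: the learner only sees (state, action, reward, next state) samples.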

Code Implementation: Using Q-learning to choose actions for the agent.

Our agent in action
Note: Currently our rules don’t penalize the snake for running over itself.

Possible Improvements to our agent
- Optimizing the state space
- Adding time-based rewards
- Balancing the exploration vs. exploitation tradeoff
- Optimizing the hyperparameters using techniques like Grid Search or Genetic Algorithms
- Using state-of-the-art RL techniques
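One common way to handle the exploration/exploitation tradeoff is to decay the exploration rate over episodes: explore a lot early, exploit more later. The schedule below is only a sketch; the function name and the `start`, `end` and `decay` values are assumptions, and they are themselves hyperparameters one could tune with a grid search.

```python
def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.99):
    """Exponentially decay the exploration rate, never going below `end`."""
    return max(end, start * decay ** episode)


# Print the exploration rate at a few points during training.
for episode in (0, 100, 1000):
    print(episode, decayed_epsilon(episode))
```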

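Other interesting techniques
SARSA: An on-policy relative of Q-learning. Its update uses the value of the next action the policy actually takes (greedy most of the time, random with a predefined probability) instead of the maximum over all next actions, so each update skips the max step, which helps when the number of actions is high.
Deep Q-Networks: Combine RL with deep neural networks such as CNNs. The network approximates the non-linear action-value function and is trained on past transitions through experience replay.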
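For comparison with Q-learning, here is a sketch of the standard SARSA update, which bootstraps from the next action actually chosen rather than from the best next action. The dictionary-based table, the function name and the hyperparameter values are assumptions for illustration.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """SARSA: the target uses Q(next_state, next_action), not a max over actions."""
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


q = {("s1", "LEFT"): 1.0, ("s1", "RIGHT"): 0.0}

# The policy happened to pick RIGHT in s1, so SARSA bootstraps from 0.0 ...
explored = sarsa_update(dict(q), "s0", "RIGHT", 0.0, "s1", next_action="RIGHT")
# ... whereas picking LEFT gives the same target Q-learning's max would use.
greedy = sarsa_update(dict(q), "s0", "RIGHT", 0.0, "s1", next_action="LEFT")
print(explored, greedy)
```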

The self-driving car simulation design
State:
- Car on left, right, ahead?
- Traffic light green or red?
- Next waypoint (from GPS)
Actions:
- Steer Left, Steer Right
Rewards:
- Violating the traffic laws
- Hitting the obstacles
- Reaching the destination
- Time taken to reach destination (any thoughts on this?)
Code Sample available at: https://github.com/satwikkansal/smartcab

Applications of Reinforcement Learning
- Playing games like chess (reward is not instantaneous, delayed feedback)
- Managing portfolios and finances (the reward here is the money)
- Robotics (humanoid robots)
- Manufacturing and inventory management
- General AI agents: Agents that can perform multiple tasks with a single algorithm. For example, an agent playing all the Atari games.

Open source frameworks and libraries for RL
- OpenAI Gym: A toolkit for developing and comparing reinforcement learning algorithms.
- OpenAI Universe: A software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.
- DeepMind Lab: A customisable 3D platform for agent-based AI research.

Some nice links
Youtube lectures and tutorials:
- UCL course on RL by D. Silver - http://bit.ly/RL-UCL
- Sentdex pygame tutorial - http://bit.ly/sentdex-pygame
Python Code Samples:
- Reinforcement Learning, an introduction - http://bit.ly/RL-intro-Python
Online Demo: