Implementation of Reinforcement Learning In Coordinated Group Activities

Implementation of Reinforcement Learning In Coordinated Group Activities By Ashwinkumar Ganesan CMSC 601

Agenda Reinforcement Learning Problem Statement Proposed Method Conclusions

What is Reinforcement Learning? A method for learning from experience: an agent (or bot) learns by interacting with the environment. A reward is attached to each action taken in a particular state. GOAL: MAXIMIZE THE REWARD
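The agent–environment loop described above can be sketched in a few lines of Python. `CorridorEnv` is a hypothetical toy environment invented here purely for illustration; it is not part of the proposal in these slides.

```python
class CorridorEnv:
    """Toy environment (hypothetical): the agent starts at position 0 and
    receives a reward of +1 only when it reaches position 3."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is +1 (move right) or -1 (move left); position stays >= 0
        self.pos = max(0, self.pos + action)
        done = (self.pos == 3)
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

def run_episode(env, policy, max_steps=100):
    """Interact with the environment, accumulating reward (the quantity to maximize)."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent picks an action in this state
        state, reward, done = env.step(action)   # environment returns next state and reward
        total_reward += reward
        if done:
            break
    return total_reward
```

A policy that always moves right, `lambda s: 1`, reaches the goal and collects the reward: `run_episode(CorridorEnv(), lambda s: 1)` returns `1.0`.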

Bellman Equation (RL in a bit more detail)
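The equation on this slide (an image in the original deck) is presumably the standard Bellman optimality equation for the state-value function, which the methods discussed here build on:

```latex
V^{*}(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^{*}(s') \right]
```

Here R(s, a) is the reward for taking action a in state s, γ is the discount factor, and P(s' | s, a) is the transition probability of the underlying MDP.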

Markov Decision Process
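The body of this slide was an image; for reference, a Markov decision process is the tuple

```latex
\mathcal{M} = (S, A, P, R, \gamma), \qquad
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots) = P(s_{t+1} \mid s_t, a_t)
```

where S is the set of states, A the set of actions, P the transition probabilities, R the reward function, and γ ∈ [0, 1) the discount factor; the second identity is the Markov property (the next state depends only on the current state and action).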

Q-Learning (A Reinforcement Learning Algorithm)
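This slide's content was an image; the tabular Q-learning update it refers to, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)], can be sketched as follows (the state and action names are placeholders):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)  # unseen (state, action) pairs default to 0
q_update(Q, "s0", "a0", 1.0, "s1", ["a0", "a1"])  # -> 0.1
```

With an empty table the max over next-state actions is 0, so the first update moves Q("s0", "a0") by α · r = 0.1 · 1.0.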

What is Co-ordinated Group Action? Co-ordinated Group Action is a situation in which a set of agents performs a single task. GOAL: To maximize the output, or the reward, globally.

Agenda Reinforcement Learning Problem Statement Proposed Method Conclusions

Some Problems In Multi-Agent Systems… Communication, i.e., what should an agent communicate, and how much should it communicate with other agents? Optimal policy, i.e., defining an optimal policy for the entire group. Is an optimal policy simply a set of optimal individual policies, one per agent? How much of an individual agent's policy information is available to the entire group?

What Am I Proposing? Create a method for implementing reinforcement learning on co-ordinated group activity efficiently. Modify a reinforcement learning algorithm to implement group action. Implement the proposed method and measure its efficiency in World of Warcraft.

Problem Environment The proposal is to research co-operative group learning under the following conditions:
1. The environment is assumed to be partially observable.
2. Each agent in the system knows the final action taken by every other agent.
3. Agents do not have access to the state information generated by other agents while selecting an action.
4. Agents do not have access to the policies of other agents when making a decision.
5. The rewards are not linearly separable.

World Of Warcraft World of Warcraft is a large multi-player online game by Blizzard Entertainment. It is a game where every player has his own character and roams the virtual world, fighting demons, observing the landscape, buying and selling items, and interacting with other players. In short, it is a large game with many options for tools, skills, and levels, making it challenging for bots.

World Of Warcraft http://world-of-warcraft.en.softonic.com/

Motivation Today's games, especially ones like World Of Warcraft, have a large number of human players who play in groups. Single-player as well as multiplayer games have AI engines, but attacks and actions by opponents in these games still happen one at a time. Reinforcement learning can be implemented in real time in these games to improve the AI over a period of time and customize the games for users. Other applications: robotics, defense.

Related Work QUICR Method: this method calculates the counterfactual action, which is the action an agent did not take at a given time [2]. Least-Squares Policy Iteration (LSPI) can be implemented; the method performs policy iterations using samples instead of an actual policy [3]. FMQ Algorithm: the algorithm is helpful for environments where agents have partial or no observability [4].

Inverse Reinforcement Learning Inverse reinforcement learning is the exact opposite of reinforcement learning. The input is the optimal policy or behavior that is expected from the agent. The agent learns to find the reward function based on values observed in the environment.
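As a very rough illustration of the idea (not an algorithm from the IRL literature, which recovers a reward function by solving an optimization problem), one could score states by how often an assumed-optimal expert visits them; the function name and data below are hypothetical:

```python
from collections import Counter

def naive_reward_from_expert(trajectories):
    """Crude stand-in for inverse RL: treat the expert's visit frequency of a
    state as a proxy for its reward, on the assumption that the expert behaves
    near-optimally. Real IRL methods instead search for a reward function
    under which the observed policy is optimal."""
    counts = Counter(state for traj in trajectories for state in traj)
    total = sum(counts.values())
    return {state: c / total for state, c in counts.items()}

# The expert lingers in state "b", so "b" is scored higher than "a".
naive_reward_from_expert([["a", "b", "b"]])  # -> {"a": 1/3, "b": 2/3}
```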

Agenda Reinforcement Learning Problem Statement Proposed Method Conclusions

Proposed Method Implement Reinforcement Learning in two parts:
1. Implement Inverse Reinforcement Learning.
2. Implement a modified Reinforcement Learning on the rewards learned in step 1.
Pipeline: Observe Expert → Calculate Reward → Observe other agents → Calculate Policy based on reward → Calculate the new Q(a, s) value.
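The two-phase pipeline above can be sketched end to end. This is a minimal illustration under strong simplifying assumptions (a visit-frequency heuristic standing in for inverse RL, and a plain Q-learning update standing in for the modified algorithm), not the actual proposed implementation:

```python
from collections import Counter, defaultdict

def two_phase_learning(expert_trajs, transitions, actions, alpha=0.1, gamma=0.9):
    """Phase 1: infer per-state rewards by observing the expert.
    Phase 2: run Q-learning-style updates on observed (s, a, s') transitions,
    using the inferred rewards instead of environment rewards."""
    # Phase 1 (observe expert -> calculate reward): naive frequency-based stand-in.
    counts = Counter(s for traj in expert_trajs for s in traj)
    total = sum(counts.values())
    reward = {s: c / total for s, c in counts.items()}

    # Phase 2 (observe other agents -> calculate the new Q(a, s) value).
    Q = defaultdict(float)
    for s, a, s_next in transitions:
        r = reward.get(s_next, 0.0)            # reward learned in phase 1
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```

For example, if the expert is only ever seen in state `"g"` and one transition `("s", "a", "g")` is observed, the single update moves Q("s", "a") by α · 1.0 = 0.1.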

Challenges Calculating the iteration at which the agent is known to have an optimal policy. Finding the point at which to switch from inverse reinforcement learning to reinforcement learning. Working with the reward function obtained from observing the expert together with the rewards obtained from the environment. Finding a method to observe other agents and “experts”.

Evaluation Metrics The metrics will be evaluated against the known methods. The metrics are:
1. Number of states generated
2. Number of iterations required to reach an optimal policy
3. Rewards in terms of points earned (in the game)
4. Rate of convergence to the optimal policy

Agenda Reinforcement Learning Problem Statement Proposed Method Conclusions

Conclusion We can use inverse reinforcement learning together with reinforcement learning methods to speed up the learning time required for bots, and to improve bot reward functions over time.

References
1. R. Bellman, On the Theory of Dynamic Programming, Proceedings of the National Academy of Sciences, 1952.
2. Adrian K. Agogino and Kagan Tumer, QUICKER Q-Learning in Multi-Agent Systems.
3. Lihong Li, Michael L. Littman, Christopher R. Mansley, Online Exploration in Least-Squares Policy Iteration.
4. Laëtitia Matignon, Guillaume J. Laurent and Nadine Le Fort-Piat, A Study of the FMQ Heuristic in Cooperative Multi-Agent Games.
5. Acknowledgement to Prof. Tim Oates for helping with the literature survey.

QUESTIONS?
