Learning Prospective Robot Behavior
Shichao Ou and Roderic Grupen
Laboratory for Perceptual Robotics, University of Massachusetts Amherst
LABORATORY FOR PERCEPTUAL ROBOTICS • UNIVERSITY OF MASSACHUSETTS AMHERST • DEPARTMENT OF COMPUTER SCIENCE

A Developmental Approach
• Infant learning
  – Proceeds in stages
• Maturation processes
  – Parents provide constrained learning contexts
• Protect
• Easy → Complex
  – Motion mobiles for newborns
  – Use brightly colored, easy-to-pick-up objects
  – Use building blocks
  – Association of words and objects

Application in Robotics
• Framework for Robot Developmental Learning
  – Role of the teacher: set up learning contexts that make the target concept conspicuous
  – Role of the robot: acquire concepts, generalize them to new contexts through autonomous exploration, and provide feedback
• Control Basis
  – Robot actions are created from combinations of <σ, φ, τ>: sensor resources σ, potential functions φ, and effector resources τ
  – Establish stages of learning by imposing time-varying constraints on resources
• Easy → Complex
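The idea of building actions from <σ, φ, τ> combinations, and of staging learning by constraining which resources are available, can be sketched as a Cartesian product of resources filtered by a stage's allowed sensors and effectors. This is a minimal illustration under assumptions, not the framework's actual controller representation: the resource names (`red_blob`, `head`, `left_arm`, and so on) and the helpers `controllers` and `constrain` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Controller:
    sigma: str  # sensor resource providing feedback
    phi: str    # potential function the controller descends
    tau: str    # effector resource (degrees of freedom) it drives


def controllers(sensors: Iterable[str], potentials: Iterable[str],
                effectors: Iterable[str]) -> List[Controller]:
    """Every <sigma, phi, tau> combination of the available resources."""
    return [Controller(s, p, t) for s, p, t in product(sensors, potentials, effectors)]


def constrain(candidates: Iterable[Controller], allowed_sensors: Set[str],
              allowed_effectors: Set[str]) -> List[Controller]:
    """Keep only the controllers whose resources a learning stage permits."""
    return [c for c in candidates
            if c.sigma in allowed_sensors and c.tau in allowed_effectors]


if __name__ == "__main__":
    # Hypothetical resources; an early stage might allow one salient object and the head only.
    everything = controllers(["red_blob", "blue_blob"], ["track", "reach"],
                             ["head", "left_arm", "right_arm"])
    stage1 = constrain(everything, {"red_blob"}, {"head"})
    print(len(everything), len(stage1))  # prints: 12 2
```

Loosening the allowed sets from one stage to the next (for example, adding one arm to the effectors) is one way to express the time-varying resource constraints as a schedule.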

Example
• Learning to Reach for Objects
  – Stage 1: Search-Track
    • Focus attention using a single brightly colored object (σ)
    • Limit DOF (τ) to the head ONLY
  – Stage 2: Reach-Grab
    • Limit DOF (τ) to one arm ONLY
  – Stage 3: Handedness, Scale-Sensitive
• Hart et al., 2008

Prospective Learning
• An infant adapts to new situations by looking ahead prospectively, predicting failure, and then learning a repair strategy

Robot Prospective Learning with Human Guidance
[Figure: a learned policy S_0 →(a_0) S_1 →(a_1) … →(a_{i-1}) S_i →(a_i) … →(a_{j-1}) S_j →(a_j) … →(a_{n-1}) S_n is challenged; outcomes are marked g(f) = 1 and g(f) = 0, and a sub-task with states S_i^1 … S_i^j … S_i^n is inserted at S_i.]

A 2D Navigation Domain Problem
• 30 × 30 map
• 6 doors, randomly closed
• 6 buttons
• 1 start and 1 goal
• 3-bit door sensor on the robot

Flat Learning Results
• Flat Q-learning
  – 5-component state: (x, y, door-bit 1, door-bit 2, door-bit 3)
  – 4 actions: up, down, left, right
  – Reward: +1 for reaching the goal, -0.01 for every step taken
  – Learning parameters: α = 0.1, γ = 1.0, ε = 0.1
• Learned solutions after 30,000 episodes
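The flat baseline can be sketched as tabular Q-learning with the listed parameters (α = 0.1, γ = 1.0, ε = 0.1, +1 at the goal, -0.01 per step). This is a minimal sketch, not the authors' implementation: the map, door, and button layout are not given, so the code runs on a hypothetical 5 × 5 open grid with the three door bits held fixed, and it assumes the goal transition pays +1 in place of the step cost. It also computes the flat table size for the full domain: 30 × 30 cells × 2³ door-bit patterns = 7,200 states, each with 4 action values.

```python
import random
from collections import defaultdict

# Learning parameters and rewards as listed on the slide.
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1
GOAL_REWARD, STEP_REWARD = 1.0, -0.01
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

# Flat table size for the slide's domain: 30 x 30 cells x 2**3 door-bit patterns.
FULL_STATE_COUNT = 30 * 30 * 2 ** 3  # 7200 states, 4 action values each


def td_update(q, reward, next_max, alpha=ALPHA, gamma=GAMMA):
    """Q-learning update: Q + alpha * (r + gamma * max_a' Q(s', a') - Q)."""
    return q + alpha * (reward + gamma * next_max - q)


def step(state, action, size, goal):
    """Hypothetical open grid: moves into the border leave the robot in place.
    The three door bits are carried along unchanged (this toy grid has no doors)."""
    x, y, *doors = state
    dx, dy = ACTIONS[action]
    nx = min(max(x + dx, 0), size - 1)
    ny = min(max(y + dy, 0), size - 1)
    done = (nx, ny) == goal
    # Assumption: the goal transition pays +1 instead of the step cost.
    return (nx, ny, *doors), (GOAL_REWARD if done else STEP_REWARD), done


def greedy(q, state):
    """Highest-valued action; ties go to the first action in ACTIONS."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(size=5, goal=(4, 4), episodes=2000, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a), initialised to 0.0
    start = (0, 0, 0, 0, 0)  # (x, y, door-bit 1, door-bit 2, door-bit 3)
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            a = rng.choice(list(ACTIONS)) if rng.random() < EPSILON else greedy(q, s)
            s2, r, done = step(s, a, size, goal)
            # Terminal transitions bootstrap from 0.
            next_max = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] = td_update(q[(s, a)], r, next_max)
            s = s2
            if done:
                break
    return q, start


def rollout(q, start, size=5, goal=(4, 4), max_steps=50):
    """Follow the greedy policy and return the visited (x, y) cells."""
    s, path = start, [start[:2]]
    for _ in range(max_steps):
        s, _, done = step(s, greedy(q, s), size, goal)
        path.append(s[:2])
        if done:
            break
    return path


if __name__ == "__main__":
    q, start = train()
    print(rollout(q, start))
```

Because the door bits are part of the flat state, every change in door configuration multiplies the number of table entries the agent must visit, which is one way to read the gap between flat and staged learning.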

Prospective Learning
• Stage 1
  – All doors open
  – Constrain resources to use only the (x, y) sensors
  – Let the agent learn a policy from start to goal
[Figure: learned trajectory S_0 → S_1 → … → S_i → … → S_j → … → S_n, with moves labelled Right, Down, Right, Right, Up, Right]

Prospective Learning
• Stage 2
  – Close 1 door
  – The robot learns the cause of the failure
  – The robot back-tracks and finds an earlier indicator of this cause
  – Create a sub-task
  – Learn a new policy for the sub-task
  – Resume the original policy
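The Stage 2 steps can be sketched with two hypothetical helpers: one back-tracks through a recorded trajectory to the earliest step at which the cause of the failure was already observable, and one wraps the Stage 1 policy so that it hands control to a sub-task policy at that point and resumes afterwards. All names (`find_earliest_indicator`, `ProspectivePolicy`, the `press` action) are illustrative assumptions, as is the encoding "door bit = 1 means closed". "Earliest indicator" is interpreted here as the first step of the contiguous run, ending at the failure, in which the causal feature already had its failure value.

```python
from typing import Callable, Sequence, Tuple

Obs = Tuple[int, ...]  # (x, y, door-bit 1, door-bit 2, door-bit 3)


def find_earliest_indicator(trajectory: Sequence[Obs], failure_idx: int, feature: int) -> int:
    """Back-track from the failure step while `feature` keeps the value it had at
    the failure; return the first step of that run (the earliest indicator)."""
    value = trajectory[failure_idx][feature]
    i = failure_idx
    while i > 0 and trajectory[i - 1][feature] == value:
        i -= 1
    return i


class ProspectivePolicy:
    """Stage 1 policy over (x, y) with a sub-task spliced in at the indicator cell."""

    def __init__(self, base: Callable[[Tuple[int, int]], str], trigger_xy: Tuple[int, int],
                 feature: int, bad_value: int,
                 subtask: Callable[[Obs], str], subtask_done: Callable[[Obs], bool]):
        self.base, self.trigger_xy = base, trigger_xy
        self.feature, self.bad_value = feature, bad_value
        self.subtask, self.subtask_done = subtask, subtask_done
        self.in_subtask = False

    def act(self, obs: Obs) -> str:
        # Enter the sub-task when the indicator is observed at the trigger cell.
        if (not self.in_subtask and tuple(obs[:2]) == self.trigger_xy
                and obs[self.feature] == self.bad_value):
            self.in_subtask = True
        if self.in_subtask:
            if not self.subtask_done(obs):
                return self.subtask(obs)
            self.in_subtask = False  # sub-goal reached: resume the original policy
        # The Stage 1 policy only sees the (x, y) part of the observation.
        return self.base((obs[0], obs[1]))


if __name__ == "__main__":
    # Hypothetical recorded run: door-bit 2 (index 3) reads 1 from step 2 until the failure at step 4.
    traj = [(0, 0, 0, 0, 0), (1, 0, 0, 0, 0), (2, 0, 0, 1, 0), (3, 0, 0, 1, 0), (4, 0, 0, 1, 0)]
    k = find_earliest_indicator(traj, failure_idx=4, feature=3)
    policy = ProspectivePolicy(base=lambda xy: "right", trigger_xy=traj[k][:2], feature=3,
                               bad_value=1, subtask=lambda obs: "press",
                               subtask_done=lambda obs: obs[3] == 0)
    print(k, policy.act(traj[1]), policy.act(traj[2]))  # prints: 2 right press
```

The existing policy is left untouched outside the trigger, and the sub-task policy receives the full observation, so it can be learned over a different state space than the (x, y)-only Stage 1 policy.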

Prospective Learning Results
• Learned solutions in fewer than 2,000 episodes, compared with 30,000 episodes for flat Q-learning

Humanoid Robot Manipulation Domain
• Benefits of Prospective Learning
  – Adapts to new contexts while retaining most of the existing policy
  – Automatically generates sub-goals
  – Sub-tasks can be learned in a completely different state space
  – Supports interactive learning

Conclusion
• A developmental view of robot learning
• A framework that enables interactive, incremental learning in stages
• An extension of the control basis learning framework based on the idea of prospective learning