Hierarchical POMDP Solutions
Georgios Theocharous
Slides: 59
Sequential Decision Making Under Uncertainty
[Diagram: the AGENT receives OBSERVATIONS & REWARDS (e.g. symptoms, tests) from the ENVIRONMENT, whose STATES are HIDDEN, and issues ACTIONS (e.g. treatments). What is the optimal policy?]
Manufacturing Processes (Mahadevan, Theocharous: FLAIRS 98)
[Diagram: machines connected by buffers]
§ States: parts in buffers, machine internal state
§ Actions: produce, maintenance
§ Observations
§ Reward: throughput, reward for consuming, penalty for filling buffers, penalty for machine breakdown
What is the optimal policy?
Foveated Active Vision (Minut)
§ States: objects
§ Observations: local features
§ Actions: where to saccade next, what features to use
§ Reward: reward for finding the object
What is the optimal policy?
Many More Partially Observable Problems
§ Assistive technologies: web searching, preference elicitation
§ Sophisticated computing: distributed file access, network trouble-shooting
§ Industrial: machine maintenance, manufacturing processes
§ Social: education, medical diagnosis, health care policymaking
§ Corporate: marketing, corporate policy
Overview
§ Learning models of partially observable problems is far from a solved problem
§ Computing policies for partially observable domains is intractable
§ We propose hierarchical solutions:
§ Learn models using less space and time
§ Compute robust policies that previous approaches could not compute
How? Spatial and Temporal Abstractions Reduce Uncertainty
[Diagram: spatial abstraction (MIT campus map) and temporal abstraction]
Outline § Sequential decision-making under uncertainty § A Hierarchical POMDP model for robot navigation § Heuristic macro-action selection in H-POMDPs § Near Optimal macro-action selection for arbitrary POMDPs § Representing H-POMDPs as DBNs § Current and Future directions 8
A Real System: Robot Navigation
§ Transition matrix for the Go-Forward action (row for state S1): P(S1) = 0.1, P(S5) = 0.8, P(S9) = 0.1, all other states 0.0
§ Observation model for S1 (each observation encodes wall/opening on the four sides, e.g. OOOO, OWOW, WWWW): P(OWOW | S1) = 0.7, 0.15 on each of the two adjacent observations, 0.0 elsewhere
Belief States (Probability Distributions over States)
[Diagram: true state vs. belief state]
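To make the belief update concrete, one step of the Bayes filter can be sketched as follows (a minimal sketch with an illustrative 3-state corridor and made-up probabilities, not the actual navigation model):

```python
# Minimal POMDP belief update: b'(j) ∝ O(z | j) * sum_i T(j | i, a) * b(i)
# Illustrative 3-state corridor with a single "forward" action.

# T[a][i][j] = P(next state j | current state i, action a)  (hypothetical values)
T = {"forward": [[0.1, 0.8, 0.1],
                 [0.0, 0.2, 0.8],
                 [0.0, 0.0, 1.0]]}

# O[j][z] = P(observation z | state j)  (hypothetical values)
O = [{"open": 0.9, "wall": 0.1},
     {"open": 0.5, "wall": 0.5},
     {"open": 0.1, "wall": 0.9}]

def belief_update(b, a, z):
    """Bayes-filter update of the belief state after action a and observation z."""
    new_b = []
    for j in range(len(b)):
        pred = sum(T[a][i][j] * b[i] for i in range(len(b)))  # prediction step
        new_b.append(O[j][z] * pred)                          # correction step
    norm = sum(new_b)
    return [p / norm for p in new_b]

b = [1.0, 0.0, 0.0]                       # known initial state
b = belief_update(b, "forward", "wall")   # belief spreads, then sharpens on z
```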
Learning POMDPs
§ Given the As (actions) and Zs (observations), compute the Ts and Os:
§ Estimate the probability distribution over hidden states
§ Count the number of times a state was visited
§ Update T and O, and repeat
[Diagram: DBN with hidden states S1 → S2 → S3, actions A1, A2, observations Z1–Z3; parameters T(S2 = j | S1 = i, A1 = a) and O(O2 = z | S2 = i, A1 = a)]
§ This is an Expectation-Maximization (EM) algorithm: an iterative procedure for maximum-likelihood parameter estimation over hidden state variables
§ Converges to a local maximum
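The M-step of this procedure can be written out as follows (standard Baum-Welch-style re-estimation for an HMM with action inputs, stated here for reference; the exact conditioning used in the work may differ):

```latex
\hat{T}(i, a, j) \;=\;
  \frac{\sum_{t \,:\, a_t = a} \xi_t(i, j)}
       {\sum_{t \,:\, a_t = a} \gamma_t(i)},
\qquad
\hat{O}(z, a, j) \;=\;
  \frac{\sum_{t \,:\, a_{t-1} = a,\; z_t = z} \gamma_t(j)}
       {\sum_{t \,:\, a_{t-1} = a} \gamma_t(j)}
```

where $\gamma_t(i) = P(S_t = i \mid \text{data})$ and $\xi_t(i, j) = P(S_t = i, S_{t+1} = j \mid \text{data})$ are the expected state-occupancy and transition counts computed in the E-step by the forward-backward procedure.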
Planning in POMDPs
§ Belief states constitute a sufficient statistic for decision making (the Markov property holds: Astrom 1965)
§ Bellman equation over belief states
§ Since the belief space is continuous, the problem is computationally intractable: PSPACE-hard for the finite horizon, undecidable for the infinite horizon
[Diagram: the AGENT consists of a STATE ESTIMATOR and a POLICY π; the ENVIRONMENT emits observation z, the STATE ESTIMATOR produces belief state b, and the POLICY outputs action a]
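The Bellman equation over beliefs takes the standard POMDP form, written out here for reference:

```latex
V(b) \;=\; \max_{a} \Big[ \sum_{s} b(s)\, R(s, a)
  \;+\; \gamma \sum_{z} P(z \mid b, a)\, V\!\big(b^{a}_{z}\big) \Big]
```

where $b^{a}_{z}$ denotes the belief that results from updating $b$ after taking action $a$ and observing $z$.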
Our Solution: Spatial and Temporal Abstraction § Learning § A hierarchical Baum-Welch algorithm, which is derived from the Baum-Welch algorithm for training HHMMs (with Rohanimanesh and Mahadevan, ICRA 2001) § Structure learning from weak priors (with Mahadevan IROS 2002) § Inference can be done in linear time by representing H-POMDPs as Dynamic Bayesian Networks (DBNs) (with Murphy and Kaelbling, ICRA 2004) § Planning § Heuristic macro-action selection (with Mahadevan, ICRA 2002) § Near optimal macro-action selection (with Kaelbling, NIPS 2003) § Structure Learning and Planning combined § Dynamic POMDP abstractions (with Mannor and Kaelbling) 15
Outline § Sequential decision-making under uncertainty § A Hierarchical POMDP model for robot navigation § Heuristic macro-action selection in H-POMDPs § Near Optimal macro-action selection for arbitrary POMDPs § Representing H-POMDPs as DBNs § Current and Future directions 16
Hierarchical POMDPs
[Diagram: corridor environment with WEST and EAST abstract states]
Hierarchical POMDPs - ABSTRACT STATES + ACTIONS (Fine, Singer, Tishby, MLJ 98) 18
Experimental Environments 600 states 1200 states 19
The Robot Navigation Domain § The robot Pavlov in the real MSU environment § The Nomad 200 simulator 20
Learning Feature Detectors (Mahadevan, Theocharous, Khaleeli: MLJ 98)
§ 736 hand-labeled grids
§ 8-fold cross-validation
§ Classification error (μ = 7.33, σ = 3.7)
Learning and Planning in H-POMDPs for Robot Navigation
[Pipeline: ENVIRONMENT → (hand coding) → TOPOLOGICAL MAP → (compilation) → INITIAL H-POMDP → (EM learning) → TRAINED H-POMDP → (planning) → NAVIGATION SYSTEM → execution]
Outline § Sequential decision-making under uncertainty § A Hierarchical POMDP model for robot navigation § Heuristic macro-action selection in H-POMDPs § Near Optimal macro-action selection for arbitrary POMDPs § Representing H-POMDPs as DBNs § Current and Future directions 23
Planning in H-POMDPs (Theocharous, Mahadevan: ICRA 2002)
§ Hierarchical MDP solutions (using the options framework [Sutton, Precup, Singh, AIJ]): abstract actions over primitive actions
§ Heuristic POMDP solutions over beliefs b(s), e.g. MLS
[Figure: example belief values 0.35, 0.3, 0.2, 0.1, 0.05 and per-state value pairs v(go-west), v(go-east); π(b) = go-west]
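The MLS heuristic, and the closely related QMDP heuristic mentioned in the results, can be sketched as follows (the Q table and state/action names are purely illustrative, not taken from the navigation domain):

```python
# Two standard heuristics for acting in a POMDP given the underlying MDP's
# Q-values (hypothetical 3-state, 2-action Q table for illustration).
Q = {("s0", "west"): 1.0, ("s0", "east"): 0.5,
     ("s1", "west"): 0.2, ("s1", "east"): 0.9,
     ("s2", "west"): 0.1, ("s2", "east"): 0.8}
STATES, ACTIONS = ["s0", "s1", "s2"], ["west", "east"]

def mls_action(b):
    """MLS: act greedily at the single most likely state under belief b."""
    s_star = max(STATES, key=lambda s: b[s])
    return max(ACTIONS, key=lambda a: Q[(s_star, a)])

def qmdp_action(b):
    """QMDP: act greedily on the belief-weighted Q-values."""
    return max(ACTIONS, key=lambda a: sum(b[s] * Q[(s, a)] for s in STATES))

# The two heuristics can disagree when the belief is spread out:
b = {"s0": 0.4, "s1": 0.3, "s2": 0.3}
```

With this belief, MLS commits to the most likely state s0 and goes west, while QMDP averages over the whole belief and goes east, which illustrates why neither heuristic alone is reliable under high uncertainty.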
Plan Execution 25
Intuition
§ The probability distribution at the higher level evolves more slowly
§ The agent does not have to decide on the best macro-action at every time step
§ Long-term actions help the robot localize
F-MLS Demo 30
H-MLS Demo 31
Hierarchical is More Successful
[Chart: success % with unknown initial position, by environment, comparing the hierarchical algorithm with MLS and QMDP]
Hierarchical Takes Less Time to Reach Goal
[Chart: average steps to goal with unknown initial position, by environment, comparing the hierarchical algorithm with MLS and QMDP]
Hierarchical Plans are Computed Faster
[Chart: planning time by environment for Goal 1 and Goal 2, by algorithm]
Outline § Sequential decision-making under uncertainty § A Hierarchical POMDP model for robot navigation § Heuristic macro-action selection in H-POMDPs § Near Optimal macro-action selection for arbitrary POMDPs § Representing H-POMDPs as DBNs § Current and Future directions 35
Near Optimal Macro-action Selection (Theocharous, Kaelbling: NIPS 2003)
§ Usually agents don't require the entire belief space
§ Macro-actions can reduce the reachable belief space even further
§ Tested in large-scale robot navigation:
§ Only a small part of the belief space is required
§ Learns approximate POMDP policies fast
§ High success rate
§ Better policies
§ Does information gathering
Dynamic Grids
Given a resolution, grid points are sampled dynamically from regular discretizations of the belief simplex by simulating trajectories
The Algorithm
[Diagram: the true trajectory and true belief state; the nearest grid point g to belief b; simulated trajectories of macro-action A from g estimate the value at g; the value of the resulting belief b'' is interpolated from its neighbors]
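The value lookup over grid points might be sketched like this (a nearest-neighbor simplification of the interpolation step, with hypothetical 2-state beliefs and grid values; the actual algorithm interpolates from several neighboring points of a regular discretization):

```python
# Approximate V(b) from values stored at a grid of belief points.
# Nearest-neighbor sketch; real grid methods use multi-point interpolation.

def l1(b1, b2):
    """L1 distance between two belief vectors."""
    return sum(abs(x - y) for x, y in zip(b1, b2))

def value(b, grid_values):
    """Look up the value of belief b at the nearest grid belief point."""
    g = min(grid_values, key=lambda g: l1(b, g))  # nearest grid point to b
    return grid_values[g]

# Hypothetical grid over a 2-state belief simplex with stored values.
grid_values = {(1.0, 0.0): 5.0, (0.5, 0.5): 2.0, (0.0, 1.0): 4.0}
v = value((0.9, 0.1), grid_values)  # nearest grid point is (1.0, 0.0)
```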
Experimental Setup 39
Fewer States
Fewer Steps to Goal 41
More Successful 42
Information Gathering 43
Information Gathering (scaling up) 44
Dynamic POMDP Abstractions (Theocharous, Mannor, Kaelbling)
[Diagram: entropy thresholds; start and goal; localization macros]
Fewer Steps to Goal 46
Outline § Sequential decision-making under uncertainty § A Hierarchical POMDP model for robot navigation § Heuristic macro-action selection in H-POMDPs § Near Optimal macro-action selection for arbitrary POMDPs § Representing H-POMDPs as DBNs § Current and Future directions 47
Dynamic Bayesian Networks
[Diagram: a state POMDP vs. a factored DBN POMDP, comparing the number of parameters in the transition model]
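The parameter-count advantage of factoring can be illustrated with a quick back-of-the-envelope computation (the state and variable sizes here are hypothetical, chosen only to show the scaling):

```python
# Parameter counts: flat transition table vs. a factored DBN transition model.
# Hypothetical sizes: 1000 states factored into two variables of 40 x 25.
n_states, n_actions = 1000, 4
flat = n_actions * n_states * n_states  # full T(s, a, s') table

# Factored: each next-state variable depends on both previous variables,
# so each CPT has (n1 * n2) parent combinations per action.
n1, n2 = 40, 25                         # n1 * n2 == n_states
factored = n_actions * (n1 * n2 * n1 + n1 * n2 * n2)
```

Even with fully connected intra-slice dependencies, the factored model needs orders of magnitude fewer parameters than the flat table.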
DBN Inference
[Diagram]
Representing H-POMDPs as Dynamic Bayesian Networks (Theocharous, Murphy, Kaelbling: ICRA 2004)
[Diagram: a state H-POMDP with WEST/EAST abstract states and its factored DBN representation]
Complexity of Inference
[Chart: inference cost for the factored DBN H-POMDP, DBN H-POMDP, state H-POMDP, and state POMDP]
Hierarchical Localizes Better
[Chart: localization performance before and after training for the original model, factored DBN tied H-POMDP, factored DBN H-POMDP, and state POMDP]
Hierarchical Fits Data Better
[Chart: data likelihood before and after training for the original model, factored DBN tied H-POMDP, factored DBN H-POMDP, and state POMDP]
Directions for Future Research
§ Structure learning: Bayesian model selection approaches; methods for learning compositional hierarchies (recurrent nets, hierarchical sparse n-grams); natural language acquisition methods; identifying isomorphic processes
§ On-line learning
§ Interactive learning
§ Application to real-world problems
Major Contributions
§ The H-POMDP model: requires less training data; provides better state estimation; fast planning
§ Macro-actions in POMDPs: reduce uncertainty; information gathering
§ Application of the algorithms to large-scale robot navigation: map learning; planning and execution