
Hierarchical POMDP Planning and Execution
Joelle Pineau
Machine Learning Lunch, November 20, 2000

Partially Observable MDP
- POMDPs are characterized by:
  - States: $s \in S$
  - Actions: $a \in A$
  - Observations: $o \in O$
  - Transition probabilities: $T(s, a, s') = \Pr(s' \mid s, a)$
  - Observation probabilities: $O(s', a, o) = \Pr(o \mid s', a)$
  - Rewards: $R(s, a)$
  - Beliefs: $b(s_t) = \Pr(s_t \mid o_t, a_{t-1}, o_{t-1}, \ldots, a_0, o_0)$
[Figure: example POMDP with states S1, S2, S3]
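The belief is kept up to date by Bayes' rule after each action and observation: predict with $T$, then reweight by $O$ and normalize. A minimal Python sketch, assuming dictionary-encoded $T$ and $O$ tables; the two-state model and the names S1, S2, stay, o1 are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable


def belief_update(b: Dict[Hashable, float], a, o, T, O) -> Dict[Hashable, float]:
    """Return b' with b'(s') proportional to O(s', a, o) * sum_s T(s, a, s') * b(s).

    b: belief, mapping every state to its probability.
    T: dict (s, a, s') -> Pr(s' | s, a).
    O: dict (s', a, o) -> Pr(o | s', a).
    """
    unnormalized = {}
    for s_next in b:
        # Prediction step: probability of landing in s_next after action a.
        predicted = sum(T[(s, a, s_next)] * b[s] for s in b)
        # Correction step: weight by the likelihood of observing o in s_next.
        unnormalized[s_next] = O[(s_next, a, o)] * predicted
    norm = sum(unnormalized.values())  # equals Pr(o | b, a)
    if norm == 0.0:
        raise ValueError("observation o has zero probability under belief b")
    return {s: p / norm for s, p in unnormalized.items()}


# Hypothetical two-state model, for illustration only.
T = {("S1", "stay", "S1"): 1.0, ("S1", "stay", "S2"): 0.0,
     ("S2", "stay", "S1"): 0.0, ("S2", "stay", "S2"): 1.0}
O = {("S1", "stay", "o1"): 0.8, ("S2", "stay", "o1"): 0.2}
posterior = belief_update({"S1": 0.5, "S2": 0.5}, "stay", "o1", T, O)
print(posterior)
```

Starting from a uniform belief, an observation that is four times likelier in S1 shifts the belief toward S1.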

The problem
- How can we find good policies for complex POMDPs?
- Is there a principled way to provide near-optimal policies?

Proposed Approach
- Exploit structure in the problem domain.
- What type of structure?
  - Action set partitioning, e.g.:
    - Act
      - InvestigateHealth
        - CheckPulse
        - CheckMeds
      - Move
        - Navigate
          - Left, Right, Up, Down
        - AskWhere

Hierarchical POMDP Planning
- What do we start with?
  - A full POMDP model: $\{S_0, A_0, O_0, M_0\}$.
  - An action set partitioning graph.
- Key idea: break the problem into many "related" POMDPs.
  - Each smaller POMDP has only a subset of $A_0$ ⇒ this imposes a constraint on the policy.
- But why?
  - POMDP value iteration has exponential run time per iteration: $O(|A|\,|\Gamma_{n-1}|^{|O|})$, where $\Gamma_{n-1}$ is the set of value-function vectors from the previous iteration.
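To make the cost concrete: before pruning, an exact backup forms $|A|\,|\Gamma_{n-1}|^{|O|}$ candidate vectors. The sketch below only evaluates that count; the function name and the problem sizes are hypothetical, chosen for illustration.

```python
def exact_backup_vector_count(num_actions: int, num_prev_vectors: int,
                              num_observations: int) -> int:
    """Candidate vectors generated by one exact value-iteration backup,
    before pruning: |A| * |Gamma_{n-1}| ** |O|."""
    return num_actions * num_prev_vectors ** num_observations


# Hypothetical sizes: a flat POMDP with 4 actions vs. a subtask with 2 of them.
flat = exact_backup_vector_count(num_actions=4, num_prev_vectors=3, num_observations=4)
sub = exact_backup_vector_count(num_actions=2, num_prev_vectors=3, num_observations=4)
print(flat, sub)
```

In this one-step count, halving $|A|$ halves the number of candidates. Because $|\Gamma_{n-1}|$ is itself the output of the previous backup, a smaller action set may also slow the growth of that base over iterations.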

Example
[Figure: the example POMDP as a diagram over MedsState, KitchenState, and BedroomState, and its value function]
- $S_0$ = {Meds, Kitchen, Bedroom}
- $A_0$ = {ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom}
- $O_0$ = {Noise, Meds, Kitchen, Bedroom}

Hierarchical POMDP
Action partitioning:
- Act
  - CheckMeds
  - Move
    - ClarifyTask
    - GoToKitchen
    - GoToBedroom
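One way to encode this partitioning, as a sketch: a dictionary (HIERARCHY, a hypothetical name) mapping each controller to its children, where a child that is itself a controller enters its parent's POMDP as a single abstract action. The placement of ClarifyTask under Move is inferred from the Move controller's policy on the following slides.

```python
from typing import Dict, List

# Hypothetical encoding of the action-partitioning tree: internal nodes are
# controllers (abstract actions), leaves are primitive actions from A_0.
HIERARCHY: Dict[str, List[str]] = {
    "Act": ["CheckMeds", "Move"],
    "Move": ["ClarifyTask", "GoToKitchen", "GoToBedroom"],
}


def controller_actions(node: str) -> List[str]:
    """Action set of the local POMDP at `node`: its direct children only."""
    return list(HIERARCHY[node])


def primitives_under(node: str) -> List[str]:
    """All primitive actions reachable below `node` in the tree."""
    out: List[str] = []
    for child in HIERARCHY.get(node, []):
        if child in HIERARCHY:
            out.extend(primitives_under(child))  # recurse into a sub-controller
        else:
            out.append(child)
    return out


print(controller_actions("Act"), controller_actions("Move"))
print(primitives_under("Act"))
```

Each controller plans over two or three actions, while the leaves together still cover all of $A_0$.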

Local Value Function and Policy: Move Controller
[Figure: value function and policy of the Move controller over KitchenState, MedsState, and BedroomState, using actions GoToKitchen, GoToBedroom, and ClarifyTask]

Modeling Abstract Actions
- Problem: we need model parameters for the abstract action Move.
- Solution: use the local policy of the corresponding low-level controller.
- General form: $\Pr(s_j \mid s_i, a_k^{\mathrm{abstract}}) = \Pr(s_j \mid s_i, \mathrm{Policy}(a_k^{\mathrm{abstract}}, s_i))$
- Example: $\Pr(s_j \mid \mathrm{MedsState}, \mathrm{Move}) = \Pr(s_j \mid \mathrm{MedsState}, \mathrm{ClarifyTask})$
[Figure: Policy(Move, s_i) over KitchenState, MedsState, and BedroomState, with actions GoToKitchen, GoToBedroom, and ClarifyTask]
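The substitution in the general form can be written as a small table transformation. A sketch assuming dictionary-encoded transition tables; the deterministic dynamics and the policy entries for KitchenState and BedroomState are hypothetical, and only MedsState → ClarifyTask comes from the slide.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]


def abstract_transition(T: Dict[Key, float], local_policy: Dict[Tuple[str, str], str],
                        abstract_action: str, states: List[str]) -> Dict[Key, float]:
    """Pr(s_j | s_i, a_abstract) = Pr(s_j | s_i, Policy(a_abstract, s_i))."""
    T_abs: Dict[Key, float] = {}
    for s_i in states:
        # Primitive action the low-level controller would take in state s_i.
        a = local_policy[(abstract_action, s_i)]
        for s_j in states:
            T_abs[(s_i, abstract_action, s_j)] = T[(s_i, a, s_j)]
    return T_abs


STATES = ["MedsState", "KitchenState", "BedroomState"]

# Hypothetical deterministic dynamics, for illustration only.
T: Dict[Key, float] = {}
for s in STATES:
    for s2 in STATES:
        T[(s, "GoToKitchen", s2)] = 1.0 if s2 == "KitchenState" else 0.0
        T[(s, "GoToBedroom", s2)] = 1.0 if s2 == "BedroomState" else 0.0
        T[(s, "ClarifyTask", s2)] = 1.0 if s2 == s else 0.0

# Local policy of the Move controller (Kitchen/Bedroom entries are assumptions).
POLICY = {("Move", "MedsState"): "ClarifyTask",
          ("Move", "KitchenState"): "GoToKitchen",
          ("Move", "BedroomState"): "GoToBedroom"}

T_move = abstract_transition(T, POLICY, "Move", STATES)
print(T_move[("MedsState", "Move", "MedsState")])
```

The resulting table lets the parent (Act) controller treat Move like any primitive action when it runs its own value iteration.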

Local Value Function and Policy: Act Controller
[Figure: value function and policy of the Act controller over KitchenState, MedsState, and BedroomState, using actions CheckMeds and Move]

Comparing Policies
[Figure: hierarchical policy vs. optimal policy; action legend: ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom]

Bounding the value of the approximation
- The value function of the top-level controller is an upper bound on the value of the approximation.
  - Why? We were optimistic when modeling the abstract action: $\mathrm{Policy}(a_k^{\mathrm{abstract}}, s_i)$ picks the low-level action for the true state $s_i$, as if that state were known.
- Similarly, we can find a lower bound.
  - How? We can take a "worst-case" view when modeling the abstract action.
- ⇒ If we partition the action set differently, we will get different bounds.
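A one-step illustration of why a state-indexed model is optimistic: picking the best action for each true state can only do better than committing to one action from the belief, and picking the worst action per state can only do worse. This is a sketch under our own assumptions (hypothetical rewards, a single step, and "worst-case" read as the minimum-reward action in each state), not the multi-step bound from the talk.

```python
from typing import Dict, List, Tuple


def one_step_bounds(belief: Dict[str, float], R: Dict[Tuple[str, str], float],
                    actions: List[str]) -> Tuple[float, float, float]:
    """Return (lower, actual, upper) one-step expected rewards for belief b:
    lower  = sum_s b(s) * min_a R(s, a)   worst action in every state (assumption)
    actual = max_a sum_s b(s) * R(s, a)   one action chosen from the belief
    upper  = sum_s b(s) * max_a R(s, a)   best action for the true state
    """
    lower = sum(p * min(R[(s, a)] for a in actions) for s, p in belief.items())
    actual = max(sum(p * R[(s, a)] for s, p in belief.items()) for a in actions)
    upper = sum(p * max(R[(s, a)] for a in actions) for s, p in belief.items())
    return lower, actual, upper


# Hypothetical rewards and belief, for illustration only.
ACTIONS = ["GoToKitchen", "ClarifyTask"]
R = {("MedsState", "GoToKitchen"): -1.0, ("MedsState", "ClarifyTask"): 0.0,
     ("KitchenState", "GoToKitchen"): 10.0, ("KitchenState", "ClarifyTask"): 0.0}
belief = {"MedsState": 0.5, "KitchenState": 0.5}

bounds = one_step_bounds(belief, R, ACTIONS)
print(bounds)
```

For any belief, lower ≤ actual ≤ upper, which mirrors the optimistic and worst-case models bracketing the value of the hierarchical approximation.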

A real dialogue management example
- Act
  - SayTime
  - Greet
    - GreetGeneral
    - GreetMorning
    - GreetNight
    - RespondThanks
  - Move
    - AskGoWhere
    - GoToRoom
    - GoToKitchen
    - GoToFollow
    - VerifyRoom
    - VerifyKitchen
    - VerifyFollow
  - CheckWeather
    - AskWeatherTime
    - SayCurrent
    - SayToday
    - SayTomorrow
  - CheckHealth
    - AskHealth
    - OfferHelp
    - DoMeds
      - StartMeds
      - NextMeds
      - ForceMeds
      - QuitMeds
  - Phone
    - AskCallWho
    - CallHelp
    - CallNurse
    - CallRelative
    - VerifyHelp
    - VerifyNurse
    - VerifyRelative

Results

Final words
- We presented:
  - a general framework to exploit structure in POMDPs.
- Future work:
  - automatic generation of good action partitioning;
  - conditions for additional observation abstraction;
  - bigger problems!