Hierarchical POMDP Planning and Execution
Joelle Pineau
Machine Learning Lunch, November 20, 2000

Partially Observable MDP
• POMDPs are characterized by:
  - States: $s \in S$
  - Actions: $a \in A$
  - Observations: $o \in O$
  - Transition probabilities: $T(s, a, s') = \Pr(s' \mid s, a)$
  - Observation probabilities: $O(o, a, s') = \Pr(o \mid a, s')$
  - Rewards: $R(s, a)$
  - Beliefs: $b(s_t) = \Pr(s_t \mid o_t, a_t, \ldots, o_0, a_0)$
[Figure: example model with states $S_1$, $S_2$, $S_3$]
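The belief above is usually maintained recursively with Bayes' rule: after taking action $a$ and observing $o$, $b'(s') \propto O(o, a, s') \sum_s T(s, a, s')\, b(s)$, normalized by $\Pr(o \mid b, a)$. The sketch below implements that update in plain Python. The three-state model, the single "stay" action `0`, and the two-valued noisy sensor are hypothetical numbers chosen only for illustration.

```python
def belief_update(belief, action, obs, T, O):
    """Bayes filter: b'(s') = O(o, a, s') * sum_s T(s, a, s') b(s) / Pr(o | b, a).

    T[a][s][s'] = Pr(s' | s, a) and O[a][s'][o] = Pr(o | a, s').
    """
    n = len(belief)
    unnormalized = [
        O[action][s_next][obs] * sum(T[action][s][s_next] * belief[s] for s in range(n))
        for s_next in range(n)
    ]
    norm = sum(unnormalized)  # this is Pr(o | b, a)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [u / norm for u in unnormalized]


# Hypothetical example: states S1, S2, S3; action 0 leaves the state unchanged;
# the sensor reports observation 0 or 1 with state-dependent noise.
T = {0: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
O = {0: [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]}

b0 = [1 / 3, 1 / 3, 1 / 3]
b1 = belief_update(b0, action=0, obs=0, T=T, O=O)
print([round(p, 3) for p in b1])  # [0.533, 0.333, 0.133]
```

Observing `0` shifts probability mass toward $S_1$, the state most likely to produce that observation.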

The problem
• How can we find good policies for complex POMDPs?
• Is there a principled way to provide near-optimal policies?

Proposed Approach
• Exploit structure in the problem domain.
• What type of structure?
  - Action set partitioning
[Figure: example action hierarchy. Act → {InvestigateHealth, Move}; InvestigateHealth → {CheckPulse, CheckMeds}; Move → {Navigate, AskWhere}; Navigate → {Left, Right, Up, Down}]

Hierarchical POMDP Planning
• What do we start with?
  - A full POMDP model: $\{S_0, A_0, O_0, M_0\}$.
  - An action set partitioning graph.
• Key idea: break the problem into many "related" POMDPs.
  - Each smaller POMDP has only a subset of $A_0$ ⇒ this imposes a constraint on the policy.
• But why?
  - POMDP: exponential run-time per value iteration, $O(|A|\,|\Gamma_{n-1}|^{|O|})$, where $\Gamma_{n-1}$ is the set of α-vectors from the previous iteration.
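To see why shrinking the action set helps, the arithmetic below applies the worst-case (no pruning) recurrence for exact value iteration, $|\Gamma_n| = |A|\,|\Gamma_{n-1}|^{|O|}$. It is a back-of-the-envelope sketch: the starting size $|\Gamma_0| = 1$ and the choice of a 2-action subtask are assumptions for illustration, and real solvers prune many dominated vectors.

```python
def alpha_vector_counts(num_actions: int, num_obs: int, horizon: int) -> list:
    """Worst-case alpha-vector counts per exact value-iteration step (no pruning).

    Uses |Gamma_n| = |A| * |Gamma_{n-1}| ** |O|, starting from |Gamma_0| = 1.
    """
    counts = []
    gamma = 1  # assumed: a single initial (e.g. all-zero) vector
    for _ in range(horizon):
        gamma = num_actions * gamma ** num_obs
        counts.append(gamma)
    return counts


# Flat example problem: |A_0| = 4 actions, |O_0| = 4 observations.
print("flat   :", alpha_vector_counts(4, 4, 2))  # [4, 1024]
# Hypothetical subtask that keeps only 2 of the 4 actions.
print("subtask:", alpha_vector_counts(2, 4, 2))  # [2, 32]
```

Because $|A|$ multiplies every step and the previous count is raised to the power $|O|$, even a small reduction in the action set compounds quickly over iterations.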

Example POMDP
[Figure: model diagram over MedsState, KitchenState and BedroomState with probabilities 0.8 and 0.1, and a plot of the value function]
• $S_0$ = {Meds, Kitchen, Bedroom}
• $A_0$ = {ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom}
• $O_0$ = {Noise, Meds, Kitchen, Bedroom}
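A minimal sketch of this example in Python, using the state and action names from the slide. The rewards are hypothetical (the slide does not give them): acting on the correct request earns +10, acting on a wrong one costs -10, and ClarifyTask costs -1. With these numbers the horizon-1 value function $V_1(b) = \max_a \sum_s b(s)\,R(s, a)$ is the upper envelope of one linear function per action, which is the piecewise-linear shape such value-function plots show.

```python
STATES = ["Meds", "Kitchen", "Bedroom"]
ACTIONS = ["ClarifyTask", "CheckMeds", "GoToKitchen", "GoToBedroom"]

# Hypothetical rewards R[action][state index], not taken from the slide.
R = {
    "ClarifyTask": [-1.0, -1.0, -1.0],
    "CheckMeds": [10.0, -10.0, -10.0],
    "GoToKitchen": [-10.0, 10.0, -10.0],
    "GoToBedroom": [-10.0, -10.0, 10.0],
}


def q1(belief, action):
    """Expected immediate reward sum_s b(s) R(s, a): one linear piece of V_1."""
    return sum(p * r for p, r in zip(belief, R[action]))


def v1(belief):
    """Horizon-1 value: the maximum (upper envelope) of the linear pieces."""
    return max(q1(belief, a) for a in ACTIONS)


def best_action(belief):
    """Action whose linear piece is highest at this belief."""
    return max(ACTIONS, key=lambda a: q1(belief, a))


print(best_action([1 / 3, 1 / 3, 1 / 3]))  # ClarifyTask
print(best_action([1.0, 0.0, 0.0]))  # CheckMeds
print(v1([0.0, 1.0, 0.0]))  # 10.0
```

Under high uncertainty the cheap ClarifyTask action dominates; once the belief concentrates on one state, the matching task action wins.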

Hierarchical POMDP
Action partitioning:
[Figure: Act → {CheckMeds, Move}; Move → {ClarifyTask, GoToKitchen, GoToBedroom}]
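One way to encode this partition, as a sketch: a dictionary from each abstract action to the actions its controller may choose. The `HIERARCHY` layout and the helper names are hypothetical. The check verifies that the leaves of the tree use every action in $A_0$ exactly once, so each controller solves a smaller POMDP over the same states and observations but with only its own action subset.

```python
# Hypothetical encoding of the slide's partition: each abstract action maps to
# the actions available to its controller (children may themselves be abstract).
HIERARCHY = {
    "Act": ["CheckMeds", "Move"],
    "Move": ["ClarifyTask", "GoToKitchen", "GoToBedroom"],
}
A0 = ["ClarifyTask", "CheckMeds", "GoToKitchen", "GoToBedroom"]


def primitives(node):
    """Primitive actions reachable below `node` in the hierarchy."""
    if node not in HIERARCHY:
        return [node]
    return [p for child in HIERARCHY[node] for p in primitives(child)]


def is_partition(root, flat_actions):
    """True if every primitive action appears exactly once under `root`."""
    prims = primitives(root)
    return len(prims) == len(set(prims)) and set(prims) == set(flat_actions)


for controller, actions in HIERARCHY.items():
    print(controller, "->", actions)
print("covers A_0 exactly once:", is_partition("Act", A0))
```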

Local Value Function and Policy - Move Controller
[Figure: value function and policy of the Move controller over beliefs in KitchenState, MedsState and BedroomState; actions: GoToKitchen, GoToBedroom, ClarifyTask]

Modeling Abstract Actions
• Problem: we need model parameters for the abstract action Move.
• Solution: use the local policy of the corresponding low-level controller.
• General form: $\Pr(s_j \mid s_i, a_k^{\text{abstract}}) = \Pr\big(s_j \mid s_i, \mathrm{Policy}(a_k^{\text{abstract}}, s_i)\big)$
• Example: $\Pr(s_j \mid \text{MedsState}, \text{Move}) = \Pr(s_j \mid \text{MedsState}, \text{ClarifyTask})$
[Figure: Policy(Move, $s_i$) over KitchenState, MedsState and BedroomState, choosing among GoToKitchen, GoToBedroom and ClarifyTask]
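The general form above amounts to copying, for each state $s_i$, the transition row of whichever primitive action the local policy picks there. The sketch below does exactly that. Only the MedsState → ClarifyTask entry of the policy comes from the slide; the other policy entries and all transition probabilities are hypothetical. Applying the same substitution to observation and reward parameters is an assumption by analogy with the transition case.

```python
STATES = ["MedsState", "KitchenState", "BedroomState"]

# Hypothetical primitive transition model: T[action][i][j] = Pr(s_j | s_i, action).
T = {
    "ClarifyTask": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "GoToKitchen": [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]],
    "GoToBedroom": [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],
}

# Local policy of the Move controller, Policy(Move, s_i). The MedsState entry is
# from the slide's example; the Kitchen and Bedroom entries are assumed.
MOVE_POLICY = {
    "MedsState": "ClarifyTask",
    "KitchenState": "GoToKitchen",
    "BedroomState": "GoToBedroom",
}


def abstract_transition(T, policy, states):
    """Pr(s_j | s_i, abstract) = Pr(s_j | s_i, Policy(abstract, s_i)), row by row."""
    return [list(T[policy[s_i]][i]) for i, s_i in enumerate(states)]


T_move = abstract_transition(T, MOVE_POLICY, STATES)
for s_i, row in zip(STATES, T_move):
    print(s_i, row)
```

The resulting matrix `T_move` can then be used as the transition model for the Move action inside the Act controller's smaller POMDP.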

Local Value Function and Policy - Act Controller
[Figure: value function and policy of the Act controller over beliefs in KitchenState, MedsState and BedroomState; actions: CheckMeds, Move]

Comparing Policies
[Figure: hierarchical policy and optimal policy over the belief space; legend: ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom]

Bounding the value of the approximation
• The value function of the top-level controller is an upper bound on the value of the approximation.
  - Why? We were optimistic when modeling the abstract action.
• Similarly, we can find a lower bound.
  - How? We can take a "worst-case" view when modeling the abstract action.
→ If we partition the action set differently, we will get different bounds.
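A minimal illustration of the optimistic versus worst-case idea, under heavy simplifying assumptions: states are treated as fully observable (an MDP rather than a POMDP), and the discount factor, rewards, and dynamics are all hypothetical. The optimistic model lets the abstract Move action take its best child's value in every state; the worst-case model takes its worst child. Any fixed local policy for Move falls between the two, so the resulting top-level value functions bracket its value. This is a swapped-in toy construction, not the exact modeling used in the talk.

```python
GAMMA = 0.9  # hypothetical discount factor
STATES = range(3)  # 0 = MedsState, 1 = KitchenState, 2 = BedroomState

# Hypothetical rewards R[action][state].
R = {
    "CheckMeds": [10.0, -10.0, -10.0],
    "ClarifyTask": [-1.0, -1.0, -1.0],
    "GoToKitchen": [-5.0, 5.0, -5.0],
    "GoToBedroom": [-5.0, -5.0, 5.0],
}
MOVE_CHILDREN = ["ClarifyTask", "GoToKitchen", "GoToBedroom"]


def trans(s, s_next):
    """Hypothetical dynamics shared by every action: stay w.p. 0.8, else 0.1 each."""
    return 0.8 if s == s_next else 0.1


def q(action, s, V):
    """One-step lookahead value of a primitive action in state s."""
    return R[action][s] + GAMMA * sum(trans(s, s2) * V[s2] for s2 in STATES)


def top_level_values(combine, iterations=500):
    """Value iteration for the top-level controller {CheckMeds, Move}.

    `combine` (max or min) turns the children's values into the value of the
    abstract Move action: max = optimistic model, min = worst-case model.
    """
    V = [0.0 for _ in STATES]
    for _ in range(iterations):
        V = [max(q("CheckMeds", s, V), combine(q(c, s, V) for c in MOVE_CHILDREN))
             for s in STATES]
    return V


upper = top_level_values(max)
lower = top_level_values(min)
for s in STATES:
    print(f"state {s}: lower {lower[s]:.2f} <= upper {upper[s]:.2f}")
```

The gap between `upper` and `lower` depends on how the children of Move differ, which is one way to see why a different action partitioning yields different bounds.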

A real dialogue management example
• Act: SayTime, CheckHealth, Greet, Move, CheckWeather, Phone
• Greet: GreetGeneral, GreetMorning, GreetNight, RespondThanks
• Move: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow
• CheckWeather: AskWeatherTime, SayCurrent, SayToday, SayTomorrow
• CheckHealth: AskHealth, OfferHelp, DoMeds
• DoMeds: StartMeds, NextMeds, ForceMeds, QuitMeds
• Phone: AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative
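To make the payoff of this hierarchy concrete, the sketch below counts the primitive actions a flat POMDP would have to consider and compares that with the largest action set any single controller faces. The action names come from the slide; the dictionary encoding is hypothetical, and placing DoMeds under CheckHealth is an assumption read from the slide layout.

```python
# Hypothetical encoding of the dialogue hierarchy (DoMeds under CheckHealth is assumed).
HIERARCHY = {
    "Act": ["SayTime", "CheckHealth", "Greet", "Move", "CheckWeather", "Phone"],
    "Greet": ["GreetGeneral", "GreetMorning", "GreetNight", "RespondThanks"],
    "Move": ["AskGoWhere", "GoToRoom", "GoToKitchen", "GoToFollow",
             "VerifyRoom", "VerifyKitchen", "VerifyFollow"],
    "CheckWeather": ["AskWeatherTime", "SayCurrent", "SayToday", "SayTomorrow"],
    "CheckHealth": ["AskHealth", "OfferHelp", "DoMeds"],
    "DoMeds": ["StartMeds", "NextMeds", "ForceMeds", "QuitMeds"],
    "Phone": ["AskCallWho", "CallHelp", "CallNurse", "CallRelative",
              "VerifyHelp", "VerifyNurse", "VerifyRelative"],
}


def primitives(node):
    """Primitive actions below `node`; nodes without an entry are primitive."""
    children = HIERARCHY.get(node)
    if children is None:
        return [node]
    return [p for child in children for p in primitives(child)]


flat_actions = primitives("Act")
controller_sizes = {name: len(children) for name, children in HIERARCHY.items()}

print("flat POMDP actions:", len(flat_actions))
print("largest controller:", max(controller_sizes.values()))
print(controller_sizes)
```

With this encoding a flat POMDP would have 29 actions, while no controller chooses among more than 7, and the action count enters the per-iteration cost multiplicatively.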

Results
[Figure: results]

Final words
• We presented:
  - a general framework to exploit structure in POMDPs.
• Future work:
  - automatic generation of good action partitionings;
  - conditions for additional observation abstraction;
  - bigger problems!