Advanced Topics Data Science and AI Automated Planning
- Slides: 97
Advanced Topics Data Science and AI Automated Planning and Acting Nondeterministic Models Tanya Braun
Content
1. Planning and Acting with Deterministic Models
2. Planning and Acting with Refinement Methods
3. Planning and Acting with Temporal Models
4. Planning and Acting with Nondeterministic Models
   a. Planning Problem
   b. And/Or Graph Search
   c. Determinisation
   d. Online Approaches
5. Making Simple Decisions
6. Making Complex Decisions
7. Planning and Acting with Probabilistic Models
8. Provably Beneficial AI
• Other (if time permits): open world, perceiving, learning
Motivation • [Figure: blocks a, b, c; action grasp(c)]
Outline per the Book
5.2 Planning Problem
• Planning domains
• Plans as policies
• Planning problems and solutions
5.3 And/Or Graph Search
• Planning by forward search
5.5 Determinisation Techniques
• Guided planning for safe solutions
• Planning for safe solutions by determinisation
5.6 Online Approaches
• Lookahead by Determinisation
• Lookahead with a bounded number of steps
Nondeterministic Planning Domains • 5
Nondeterministic Planning Domains • For deterministic planning problems, search space was a graph • Now it’s an AND/OR graph • OR branch: • Several applicable actions, which one to choose? • AND branch: • Multiple possible outcomes • Must handle all of them • Analogy to PSP • OR branch ⇔ action selection • AND branch ⇔ flaw selection 6
Example • Very simple harbor management domain • Unload a single item from a ship • Move it around a harbor 7
Example • One state variable: pos(item) • Five actions • Deterministic: • unload, back, (move in one state) • Nondeterministic: • park, move, deliver • Simplified names for states • For {pos(item)=on_ship} write on_ship 8
Actions • 9
Plans Policies • unload park deliver 10
Definitions Over Policies • 11
Definitions Over Policies • 12
Performing a Policy
Perform-Policy(π)
  s ← observe current state
  while s ∈ Dom(π) do
    perform action π(s)
    s ← observe current state
[Figure: policy with actions unload, park, deliver]
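The loop above can be sketched in Python as follows. This is a minimal sketch, not the course's implementation: the transition table `GAMMA` is a hypothetical fragment of a harbor-like domain, and "observing the current state" is simulated by sampling one outcome of γ(s, a) with a seeded random generator.

```python
import random

# Hypothetical transition table (an assumption for illustration):
# (state, action) -> set of possible next states.
GAMMA = {
    ("on_ship", "unload"): {"at_harbor"},
    ("at_harbor", "park"): {"parking1", "transit1"},
    ("parking1", "deliver"): {"gate1", "gate2"},
    ("transit1", "move"): {"parking1"},
}


def perform_policy(policy, state, rng):
    """Follow the policy until the observed state is outside Dom(policy)."""
    trace = [state]
    while state in policy:
        action = policy[state]
        # Observation is simulated: nature picks one outcome of gamma(s, a).
        state = rng.choice(sorted(GAMMA[(state, action)]))
        trace.append(state)
    return trace


policy = {"on_ship": "unload", "at_harbor": "park", "parking1": "deliver"}
trace = perform_policy(policy, "on_ship", random.Random(0))
print(trace)
```

Because the policy has no entry for `transit1`, a run may stop there instead of at a gate, which is exactly the situation the later slides on unsafe solutions describe.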
Planning Problems and Solutions • unload park deliver 14
Safe Solutions • unload park deliver 15
Safe Solutions • move deliver unload park deliver move 16
Safe Solutions • back unload park deliver move 17
Kinds of Solutions safe solutions acyclic solutions unsafe solutions a c Goal b 18
Intermediate Summary • Planning Problems • Planning domains • Plans as policies • Planning problems and solutions • Types of solutions: safe, unsafe, acyclic, cyclic 19
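To make the solution types concrete, here is a hedged Python sketch that classifies a policy as unsafe, cyclic safe, or acyclic safe. It assumes the usual reading of the definitions: a solution is safe if a goal leaf is reachable from every state the policy can reach, and acyclic if the policy's reachability graph has no cycle. The domain `GAMMA` is a hypothetical harbor-like transition table, not necessarily the one drawn on the slides.

```python
# Hypothetical harbor-like domain: state -> action -> set of possible outcomes.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate2"}},
    "transit3": {"move": {"gate2"}},
}
GOALS = {"gate1", "gate2"}


def closure(gamma, policy, start):
    """All states reachable from start (inclusive) when following the policy."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        if s in policy:
            for t in gamma[s][policy[s]]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def classify(gamma, policy, s0, goals):
    def goal_leaf_reachable(s):
        return any(t in goals and t not in policy
                   for t in closure(gamma, policy, s))

    if not goal_leaf_reachable(s0):
        return "not a solution"
    states = closure(gamma, policy, s0)
    if not all(goal_leaf_reachable(s) for s in states):
        return "unsafe"
    # A state lies on a cycle if it is reachable from one of its own successors.
    cyclic = any(s in closure(gamma, policy, t)
                 for s in states if s in policy
                 for t in gamma[s][policy[s]])
    return "cyclic safe" if cyclic else "acyclic safe"


pi1 = {"on_ship": "unload", "at_harbor": "park", "parking1": "deliver"}
pi2 = {**pi1, "parking2": "back", "transit1": "move", "transit2": "move"}
pi3 = {**pi1, "parking2": "deliver", "transit1": "move",
       "transit2": "move", "transit3": "move"}
for pi in (pi1, pi2, pi3):
    print(classify(GAMMA, pi, "on_ship", GOALS))
```

In this assumed domain, `pi1` leaves `parking2` and `transit1` without an action, `pi2` loops back to `at_harbor` but can always escape, and `pi3` never revisits a state.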
Finding (Unsafe) Solutions

For comparison: forward search with deterministic models
Forward-search(Σ, s0, g)
  s ← s0
  π ← ⟨⟩
  loop
    if s satisfies g then return π
    A′ ← {a ∈ A | a is applicable in s}
    if A′ = ∅ then return failure
    nondeterministically choose a ∈ A′
    s ← γ(s, a)
    π ← π.a

Find-Solution(Σ, s0, Sg)
  s ← s0
  π ← ∅
  Visited ← {s0}
  loop
    if s ∈ Sg then return π
    A′ ← Applicable(s)
    if A′ = ∅ then return failure
    nondeterministically choose a ∈ A′
    nondeterministically choose s′ ∈ γ(s, a)    ⇐ decide which state to plan for
    if s′ ∈ Visited then return failure          ⇐ cycle-checking
    π(s) ← a
    Visited ← Visited ∪ {s′}
    s ← s′
Example: Find-Solution, initialisation
  s = on_ship
  π = {}
  Visited = {on_ship}

Example: Find-Solution, first iteration
  s = on_ship, a = unload
  γ(s, a) = {at_harbor}
  s′ = at_harbor
  π = {(on_ship, unload)}
  Visited = {on_ship, at_harbor}

Example: Find-Solution, second iteration
  s = at_harbor, a = park
  γ(s, a) = {parking1, parking2, transit1}
  s′ = parking1
  π = {(on_ship, unload), (at_harbor, park)}
  Visited = {on_ship, at_harbor, parking1}

Example: Find-Solution, third iteration
  s = parking1, a = deliver
  γ(s, a) = {gate1, gate2, transit1}
  s′ = gate1
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver)}
  Visited = {on_ship, at_harbor, parking1, gate1}

Example: Find-Solution, termination
  s = gate1
  gate1 is a goal, so return π
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver)}
  Visited = {on_ship, at_harbor, parking1, gate1}
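The nondeterministic choices in Find-Solution can be made deterministic by backtracking over them. The sketch below does this with a depth-first search in Python. It is one possible realisation under stated assumptions: `GAMMA` is a hypothetical harbor-like domain, and choices are tried in alphabetical order, which the pseudocode does not prescribe.

```python
# Hypothetical harbor-like domain: state -> action -> set of possible outcomes.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate2"}},
    "transit3": {"move": {"gate2"}},
}
GOALS = {"gate1", "gate2"}


def find_solution(gamma, s, goals, policy=None, visited=None):
    """Return a (possibly unsafe) policy reaching a goal along one path, or None."""
    policy = {} if policy is None else policy
    visited = {s} if visited is None else visited
    if s in goals:
        return policy
    for action in sorted(gamma.get(s, {})):       # choose a in Applicable(s)
        for nxt in sorted(gamma[s][action]):      # choose s' in gamma(s, a)
            if nxt in visited:
                continue                          # cycle check: this branch fails
            result = find_solution(gamma, nxt, goals,
                                   {**policy, s: action}, visited | {nxt})
            if result is not None:
                return result
    return None                                   # every choice failed


solution = find_solution(GAMMA, "on_ship", GOALS)
print(solution)
```

The returned policy only covers the single path that was planned for, so outcomes such as `parking2` remain unhandled. That is why the result is in general an unsafe solution.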
Finding Acyclic Safe Solutions
Find-Acyclic-Solution(Σ, s0, Sg)
  π ← ∅
  Frontier ← {s0}                                ⇐ keep track of unexpanded states, like A*
  for every s ∈ Frontier ∖ Sg do
    Frontier ← Frontier ∖ {s}
    if Applicable(s) = ∅ then return failure
    nondeterministically choose a ∈ Applicable(s)
    π ← π ∪ (s, a)
    Frontier ← Frontier ∪ (γ(s, a) ∖ Dom(π))
    if has-loops(π, s, Frontier) then return failure    ⇐ cycle-checking
  return π
Example: Find-Acyclic-Solution, initialisation
  Frontier ∖ Sg = {on_ship}
  π = {}

Example: s = on_ship
  Frontier ∖ Sg = {at_harbor}
  π = {(on_ship, unload)}

Example: s = at_harbor
  Frontier ∖ Sg = {parking1, parking2, transit1}
  π = {(on_ship, unload), (at_harbor, park)}

Example: s = parking1
  Frontier ∖ Sg = {parking2, transit1, transit2}
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver)}

Example: s = parking2
  nondeterministically choose back or deliver
  • back ⇒ cycle, so return failure
  • deliver ⇒ no cycle, so continue
  Frontier ∖ Sg = {transit1, transit2, transit3}
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver)}

Example: s = transit1
  Frontier ∖ Sg = {transit2, transit3}
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver), (transit1, move)}

Example: s = transit2
  Frontier ∖ Sg = {transit3}
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver), (transit1, move), (transit2, move)}

Example: s = transit3
  Frontier ∖ Sg = ∅
  Found a solution, so return π
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver), (transit1, move), (transit2, move), (transit3, move)}
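The walkthrough above can be reproduced with a backtracking version of Find-Acyclic-Solution. The Python sketch below is an illustration under assumptions: the domain `GAMMA` is hypothetical, states and actions are tried in alphabetical order, and the loop check only looks for cycles through the state that was just expanded, since a newly added pair (s, a) can only create cycles that pass through s.

```python
# Hypothetical harbor-like domain: state -> action -> set of possible outcomes.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate2"}},
    "transit3": {"move": {"gate2"}},
}
GOALS = {"gate1", "gate2"}


def has_loops(gamma, policy, s):
    """True if s can reach itself by following the policy."""
    stack, seen = list(gamma[s][policy[s]]), set()
    while stack:
        t = stack.pop()
        if t == s:
            return True
        if t in seen or t not in policy:
            continue
        seen.add(t)
        stack.extend(gamma[t][policy[t]])
    return False


def find_acyclic_solution(gamma, s0, goals):
    def search(policy, frontier):
        pending = sorted(frontier - goals)
        if not pending:
            return policy                      # every leaf is a goal
        s = pending[0]
        for action in sorted(gamma.get(s, {})):
            new_policy = {**policy, s: action}
            if has_loops(gamma, new_policy, s):
                continue                       # this choice fails, try the next
            new_frontier = (frontier - {s}) | (gamma[s][action] - set(new_policy))
            result = search(new_policy, new_frontier)
            if result is not None:
                return result
        return None                            # no applicable action works

    return search({}, {s0})


acyclic = find_acyclic_solution(GAMMA, "on_ship", GOALS)
print(acyclic)
```

In this assumed domain the search rejects `back` at `parking2`, because it would close a cycle through `at_harbor`, and falls back to `deliver`.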
Finding Safe Solutions
Find-Safe-Solution(Σ, s0, Sg)
  π ← ∅
  Frontier ← {s0}
  for every s ∈ Frontier ∖ Sg do
    Frontier ← Frontier ∖ {s}
    if Applicable(s) = ∅ then return failure
    nondeterministically choose a ∈ Applicable(s)
    π ← π ∪ (s, a)
    Frontier ← Frontier ∪ (γ(s, a) ∖ Dom(π))
    if has-unsafe-loops(π, s, Frontier) then return failure    ⇐ different cycle-checking
  return π
Example: Find-Safe-Solution, initialisation
  Frontier ∖ Sg = {on_ship}
  π = {}

Example: s = on_ship
  Frontier ∖ Sg = {at_harbor}
  π = {(on_ship, unload)}

Example: s = at_harbor
  Frontier ∖ Sg = {parking1, parking2, transit1}
  π = {(on_ship, unload), (at_harbor, park)}

Example: s = parking1
  Frontier ∖ Sg = {parking2, transit1, transit2}
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver)}

Example: s = parking2
  nondeterministically choose back or deliver
  • back is okay: escapable cycle
  Frontier ∖ Sg = {transit1, transit2}
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, back)}

Example: s = transit1
  Frontier ∖ Sg = {transit2}
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, back), (transit1, move)}

Example: s = transit2
  Frontier ∖ Sg = ∅
  Found a solution, so return π
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, back), (transit1, move), (transit2, move)}
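The only change from the acyclic variant is the loop test. The Python sketch below uses one plausible reading of has-unsafe-loops, stated here as an assumption: a loop closed by the newest pair (s, a) is unsafe if some outcome already in Dom(π) can reach no state outside Dom(π), so there is no exit towards the frontier or a goal. The domain `GAMMA` and the alphabetical choice order are likewise assumptions.

```python
# Hypothetical harbor-like domain: state -> action -> set of possible outcomes.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate2"}},
    "transit3": {"move": {"gate2"}},
}
GOALS = {"gate1", "gate2"}


def reachable(gamma, policy, start):
    """States reachable from start (inclusive) when following the policy."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        if s in policy:
            for t in gamma[s][policy[s]]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def has_unsafe_loops(gamma, policy, s):
    """True if the pair just added for s closes a loop without any exit."""
    return any(t in policy and reachable(gamma, policy, t) <= set(policy)
               for t in gamma[s][policy[s]])


def find_safe_solution(gamma, s0, goals):
    def search(policy, frontier):
        pending = sorted(frontier - goals)
        if not pending:
            return policy
        s = pending[0]
        for action in sorted(gamma.get(s, {})):
            new_policy = {**policy, s: action}
            if has_unsafe_loops(gamma, new_policy, s):
                continue                       # inescapable loop: try next action
            new_frontier = (frontier - {s}) | (gamma[s][action] - set(new_policy))
            result = search(new_policy, new_frontier)
            if result is not None:
                return result
        return None

    return search({}, {s0})


safe = find_safe_solution(GAMMA, "on_ship", GOALS)
print(safe)
```

Here `back` at `parking2` is accepted, because the loop through `at_harbor` can still be left via `parking1` or `transit1`.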
Intermediate Summary • And/Or Graph Search • Algorithms for each type of solution • unsafe, cyclic safe, acyclic safe 44
Guided-Find-Safe-Solution • 46
Guided-Find-Safe-Solution ⇐ not in the book 47
Example (figure with additional state foo)
  s0 = on_ship
  π = {}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver)}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver)}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver), (transit3, move), (foo, move)}
Example (fail)
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver), (transit3, move), (foo, move)}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver), (transit3, move), (foo, move)}
Example (fail)
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (parking2, deliver), (foo, move)}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (foo, move)}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (foo, move), (parking2, back)}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (foo, move), (parking2, back), (transit1, move)}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (foo, move), (parking2, back), (transit1, move), (transit2, move)}
Example
  π = {(on_ship, unload), (at_harbor, park), (parking1, deliver), (foo, move), (parking2, back), (transit1, move), (transit2, move)}
Determinisation • How to implement it? • Need implementation of Find-Solution • Need it to be very efficient • Called many times • Idea: instead, use a classical planner • Any algorithm from Chapter 2 • Efficient algorithms, search heuristics • For that, determinise actions 60
Determinisation • [Figure: the nondeterministic action park in at_harbor, with outcomes parking1, parking2, transit1, is replaced by the deterministic actions park1, park2, park3, one per outcome]
Determinisation
Plan2policy(p = ⟨a1, …, an⟩, s)
  π ← ∅
  for i from 1 to n do
    π ← π ∪ {(s, det2nondet(ai))}
    s ← γd(s, ai)
  return π
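A small Python sketch of both steps follows: splitting each nondeterministic action into one deterministic action per outcome, and turning a plan over those actions back into a policy. The names `determinise` and `det2nondet`, the `(action, index)` encoding of determinised actions, and the domain `GAMMA` are assumptions made for illustration.

```python
# Hypothetical harbor-like domain: state -> action -> set of possible outcomes.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
}


def determinise(gamma):
    """Return gamma_d: state -> (action, i) -> the single resulting state."""
    gamma_d = {}
    for s, actions in gamma.items():
        for a, outcomes in actions.items():
            for i, t in enumerate(sorted(outcomes), start=1):
                gamma_d.setdefault(s, {})[(a, i)] = t
    return gamma_d


def det2nondet(det_action):
    """Map a determinised action such as ('park', 2) back to 'park'."""
    return det_action[0]


def plan2policy(gamma_d, plan, s):
    """Convert a sequential plan over determinised actions into a policy."""
    policy = {}
    for det_action in plan:
        policy[s] = det2nondet(det_action)
        s = gamma_d[s][det_action]
    return policy


gamma_d = determinise(GAMMA)
plan = [("unload", 1), ("park", 1), ("deliver", 1)]
policy = plan2policy(gamma_d, plan, "on_ship")
print(gamma_d["at_harbor"])
print(policy)
```

The plan must be acyclic for the conversion to be well defined, since a state visited twice would have its policy entry overwritten.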
Guided-Find-Safe-Solution (with determinisation) • Same structure as Guided-Find-Safe-Solution • Instead of Find-Solution, call any classical planner that does not return cyclic plans
Example • [Figure sequence: step-by-step run on the example with state foo; two of the steps end in failure]
Making Actions Inapplicable • 81
Intermediate Summary • Determinisation Techniques • Guided-find-safe-solution • call find-solution to get an unsafe solution • call find-solution additional times on the leaves • Find-safe-solution-by-determinization • use determinized actions • call classical planner rather than find-solution • if dead-ends are encountered, modify actions that lead to them 82
Online Approaches • Motivation • Planning models are approximate: execution seldom works out as planned • Large problems may require too much planning time • The second motivation is even stronger in nondeterministic domains • Nondeterminism makes planning exponentially harder • Exponentially more time, exponentially larger policies • [Figure: offline vs. runtime search spaces]
Online Approaches • Need to identify good actions without exploring entire search space • Can be done using heuristic estimates • Some domains are safely explorable • Safe to create partial plans, because goal states are reachable from all situations • Other domains contain dead-ends, partial planning will not guarantee success • Can get trapped in dead ends that we would have detected if we had planned fully • No applicable actions • Robot goes down a steep incline and can’t come back up • Applicable actions, but caught in a loop • Robot goes into a collection of rooms from which there’s no exit • However, partial planning can still make success more likely 85
Lookahead-Partial-Plan
Lookahead-Partial-Plan(Σ, s0, Sg)
  s ← s0
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    π ← Lookahead(s, θ)
    if π = ∅ then return failure
    else
      perform partial plan π
      s ← observe current state
FS-Replan
FS-Replan(Σ, s, Sg)
  πd ← ∅
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    if πd undefined for s then
      πd ← Plan2policy(Forward-search(Σd, s, Sg), s)
      if πd = failure then return failure
    perform action πd(s)
    s ← observe resulting state

Generalised-FS-Replan(Σ, s, Sg)
  πd ← ∅
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    if πd undefined for s then
      πd ← Lookahead(s, θ)
      if πd = failure then return failure
    perform action πd(s)
    s ← observe resulting state
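The replanning loop can be simulated in a few lines of Python. This is a sketch under assumptions: the domain `GAMMA` is hypothetical, the classical planner is replaced by a breadth-first search that treats every outcome as its own deterministic action and returns the resulting policy directly (folding Forward-search and Plan2policy together), and the environment is simulated with a seeded random generator.

```python
import random
from collections import deque

# Hypothetical harbor-like domain: state -> action -> set of possible outcomes.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate2"}},
    "transit3": {"move": {"gate2"}},
}
GOALS = {"gate1", "gate2"}


def plan_determinised(gamma, s, goals):
    """BFS in the determinised model; returns a policy along one path, or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur in goals:
            policy = {}
            while parent[cur] is not None:
                prev, action = parent[cur]
                policy[prev] = action
                cur = prev
            return policy
        for action in sorted(gamma.get(cur, {})):
            for nxt in sorted(gamma[cur][action]):
                if nxt not in parent:
                    parent[nxt] = (cur, action)
                    queue.append(nxt)
    return None


def fs_replan(gamma, s, goals, rng):
    """Act with the current policy; replan whenever it is undefined for s."""
    policy, trace = {}, [s]
    while s not in goals and gamma.get(s):
        if s not in policy:
            policy = plan_determinised(gamma, s, goals)
            if policy is None:
                return None                     # failure
        # Simulated execution: nature picks one outcome of the chosen action.
        s = rng.choice(sorted(gamma[s][policy[s]]))
        trace.append(s)
    return trace


print(fs_replan(GAMMA, "on_ship", GOALS, random.Random(1)))
```

In a domain with dead ends the same loop could act its way into a state from which the planner returns failure, which is the risk the slides on safely explorable domains point out.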
Possibilities for Lookahead
• Lookahead could be one of the algorithms we discussed earlier
  • Find-Safe-Solution
  • Find-Acyclic-Solution
  • Guided-Find-Safe-Solution-by-Determinization
• What if it doesn’t have time to run to completion?
• Can use the same techniques we discussed in Chapter 3
  • Receding horizon
  • Sampling
  • Subgoaling
  • Iterative deepening
[Figure: planning stage and acting stage]
Possibilities for Lookahead (ct’d)
Find-Acyclic-Solution(Σ, s0, Sg)
  π ← ∅
  Frontier ← {s0}
  for every s ∈ Frontier ∖ Sg do
    Frontier ← Frontier ∖ {s}
    if Applicable(s) = ∅ then return failure
    nondeterministically choose a ∈ Applicable(s)
    π ← π ∪ (s, a)
    Frontier ← Frontier ∪ (γ(s, a) ∖ Dom(π))    ⇐ replaced by: T ← i elements of γ(s, a) ∖ Dom(π); Frontier ← Frontier ∪ T
    if has-loops(π, s, Frontier) then return failure
  return π
Safely Explorable Domains • 90
Min-Max LRTA*
• Assumes each action has cost 1; can easily be modified to use cost ≠ 1
Min-Max-LRTA*(Σ, s0, Sg)
  s ← s0
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    a ← argmin_{a ∈ Applicable(s)} max_{s′ ∈ γ(s, a)} h(s′)
    h(s) ← max{h(s), 1 + max_{s′ ∈ γ(s, a)} h(s′)}
    perform action a
    s ← the current state
Example: Min-Max-LRTA*, initially
  [Figure labels: h = 0]
Example: Min-Max-LRTA*, first step
  [Figure labels: a = unload; h = 1; h = 0]
Example: Min-Max-LRTA*, second step
  [Figure labels: h = 0; a = unload, h = 1; h = 0; a = park, h = 1 + max(0, 0, 0) = 1; h = 0]
Example: Min-Max-LRTA*, third step
  [Figure labels: h = 0; a = deliver, h = 1; 1 + max(1) = 2; 1 + max(0, 0) = 1; a = unload, h = 1; h = 0; a = park, h = 1 + max(0, 0, 0) = 1; h = 0]
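The update rule is short enough to run directly. The Python sketch below follows the pseudocode with unit action costs. The assumptions are the hypothetical domain `GAMMA`, an initial heuristic of 0 for every state, alphabetical tie-breaking, a simulated environment driven by a seeded random generator, and a `max_steps` guard that the pseudocode does not have.

```python
import random

# Hypothetical harbor-like domain: state -> action -> set of possible outcomes.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate2"}},
    "transit3": {"move": {"gate2"}},
}
GOALS = {"gate1", "gate2"}


def min_max_lrta_star(gamma, s, goals, rng, h=None, max_steps=1000):
    """Act greedily against the worst-case outcome and update h along the way."""
    h = {} if h is None else h            # missing entries count as h = 0
    trace = [s]
    while s not in goals and gamma.get(s) and len(trace) <= max_steps:
        def worst(a):
            # Worst-case heuristic value over all outcomes of action a in s.
            return max(h.get(t, 0) for t in gamma[s][a])

        a = min(sorted(gamma[s]), key=worst)
        h[s] = max(h.get(s, 0), 1 + worst(a))
        # Simulated execution: nature picks one outcome of the chosen action.
        s = rng.choice(sorted(gamma[s][a]))
        trace.append(s)
    return trace, h


trace, h = min_max_lrta_star(GAMMA, "on_ship", GOALS, random.Random(0))
print(trace)
print(h)
```

Passing the returned `h` into a later run reuses the learned values, which is how repeated trials can improve the heuristic in a safely explorable domain.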
Intermediate Summary
• Online approaches
  • Lookahead-partial-plan
    • Adaptation of Run-Lazy-Lookahead (can also adapt Run-Concurrent-Lookahead)
  • FS-replan
    • Adaptation of Run-Lookahead
  • Ways to do the lookahead (can put bounds on both depth and breadth)
    • Full breadth with limited depth: iterative deepening
    • Full depth with limited breadth: iterative broadening
  • Convergence in safely explorable domains
    • Min-Max-LRTA*
Next: Making Simple Decisions