Advanced Topics Data Science and AI
Automated Planning and Acting: Nondeterministic Models
Tanya Braun

Content
1. Planning and Acting with Deterministic Models
2. Planning and Acting with Refinement Methods
3. Planning and Acting with Temporal Models
4. Planning and Acting with Nondeterministic Models
   a. Planning Problem
   b. And/Or Graph Search
   c. Determinisation
   d. Online Approaches
5. Making Simple Decisions
6. Making Complex Decisions
7. Planning and Acting with Probabilistic Models
8. Provably Beneficial AI
• Other: open world, perceiving, learning
• If time permits

Motivation
• [Figure: blocks a, b, c and the action grasp(c)]

Outline per the Book
5.2 Planning Problem
  • Planning domains
  • Plans as policies
  • Planning problems and solutions
5.3 And/Or Graph Search
  • Planning by forward search
5.5 Determinisation Techniques
  • Guided planning for safe solutions
  • Planning for safe solutions by determinisation
5.6 Online Approaches
  • Lookahead by determinisation
  • Lookahead with a bounded number of steps

Nondeterministic Planning Domains • 5

Nondeterministic Planning Domains
• For deterministic planning problems, the search space was a graph
• Now it is an AND/OR graph
  • OR branch: several applicable actions, which one to choose?
  • AND branch: multiple possible outcomes, must handle all of them
• Analogy to PSP
  • OR branch ⇔ action selection
  • AND branch ⇔ flaw selection

Example
• Very simple harbor management domain
  • Unload a single item from a ship
  • Move it around a harbor

Example
• One state variable: pos(item)
• Five actions
  • Deterministic: unload, back, (move in one state)
  • Nondeterministic: park, move, deliver
• Simplified names for states
  • For {pos(item)=on_ship} write on_ship

Actions • 9

Plans Policies • unload park deliver 10

Definitions Over Policies • 11

Definitions Over Policies • 12

Performing a Policy
PerformPolicy(π)
  s ← observe current state
  while s ∈ Dom(π) do
    perform action π(s)
    s ← observe current state
[Figure: harbor example with the actions unload, park, deliver]
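The loop above can be sketched in Python as follows. This is only an illustration: the transition table `GAMMA` is a hypothetical fragment of the harbor domain, and observing the current state is simulated by drawing one outcome at random.

```python
import random

# Hypothetical fragment of the harbor domain (assumed, not taken from the slides):
# GAMMA[state][action] is the set of possible outcomes.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "transit1"}},
    "parking1": {"deliver": {"gate1"}},
    "transit1": {"move": {"gate1"}},
    "gate1": {},
}


def perform_policy(policy, start, rng):
    """Act while the observed state is in Dom(policy); return the visited states."""
    s = start                                   # s <- observe current state
    trace = [s]
    while s in policy:                          # while s in Dom(pi)
        a = policy[s]                           # perform action pi(s)
        s = rng.choice(sorted(GAMMA[s][a]))     # simulated nondeterministic outcome
        trace.append(s)                         # s <- observe current state
    return trace


policy = {"on_ship": "unload", "at_harbor": "park",
          "parking1": "deliver", "transit1": "move"}
trace = perform_policy(policy, "on_ship", random.Random(0))
```

Because the policy is a mapping from states to actions rather than a fixed action sequence, the same policy handles either outcome of park.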

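Planning Problems and Solutions
• [Figure: harbor example with the actions unload, park, deliver]

Safe Solutions
• [Figure: harbor example with the actions unload, park, deliver]

Safe Solutions
• [Figure: harbor example with the actions unload, park, deliver, move]

Safe Solutions
• [Figure: harbor example with the actions back, unload, park, deliver, move]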

Kinds of Solutions
• [Figure: safe solutions, acyclic solutions, unsafe solutions; labels a, b, c, Goal]

Intermediate Summary
• Planning Problems
  • Planning domains
  • Plans as policies
  • Planning problems and solutions
  • Types of solutions: safe, unsafe, acyclic, cyclic

Finding (Unsafe) Solutions
For comparison, forward search with deterministic models:
Forward-search(Σ, s0, g)
  s ← s0
  π ← ⟨⟩
  loop
    if s satisfies g then return π
    A′ ← {a ∈ A | a is applicable in s}
    if A′ = ∅ then return failure
    nondeterministically choose a ∈ A′
    s ← γ(s, a)
    π ← π.a
The nondeterministic counterpart:
Find-Solution(Σ, s0, Sg)
  s ← s0
  π ← ∅
  Visited ← {s0}
  loop
    if s ∈ Sg then return π
    A′ ← Applicable(s)
    if A′ = ∅ then return failure
    nondeterministically choose a ∈ A′
    nondeterministically choose s′ ∈ γ(s, a)    (decide which state to plan for)
    if s′ ∈ Visited then return failure          (cycle-checking)
    π(s) ← a
    Visited ← Visited ∪ {s′}
    s ← s′

Find-Solution(Σ, s0, Sg): Example
• Initialisation (s ← s0, π ← ∅, Visited ← {s0}): s = on_ship, π = {}, Visited = {on_ship}
• Each iteration nondeterministically chooses a ∈ Applicable(s) and s′ ∈ γ(s, a), then sets π(s) ← a, Visited ← Visited ∪ {s′}, s ← s′
• s = on_ship, a = unload: γ(s, a) = {at_harbor}, s′ = at_harbor, π = {(on_ship, unload)}, Visited = {on_ship, at_harbor}
• s = at_harbor, a = park: γ(s, a) = {parking 1, parking 2, transit 1}, s′ = parking 1, π = {(on_ship, unload), (at_harbor, park)}, Visited = {on_ship, at_harbor, parking 1}
• s = parking 1, a = deliver: γ(s, a) = {gate 1, gate 2, transit 1}, s′ = gate 1, π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver)}, Visited = {on_ship, at_harbor, parking 1, gate 1}
• s = gate 1: gate 1 is a goal, so return π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver)}, Visited = {on_ship, at_harbor, parking 1, gate 1}
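As a rough illustration of Find-Solution, the sketch below replaces the two nondeterministic choices with depth-first backtracking. The transition table `GAMMA` and the goal set `GOALS` are assumptions, loosely modelled on the harbor example because the slides do not list all transitions.

```python
# Assumed transitions for a small harbor-like domain (hypothetical):
# GAMMA[state][action] is the set of possible successor states.
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate1", "gate2"}},
    "transit3": {"move": {"gate2"}},
    "gate1": {},
    "gate2": {},
}
GOALS = {"gate1", "gate2"}


def find_solution(gamma, s0, goals):
    """Search for one cycle-free path from s0 to a goal.

    Returns a (possibly unsafe) policy as a dict, or None on failure.
    """
    def search(s, policy, visited):
        if s in goals:
            return policy
        for a in sorted(gamma[s]):              # choice of action
            for s_next in sorted(gamma[s][a]):  # choice of the outcome to plan for
                if s_next in visited:           # cycle-checking
                    continue
                result = search(s_next, {**policy, s: a}, visited | {s_next})
                if result is not None:
                    return result
        return None

    return search(s0, {}, {s0})


solution = find_solution(GAMMA, "on_ship", GOALS)
```

The returned policy covers only the states on one path to a goal, which is why it is unsafe: if park ends in parking2 or transit1, the policy prescribes no action.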

Finding Acyclic Safe Solutions
Find-Acyclic-Solution(Σ, s0, Sg)
  π ← ∅
  Frontier ← {s0}                                (keep track of unexpanded states, like A*)
  for every s ∈ Frontier ∖ Sg do
    Frontier ← Frontier ∖ {s}
    if Applicable(s) = ∅ then return failure
    nondeterministically choose a ∈ Applicable(s)
    π ← π ∪ {(s, a)}
    Frontier ← Frontier ∪ (γ(s, a) ∖ Dom(π))
    if has-loops(π, s, Frontier) then return failure    (cycle-checking)
  return π

Find-Acyclic-Solution(Σ, s0, Sg): Example
• Initialisation (π ← ∅, Frontier ← {s0}): Frontier ∖ Sg = {on_ship}, π = {}
• Each iteration removes a state s from Frontier ∖ Sg, nondeterministically chooses a ∈ Applicable(s), adds (s, a) to π and γ(s, a) ∖ Dom(π) to Frontier, and fails if has-loops(π, s, Frontier)
• s = at_harbor: Frontier ∖ Sg = {parking 1, parking 2, transit 1}, π = {(on_ship, unload), (at_harbor, park)}
• s = parking 1: Frontier ∖ Sg = {parking 2, transit 1, transit 2}, π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver)}
• s = parking 2: nondeterministically choose back or deliver
  • back ⇒ cycle, so return failure
  • deliver ⇒ no cycle, so continue
  • Frontier ∖ Sg = {transit 1, transit 2, transit 3}, π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver)}
• s = transit 1: Frontier ∖ Sg = {transit 2, transit 3}, π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver), (transit 1, move)}
• s = transit 2: Frontier ∖ Sg = {transit 3}, π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver), (transit 1, move), (transit 2, move)}
• s = transit 3: Frontier ∖ Sg = ∅. Found a solution, so return π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver), (transit 1, move), (transit 2, move), (transit 3, move)}
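A possible Python sketch of Find-Acyclic-Solution is shown below. It makes several assumptions: the nondeterministic action choice becomes backtracking, has-loops is implemented as a plain cycle test on the graph induced by the partial policy, the frontier is recomputed from that graph instead of being maintained incrementally, and the domain `GAMMA` is a hypothetical stand-in for the harbor example.

```python
# Assumed transitions for a small harbor-like domain (hypothetical).
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate1", "gate2"}},
    "transit3": {"move": {"gate2"}},
    "gate1": {},
    "gate2": {},
}
GOALS = {"gate1", "gate2"}


def reachable_graph(gamma, policy, s0):
    """Successor sets of all states reachable from s0 when following policy."""
    edges, stack = {}, [s0]
    while stack:
        s = stack.pop()
        if s in edges:
            continue
        edges[s] = gamma[s][policy[s]] if s in policy else set()
        stack.extend(edges[s])
    return edges


def has_loops(edges):
    """True if the directed graph contains a cycle (three-colour DFS)."""
    color = {}

    def visit(u):
        color[u] = "grey"
        for v in edges[u]:
            if color.get(v) == "grey" or (v not in color and visit(v)):
                return True
        color[u] = "black"
        return False

    return any(u not in color and visit(u) for u in edges)


def find_acyclic_solution(gamma, s0, goals, policy=None):
    """Return an acyclic safe policy as a dict, or None if none is found."""
    policy = dict(policy or {})
    edges = reachable_graph(gamma, policy, s0)
    if has_loops(edges):
        return None
    frontier = sorted(s for s in edges if s not in policy and s not in goals)
    if not frontier:
        return policy
    s = frontier[0]
    for a in sorted(gamma[s]):      # backtrack over the action choice
        result = find_acyclic_solution(gamma, s0, goals, {**policy, s: a})
        if result is not None:
            return result
    return None                     # also covers Applicable(s) = empty set


acyclic = find_acyclic_solution(GAMMA, "on_ship", GOALS)
```

On this assumed domain the search first tries back in parking2, detects the cycle through at_harbor, and then falls back to deliver.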

Finding Safe Solutions
Find-Safe-Solution(Σ, s0, Sg)
  π ← ∅
  Frontier ← {s0}
  for every s ∈ Frontier ∖ Sg do
    Frontier ← Frontier ∖ {s}
    if Applicable(s) = ∅ then return failure
    nondeterministically choose a ∈ Applicable(s)
    π ← π ∪ {(s, a)}
    Frontier ← Frontier ∪ (γ(s, a) ∖ Dom(π))
    if has-unsafe-loops(π, s, Frontier) then return failure    (different cycle-checking)
  return π

Find-Safe-Solution(Σ, s0, Sg): Example
• Initialisation (π ← ∅, Frontier ← {s0}): Frontier ∖ Sg = {on_ship}, π = {}
• Each iteration removes a state s from Frontier ∖ Sg, nondeterministically chooses a ∈ Applicable(s), adds (s, a) to π and γ(s, a) ∖ Dom(π) to Frontier, and fails if has-unsafe-loops(π, s, Frontier)
• s = on_ship: Frontier ∖ Sg = {at_harbor}, π = {(on_ship, unload)}
• s = at_harbor: Frontier ∖ Sg = {parking 1, parking 2, transit 1}, π = {(on_ship, unload), (at_harbor, park)}
• s = parking 1: Frontier ∖ Sg = {parking 2, transit 1, transit 2}, π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver)}
• s = parking 2: nondeterministically choose back or deliver
  • back is okay: escapable cycle
  • Frontier ∖ Sg = {transit 1, transit 2}, π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, back)}
• s = transit 1: Frontier ∖ Sg = {transit 2}, π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, back), (transit 1, move)}
• s = transit 2: Frontier ∖ Sg = ∅. Found a solution, so return π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, back), (transit 1, move), (transit 2, move)}
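The difference to the acyclic case lies only in the loop test. The sketch below is one way to realise it, under assumptions: a loop counts as unsafe when some reachable state can reach neither a goal nor a still unexpanded state, the action choice is handled by backtracking, and `GAMMA` is a hypothetical harbor-like domain.

```python
# Assumed transitions for a small harbor-like domain (hypothetical).
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate1", "gate2"}},
    "transit3": {"move": {"gate2"}},
    "gate1": {},
    "gate2": {},
}
GOALS = {"gate1", "gate2"}


def reachable(gamma, policy, s0):
    """States reachable from s0 when following policy."""
    seen, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        for t in (gamma[s][policy[s]] if s in policy else ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def has_unsafe_loops(gamma, policy, s0, goals):
    """True if some reachable state can reach neither a goal nor an unexpanded state."""
    states = reachable(gamma, policy, s0)
    ok = {s for s in states if s in goals or s not in policy}  # exits
    changed = True
    while changed:                                  # backward fixpoint
        changed = False
        for s in states - ok:
            if gamma[s][policy[s]] & ok:
                ok.add(s)
                changed = True
    return ok != states


def find_safe_solution(gamma, s0, goals, policy=None):
    """Return a safe (possibly cyclic) policy as a dict, or None."""
    policy = dict(policy or {})
    if has_unsafe_loops(gamma, policy, s0, goals):
        return None
    frontier = sorted(s for s in reachable(gamma, policy, s0)
                      if s not in policy and s not in goals)
    if not frontier:
        return policy
    s = frontier[0]
    for a in sorted(gamma[s]):      # backtrack over the action choice
        result = find_safe_solution(gamma, s0, goals, {**policy, s: a})
        if result is not None:
            return result
    return None


safe = find_safe_solution(GAMMA, "on_ship", GOALS)
```

Here back in parking2 is accepted: the cycle through at_harbor can always be left via parking1 or transit1.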

Intermediate Summary
• And/Or Graph Search
  • Algorithms for each type of solution
  • unsafe, cyclic safe, acyclic safe

Guided-Find-Safe-Solution • 46

Guided-Find-Safe-Solution ⇐ not in the book 47

Example (successive slides; "fail" marks a failed step)
• s0 = on_ship, π = {}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver)}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver)}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver), (transit 3, move), (foo, move)}
• fail: π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver), (transit 3, move), (foo, move)}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver), (transit 3, move), (foo, move)}
• fail: π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (parking 2, deliver), (foo, move)}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (foo, move)}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (foo, move), (parking 2, back)}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (foo, move), (parking 2, back), (transit 1, move)}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (foo, move), (parking 2, back), (transit 1, move), (transit 2, move)}
• π = {(on_ship, unload), (at_harbor, park), (parking 1, deliver), (foo, move), (parking 2, back), (transit 1, move), (transit 2, move)}

Determinisation
• How to implement it?
  • Need implementation of Find-Solution
  • Need it to be very efficient
    • Called many times
• Idea: instead, use a classical planner
  • Any algorithm from Chapter 2
  • Efficient algorithms, search heuristics
• For that, determinise actions

Determinisation
• [Figure: the nondeterministic action park, which leads from at_harbor to parking 1, parking 2, or transit 1, is replaced by three deterministic actions park 1, park 2, park 3, one per outcome]

Determinisation
Plan2policy(p = ⟨a1, ..., an⟩, s)
  π ← ∅
  for i from 1 to n do
    π ← π ∪ {(s, det2nondet(ai))}
    s ← γd(s, ai)
  return π
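To make the two steps concrete, the following sketch determinises a nondeterministic transition table and converts a classical plan into a policy. The naming scheme for deterministic actions (park1, park2, ...), the helper `determinise`, and the table `GAMMA` are illustrative assumptions; only the Plan2policy loop mirrors the pseudocode above.

```python
# Assumed fragment of the harbor domain (hypothetical).
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
}


def determinise(gamma):
    """Split every nondeterministic action into one deterministic action per outcome.

    Returns (gamma_d, det2nondet) with gamma_d[state][det_action] = successor
    and det2nondet[det_action] = original action name.
    """
    gamma_d, det2nondet = {}, {}
    for s, actions in gamma.items():
        gamma_d[s] = {}
        for a, outcomes in actions.items():
            for i, successor in enumerate(sorted(outcomes), start=1):
                det = f"{a}{i}"             # hypothetical naming: park1, park2, ...
                gamma_d[s][det] = successor
                det2nondet[det] = a
    return gamma_d, det2nondet


def plan2policy(plan, s, gamma_d, det2nondet):
    """Turn a sequence of deterministic actions into a policy (dict state -> action)."""
    policy = {}
    for a in plan:
        policy[s] = det2nondet[a]           # pi <- pi + {(s, det2nondet(a_i))}
        s = gamma_d[s][a]                   # s <- gamma_d(s, a_i)
    return policy


gamma_d, det2nondet = determinise(GAMMA)
policy = plan2policy(["unload1", "park1", "deliver1"], "on_ship", gamma_d, det2nondet)
```

The resulting policy is only an unsafe solution: it says nothing about the states that park2 or park3 would have produced.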

Guided-Find-Safe-Solution (by determinisation)
• Same as Guided-Find-Safe-Solution
• Find-Solution is replaced by any classical planner that does not return cyclic plans

Example
• [Figures: step-by-step run on the example over 17 slides; two of the steps are marked "fail"]

Making Actions Inapplicable • 81

Intermediate Summary
• Determinisation Techniques
• Guided-find-safe-solution
  • call find-solution to get an unsafe solution
  • call find-solution additional times on the leaves
• Find-safe-solution-by-determinization
  • use determinized actions
  • call classical planner rather than find-solution
  • if dead-ends are encountered, modify actions that lead to them

Online Approaches
• Motivation
  • Planning models are approximate; execution seldom works out as planned
  • Large problems may require too much planning time
• The second motivation is even stronger in nondeterministic domains
  • Nondeterminism makes planning exponentially harder
  • Exponentially more time, exponentially larger policies
• [Figure: offline vs. runtime search spaces]

Online Approaches
• Need to identify good actions without exploring the entire search space
  • Can be done using heuristic estimates
• Some domains are safely explorable
  • Safe to create partial plans, because goal states are reachable from all situations
• Other domains contain dead ends, so partial planning will not guarantee success
  • Can get trapped in dead ends that we would have detected if we had planned fully
    • No applicable actions: robot goes down a steep incline and can't come back up
    • Applicable actions, but caught in a loop: robot goes into a collection of rooms from which there's no exit
  • However, partial planning can still make success more likely

Lookahead-Partial-Plan
Lookahead-Partial-Plan(Σ, s0, Sg)
  s ← s0
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    π ← Lookahead(s, θ)
    if π = ∅ then return failure
    else
      perform partial plan π
      s ← observe current state

FS-Replan
FS-Replan(Σ, s, Sg)
  πd ← ∅
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    if πd undefined for s then
      πd ← Plan2policy(Forward-search(Σd, s, Sg), s)
      if πd = failure then return failure
    perform action πd(s)
    s ← observe resulting state

Generalised-FS-Replan(Σ, s, Sg)
  πd ← ∅
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    if πd undefined for s then
      πd ← Lookahead(s, θ)
      if πd = failure then return failure
    perform action πd(s)
    s ← observe resulting state
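A compact way to experiment with FS-Replan is sketched below. Everything outside the replanning loop is an assumption: the classical planner is a breadth-first search that treats each outcome as its own deterministic action, execution is simulated with a random number generator, and `GAMMA` is a hypothetical harbor-like domain.

```python
import random
from collections import deque

# Assumed transitions for a small harbor-like domain (hypothetical).
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate1", "gate2"}},
    "transit3": {"move": {"gate2"}},
    "gate1": {},
    "gate2": {},
}
GOALS = {"gate1", "gate2"}


def classical_plan(gamma, s, goals):
    """Breadth-first search in the determinised model.

    Returns a list of (state, action) pairs along one path to a goal, or None.
    """
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u in goals:
            steps = []
            while parent[u] is not None:
                prev, a = parent[u]
                steps.append((prev, a))
                u = prev
            return steps[::-1]
        for a in sorted(gamma[u]):
            for v in sorted(gamma[u][a]):   # each outcome acts as a deterministic action
                if v not in parent:
                    parent[v] = (u, a)
                    queue.append(v)
    return None


def fs_replan(gamma, s, goals, rng):
    """Act with the current policy; replan whenever it is undefined for s."""
    policy, trace = {}, [s]
    while s not in goals and gamma[s]:
        if s not in policy:
            plan = classical_plan(gamma, s, goals)
            if plan is None:
                return None                 # failure
            policy = dict(plan)             # Plan2policy
        s = rng.choice(sorted(gamma[s][policy[s]]))   # perform action, observe result
        trace.append(s)
    return trace if s in goals else None


trace = fs_replan(GAMMA, "on_ship", GOALS, random.Random(1))
```

Replanning is triggered only when execution leaves the states covered by the current plan, for example when park ends in parking2 instead of parking1.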

Possibilities for Lookahead
• Lookahead could be one of the algorithms we discussed earlier
  • Find-Safe-Solution
  • Find-Acyclic-Solution
  • Guided-Find-Safe-Solution-by-Determinization
• What if it doesn't have time to run to completion?
  • [Figure: planning stage and acting stage]
• Can use the same techniques we discussed in Chapter 3
  • Receding horizon
  • Sampling
  • Subgoaling
  • Iterative deepening

Possibilities for Lookahead (ct'd)
Find-Acyclic-Solution(Σ, s0, Sg), with the frontier update limited to i outcomes
  π ← ∅
  Frontier ← {s0}
  for every s ∈ Frontier ∖ Sg do
    Frontier ← Frontier ∖ {s}
    if Applicable(s) = ∅ then return failure
    nondeterministically choose a ∈ Applicable(s)
    π ← π ∪ {(s, a)}
    T ← i elements of γ(s, a) ∖ Dom(π)
    Frontier ← Frontier ∪ T          (replaces Frontier ← Frontier ∪ (γ(s, a) ∖ Dom(π)))
    if has-loops(π, s, Frontier) then return failure
  return π

Safely Explorable Domains • 90

Min-Max LRTA*
• Assumes each action has cost 1
• Can easily be modified to use cost ≠ 1
Min-Max-LRTA*(Σ, s0, Sg)
  s ← s0
  while s ∉ Sg and Applicable(s) ≠ ∅ do
    a ← argmin_{a ∈ Applicable(s)} max_{s′ ∈ γ(s, a)} h(s′)
    h(s) ← max{h(s), 1 + max_{s′ ∈ γ(s, a)} h(s′)}
    perform action a
    s ← the current state
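The update rule can be tried out with the sketch below. It rests on assumptions: `GAMMA` is a hypothetical harbor-like domain, h starts at 0 for every state, execution is simulated by a random choice among the outcomes, and the `max_steps` bound is an extra safeguard that is not part of the pseudocode.

```python
import random

# Assumed transitions for a small harbor-like domain (hypothetical).
GAMMA = {
    "on_ship": {"unload": {"at_harbor"}},
    "at_harbor": {"park": {"parking1", "parking2", "transit1"}},
    "parking1": {"deliver": {"gate1", "gate2", "transit2"}},
    "parking2": {"back": {"at_harbor"}, "deliver": {"gate1", "transit3"}},
    "transit1": {"move": {"gate1"}},
    "transit2": {"move": {"gate1", "gate2"}},
    "transit3": {"move": {"gate2"}},
    "gate1": {},
    "gate2": {},
}
GOALS = {"gate1", "gate2"}


def min_max_lrta(gamma, s0, goals, h, rng, max_steps=1000):
    """Run Min-Max LRTA* with unit action costs; h is updated in place."""
    s = s0
    trace = [s]
    while s not in goals and gamma[s] and len(trace) <= max_steps:
        # Worst-case heuristic value of each applicable action.
        worst = {a: max(h[t] for t in gamma[s][a]) for a in gamma[s]}
        a = min(sorted(worst), key=worst.get)   # argmin over actions of the max over outcomes
        h[s] = max(h[s], 1 + worst[a])          # heuristic update
        s = rng.choice(sorted(gamma[s][a]))     # perform action a, observe current state
        trace.append(s)
    return trace


h = {s: 0 for s in GAMMA}
trace = min_max_lrta(GAMMA, "on_ship", GOALS, h, random.Random(0))
```

Reusing the same `h` across several runs lets the learned values accumulate, which is what the convergence argument for safely explorable domains relies on.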

Min-Max-LRTA*(Σ, s0, Sg): Example
• [Figure: example run, labelled h = 0]

Intermediate Summary
• Online approaches
  • Lookahead-partial-plan
    • Adaptation of Run-Lazy-Lookahead
    • Can also adapt Run-Concurrent-Lookahead
  • FS-replan
    • Adaptation of Run-Lookahead
• Ways to do the lookahead
  • Full breadth with limited depth: iterative deepening
  • Full depth with limited breadth: iterative broadening
  • Can put bounds on both depth and breadth
• Convergence in safely explorable domains
  • Min-Max-LRTA*