
FF-Replan: A Baseline for Probabilistic Planning
Sungwook Yoon, Alan Fern, Robert Givan

Replanning Approach
• A deterministic planner for probabilistic planning?
• Winner of IPPC-2004 and (unofficial) winner of IPPC-2006
• Why was it conceived?
• Why did it work?
  – Domain-by-domain analysis
• Any extensions?

IPPC-2004 Pre-released Domains
• Blocksworld
• Boxworld

IPPC Performance Test
- Client-server interaction
- The problem definition is known a priori
- Performance is recorded in the server log
- For each problem, 30 repeated test runs are conducted

Single Outcome Replanning (FFRS)
• Natural approach given the competition setting and the domains, Intro to AI (Russell and Norvig)
  – Hash state-action mapping
• Replace probabilistic effects with a deterministic effect
• Ground goal
[Figure: an action with three probabilistic outcomes (Probability 1: Effect 1, Probability 2: Effect 2, Probability 3: Effect 3) is replaced by a deterministic action with a single effect, Effect 2. Goal labels: A, B, C.]
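The single-outcome idea on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Outcome` and `ProbAction` types are hypothetical, keeping the most probable outcome is an assumed selection rule (the slide does not say which outcome is kept), and the breadth-first `bfs_plan` merely stands in for FF. The `table` dictionary plays the role of the hashed state-action mapping.

```python
import random
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:            # hypothetical type for this sketch
    prob: float
    add: frozenset
    delete: frozenset = frozenset()

@dataclass(frozen=True)
class ProbAction:         # hypothetical type for this sketch
    name: str
    pre: frozenset
    outcomes: tuple       # tuple of Outcome

def single_outcome(action):
    """Determinize by keeping one outcome (assumed: the most probable)."""
    best = max(action.outcomes, key=lambda o: o.prob)
    return (action.name, action.pre, best.add, best.delete)

def bfs_plan(state, goal, det_actions):
    """Stand-in for FF: breadth-first search returning [(state, action_name), ...]."""
    frontier, seen = deque([(state, [])]), {state}
    while frontier:
        s, path = frontier.popleft()
        if goal <= s:
            return path
        for name, pre, add, delete in det_actions:
            if pre <= s:
                t = (s - delete) | add
                if t not in seen:
                    seen.add(t)
                    frontier.append((t, path + [(s, name)]))
    return None

def replan_run(state, goal, actions, rng, max_steps=100):
    """Act until the goal holds; replan whenever the state is not in the table."""
    det = [single_outcome(a) for a in actions]
    by_name = {a.name: a for a in actions}
    table = {}  # hashed state -> action name
    for _ in range(max_steps):
        if goal <= state:
            return True
        if state not in table:
            plan = bfs_plan(state, goal, det)
            if plan is None:
                return False  # no plan in the determinized problem
            table.update(plan)
        act = by_name[table[state]]
        weights = [o.prob for o in act.outcomes]
        out = rng.choices(act.outcomes, weights=weights)[0]  # simulate the real effect
        state = (state - out.delete) | out.add
    return False
```

States are frozensets of ground facts, so they can be used directly as dictionary keys; an unexpected outcome simply lands in a state with no table entry, which triggers a new planning call.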

IPPC-2004 Domains
• Blocksworld
• Boxworld
• Fileworld
• Tireworld
• Tower of Hanoi
• ZenoTravel
• Exploding Blocksworld

IPPC-2004 Results
Numbers: successful runs
Planners: NMRC, J1, Classy, NMR, mGPT, C, FFRS, FFRA
Column annotations: Human / Learned Control Knowledge; 2nd Place; Winners

BW: 252 270 255 30 120 30 210 270
Box: 134 150 100 0 30 0 150
File: - - - 3 30 3 14 29
Zeno: - - - 30 30 30
Tire-r: - - - 30 30 30
Tire-g: - - - 9 16 30 7 7
TOH: - - - 15 0 0 0 11
Exploding: - - - 0 0 0 3 5

NMR: Non-Markovian Reward Decision Process Planner
Classy: Approximate Policy Iteration with a Policy Language Bias
mGPT: Heuristic Search Probabilistic Planning
C: Symbolic Heuristic Search

Reasons for the Success
• Determinization and efficient preprocessing of a complex planning language
  – The input language (PPDDL) is quite complex
  – Classical planning has developed efficient preprocessing techniques for complex input languages, and they scale well
  – Grounding the goal also helped
    • Classical planners have a hard time dealing with lifted goals
• The domains in the competition
  – 17 of 20 problems were dead-end free
  – Amenable to the replanning approach

All Outcome Replanning (FFRA)
• Selecting one outcome is troublesome
  – Which outcome to take?
  – Let's use all the outcomes
  – All we have to do is translate each deterministic action back to the original probabilistic action during the server-client interaction with MDPSIM
  – Novel approach
[Figure: an action with three probabilistic outcomes (Probability 1: Effect 1, Probability 2: Effect 2, Probability 3: Effect 3) is split into three deterministic actions, Action 1, Action 2 and Action 3, one per effect.]
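The all-outcome translation can be sketched as follows. This is an illustrative sketch only: the dictionary-based action format and the `__o<i>` naming scheme are assumptions made for the example, not taken from the slides or from MDPSIM. Each probabilistic outcome becomes its own deterministic action, and the `origin` map records which original action to send to the server when the planner picks one of the deterministic copies.

```python
def all_outcomes(actions):
    """Split every probabilistic action into one deterministic action per outcome.

    actions: {name: (precondition, [(prob, add, delete), ...])}  (assumed format)
    Returns (det_actions, origin), where origin maps each deterministic
    action name back to the original probabilistic action name.
    """
    det_actions, origin = [], {}
    for name, (pre, outcomes) in actions.items():
        for i, (_prob, add, delete) in enumerate(outcomes, start=1):
            det_name = f"{name}__o{i}"       # hypothetical naming scheme
            det_actions.append((det_name, pre, add, delete))
            origin[det_name] = name          # used when talking to the server
    return det_actions, origin


# Usage: a pickup action that succeeds with probability 0.75.
actions = {
    "pickup": (
        frozenset({"clear"}),
        [(0.75, frozenset({"holding"}), frozenset({"clear"})),
         (0.25, frozenset(), frozenset())],
    )
}
det, origin = all_outcomes(actions)
print(origin["pickup__o2"])  # the action name actually sent to the server
```

The probabilities are dropped on purpose: the deterministic planner sees every outcome as selectable, which is what makes the choice of a single outcome unnecessary.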

IPPC-2006 Domains
• Blocksworld
• Exploding Blocksworld
• ZenoTravel
• Tireworld
• Elevator
• Drive
• PitchCatch
• Schedule
• Randomly generate a syntactically correct domain
  – E.g., don't delete facts that are not in the precondition
• Randomly generate a state
  – This is the initial state
• Take a random walk from the state, using the random domain
• The resulting state is a goal state
  – There is at least one path from the initial state to the goal state
• If the probability of the path is greater than α, stop; otherwise take a random walk again
• A special reset action is provided that takes any state to the initial state
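The random-walk step of the random problem generation can be sketched as below. This is a rough illustration, not the competition's generator: the action format, the fixed walk length `walk_len`, the retry limit `max_tries`, and the reading of "probability of the path" as the product of the sampled outcome probabilities are all assumptions.

```python
import random

def random_walk_goal(init, actions, alpha, walk_len, rng, max_tries=1000):
    """Random-walk from init; accept the end state as the goal if the walk is likely enough.

    actions: list of (precondition, [(prob, add, delete), ...])  (assumed format)
    Returns (goal_state, path_probability), or None if no walk exceeded alpha.
    """
    for _ in range(max_tries):
        state, path_prob = init, 1.0
        for _ in range(walk_len):
            applicable = [a for a in actions if a[0] <= state]
            if not applicable:
                break
            _pre, outcomes = rng.choice(applicable)
            weights = [o[0] for o in outcomes]
            prob, add, delete = rng.choices(outcomes, weights=weights)[0]
            state = (state - delete) | add
            path_prob *= prob  # assumed: path probability = product of outcome probabilities
        if path_prob > alpha:
            return state, path_prob  # a path from init to this goal is known to exist
    return None
```

Because the goal is taken from the end of an actual walk, at least one path from the initial state to the goal exists by construction; the threshold α only filters out goals whose witnessing path is too unlikely.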

IPPC-2006 Results
Numbers: percentage of successful runs
Planners: Paragraph, FFRS, FFRA, FPG, FOALP, sfDP

BW: 86 63 100 29 0 77
Zenotravel: 100 27 0 7 7 7
Random: 100 65 0 0 5 73
Elevator: 93 76 100 0 0 93
Exploding: 52 43 24 31 31 52
Drive: 71 56 0 0 9 0
Schedule: 51 54 0 0 1 0
PitchCatch: 54 23 0 0
Tire: 82 75 82 0 91 69

FPG: Factored Policy Gradient Planner
FOALP: First Order Approximate Linear Programming
sfDP: Symbolic Stochastic Focused Dynamic Programming with Decision Diagrams
Paragraph: A Graphplan Based Probabilistic Planner

Discussion
• The novel all-outcome replanning technique outperforms the naïve replanner
• The replanner performed well even on the “real” probabilistic domains
  – Drive
  – The complexity of the domain might have contributed to this phenomenon
• The replanner did not win the domains where it was supposed to be at its very best
  – Blocksworld

Weaknesses of Replanning
• Ignorance of the probabilistic effects
  – Try not to use actions with detrimental effects
  – Detrimental effects can sometimes be found easily
• Ignorance of prior planning during replanning
  – Plan stability work by Fox, Gerevini, Long and Serina
• No learning
  – There is an obvious learning opportunity, since it solves the same problem repeatedly

Potential improvements
• Intelligent replanning
  – During (determinized) planning, when the planner meets a previously seen state, stop planning
  – May reduce replanning time
• Policy rollout
• Policy learning
  – The hashed state-action mapping can be viewed as a partial policy
  – Currently, the mapping is always fixed
  – When there is a failure, we can update the policy, that is, give a penalty to the state-actions in the failure trajectory
  – During planning, try not to use those actions in those states
  – E.g., after an explosion in Exploding Blocksworld, do not use the putdown action
• Hindsight optimization
[Figure labels: State; actions A1, A2; FF-Replan; outcome that really will happen; Reward; Goal State; Average; Select Max action]
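The idea of giving a penalty to the state-actions in a failure trajectory can be sketched as a small bookkeeping class. This is a hypothetical illustration of the proposal, not something the slides describe as implemented: the class name, the additive penalty, and the tie-breaking by candidate order are all assumptions.

```python
class PenalizedPolicy:
    """Tracks penalties for (state, action) pairs seen on failed trajectories."""

    def __init__(self, penalty=1.0):
        self.penalty = penalty
        self.cost = {}  # (state, action) -> accumulated penalty

    def record_failure(self, trajectory):
        """trajectory: iterable of (state, action) pairs from a failed run."""
        for state, action in trajectory:
            key = (state, action)
            self.cost[key] = self.cost.get(key, 0.0) + self.penalty

    def choose(self, state, candidates):
        """Prefer the candidate action with the lowest accumulated penalty.

        Ties are broken by the order of `candidates`.
        """
        return min(candidates, key=lambda a: self.cost.get((state, a), 0.0))


# Usage: after a failure that used putdown, the other action is preferred.
policy = PenalizedPolicy()
state = frozenset({"holding_b1", "detonated_b1"})  # made-up facts for illustration
first = policy.choose(state, ["putdown", "stack"])
policy.record_failure([(state, "putdown")])
second = policy.choose(state, ["putdown", "stack"])
```

Because the same problem is solved repeatedly, the penalties can be carried over from one run to the next, which is where the learning opportunity comes from.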