Evolving Hyper-Heuristics using Genetic Programming
Achiya Elyasaf
Supervisor: Moshe Sipper

Overview
• Introduction
  • Searching Games State-Graphs
    • Uninformed Search
    • Heuristics
    • Informed Search
• Evolving Heuristics
• Previous Work
  • Rush Hour
  • FreeCell

Representing Games as State-Graphs
Every puzzle or game can be represented as a state graph:
• In puzzles, board games, etc., every piece move leads to a different state
• In computer war games, etc., the positions of the player and the enemy, together with all the parameters (health, shield…), define a state

Rush Hour as a State-Graph
[Figure: a Rush Hour board and its state graph]

Searching Games State-Graphs
Uninformed Search
• BFS: exponential in the search depth
• DFS: linear in the length of the current search path. BUT:
  • We might “never” track down the right path
  • Games usually contain cycles
• Iterative deepening: a combination of BFS and DFS
  • Each iteration performs a DFS with a depth limit
  • The limit grows from one iteration to the next
  • Worst case: traverse the entire graph
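The iterative-deepening loop described above can be sketched as follows. This is a minimal illustration, not the implementation used in the presented work: the callbacks `is_goal` and `successors` and the `max_depth` cap are assumed names, and cycles are avoided only along the current path.

```python
def depth_limited_dfs(state, is_goal, successors, limit, path):
    """DFS that stops expanding once `limit` moves have been used.

    `path` holds the states on the current branch and doubles as a
    cycle check, since game graphs usually contain cycles.
    """
    if is_goal(state):
        return list(path)
    if limit == 0:
        return None
    for nxt in successors(state):
        if nxt in path:  # skip states already on the current path
            continue
        path.append(nxt)
        found = depth_limited_dfs(nxt, is_goal, successors, limit - 1, path)
        path.pop()
        if found is not None:
            return found
    return None


def iterative_deepening(start, is_goal, successors, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... up to max_depth."""
    for limit in range(max_depth + 1):
        found = depth_limited_dfs(start, is_goal, successors, limit, [start])
        if found is not None:
            return found  # first hit uses the fewest moves
    return None  # nothing found within max_depth
```

Each iteration repeats the work of the previous one, which is the price paid for keeping memory linear in the path length.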

Searching Games State-Graphs
Uninformed Search
• Most game domains are PSPACE-complete!
• Worst case: traverse the entire graph
• We need an informed search!

Searching Games State-Graphs
Heuristics
h: states -> Real
• For every state s, h(s) is an estimate of the minimal distance/cost from s to a solution
• If h is perfect, an informed search that tries the states with the best h-score (lowest estimated distance) first will simply stroll to the solution
• For hard problems, finding h is hard
• A bad heuristic means the search might never track down the solution
We need a good heuristic function to guide the informed search

Searching Games State-Graphs
Informed Search
Best-first search: like DFS, but selects the nodes with the best heuristic value first
• Not necessarily optimal
• Might enter cycles (local extremum)
A*:
• Holds a closed list and an open list sorted by f-value (f = g + h)
• The best of all open nodes is selected
• The cost of maintaining the open and closed lists, and their size, is prohibitive
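A minimal A* sketch in its textbook form, with the open list kept as a priority queue ordered by f = g + h and the closed list kept as a dictionary. The callback names and the convention that `successors` yields `(next_state, step_cost)` pairs are assumptions made for illustration. Because closed states are never reopened, the result is guaranteed optimal only when h is consistent.

```python
import heapq
from itertools import count


def a_star(start, is_goal, successors, h):
    """Return (path, cost) or None. successors(s) yields (next_state, cost)."""
    tie = count()  # tie-breaker so the heap never compares states directly
    open_heap = [(h(start), next(tie), 0, start, None)]
    closed = {}  # state -> parent, filled when a state is expanded
    while open_heap:
        _, _, g, state, parent = heapq.heappop(open_heap)
        if state in closed:  # already expanded via a cheaper entry
            continue
        closed[state] = parent
        if is_goal(state):
            path = []
            while state is not None:  # walk parents back to the start
                path.append(state)
                state = closed[state]
            return path[::-1], g
        for nxt, cost in successors(state):
            if nxt not in closed:
                new_g = g + cost
                heapq.heappush(
                    open_heap, (new_g + h(nxt), next(tie), new_g, nxt, state)
                )
    return None
```

Both lists can grow to hold a large part of the state graph, which is the memory problem that motivates IDA*.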

Searching Games State-Graphs
Informed Search (cont.)
IDA*: Iterative Deepening with A*
• The expanded nodes are pushed onto the DFS stack by descending heuristic value
• Let g(s_i) be the minimal depth of state s_i: only nodes with f(s) = g(s) + h(s) < depth-limit are visited
• Near-optimal solution (depends on the path limit)
• The heuristic needs to be admissible
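A minimal IDA* sketch under stated assumptions. It uses the common textbook formulation, in which a node is visited while f(s) = g(s) + h(s) does not exceed the current bound and the next bound is the smallest f-value that exceeded it, so it differs slightly from the strict "<" on the slide. The callback names and the `(next_state, step_cost)` successor convention are illustrative.

```python
import math


def ida_star(start, is_goal, successors, h):
    """Return (path, cost) or None. successors(s) yields (next_state, cost)."""
    path = [start]

    def search(g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return f, None  # prune; report the f-value that overflowed
        if is_goal(state):
            return g, list(path)
        smallest_overflow = math.inf
        # Visit the most promising children (lowest h) first.
        for nxt, cost in sorted(successors(state), key=lambda sc: h(sc[0])):
            if nxt in path:  # avoid cycles along the current path
                continue
            path.append(nxt)
            value, found = search(g + cost, bound)
            path.pop()
            if found is not None:
                return value, found
            smallest_overflow = min(smallest_overflow, value)
        return smallest_overflow, None

    bound = h(start)
    while True:
        value, found = search(0, bound)
        if found is not None:
            return found, value  # value is the path cost here
        if value == math.inf:
            return None  # no node was pruned, so no solution exists
        bound = value  # grow the bound to the smallest pruned f-value
```

Only the current path is stored, so memory stays linear in the solution length, at the price of re-expanding nodes in every iteration.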

Evolving Heuristics
Given building blocks H1, …, Hn (not necessarily admissible or in the same range), how should we choose the fittest heuristic?
• Minimum? Maximum? Linear combination?
GA/GP may be used for:
• Building new heuristics from existing building blocks
• Finding weights for each heuristic (for applying a linear combination)
• Finding conditions for applying each heuristic
  • H should probably fit the stage of the search
  • E.g., “goal” heuristics when we assume we are close

Evolving Heuristics: GA
W1=0.3  W2=0.01  W3=0.2  …  Wn=0.1
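One way to read the weight vector above is as a GA genome that defines a linear combination of the building-block heuristics. The sketch below only shows how such a genome could be turned into a heuristic function; the function name and the toy heuristics in the usage note are assumptions, not taken from the presented work.

```python
def combined_heuristic(weights, heuristics):
    """Build h(s) = W1*H1(s) + ... + Wn*Hn(s) from a weight vector."""
    if len(weights) != len(heuristics):
        raise ValueError("need exactly one weight per heuristic")

    def h(state):
        return sum(w * hi(state) for w, hi in zip(weights, heuristics))

    return h
```

For example, with toy heuristics `[lambda s: s, lambda s: 2 * s]` and weights `[0.5, 0.25]`, the combined value for state 4 is 0.5*4 + 0.25*8 = 4.0.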

Evolving Heuristics: GP
[Figure: example GP tree. An If node with Condition, True, and False branches; the nodes include And, ≤, ≥, +, *, /, the heuristics H1, H2, H5, and the constants 0.4, 0.7, 0.1]
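A GP individual of this kind can be evaluated by a small recursive interpreter. The sketch below assumes a nested-tuple encoding and uses a hypothetical tree; it is not the tree from the slide, whose exact shape is not recoverable here. Terminals are heuristic names looked up in `values`, or numeric constants.

```python
import operator

# Binary functions available to the trees in this sketch.
OPS = {
    "and": lambda a, b: bool(a and b),
    "or": lambda a, b: bool(a or b),
    "<=": operator.le,
    ">=": operator.ge,
    "+": operator.add,
    "*": operator.mul,
}


def evaluate(node, values):
    """Evaluate a tree given a dict of heuristic values for one state."""
    if isinstance(node, str):  # terminal: a heuristic name such as "H1"
        return values[node]
    if isinstance(node, (int, float)):  # terminal: a numeric constant
        return node
    op, *args = node
    if op == "if":  # ("if", condition, if_true, if_false)
        condition, if_true, if_false = args
        chosen = if_true if evaluate(condition, values) else if_false
        return evaluate(chosen, values)
    left, right = (evaluate(arg, values) for arg in args)
    return OPS[op](left, right)


# Hypothetical tree: if H1 <= H2 and H5 >= 0.4 then 0.7 * H1 else H2 + 0.1
EXAMPLE_TREE = (
    "if",
    ("and", ("<=", "H1", "H2"), (">=", "H5", 0.4)),
    ("*", 0.7, "H1"),
    ("+", "H2", 0.1),
)
```

Calling `evaluate(EXAMPLE_TREE, {"H1": 2.0, "H2": 3.0, "H5": 0.5})` takes the true branch and computes 0.7 * H1.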

Evolving Heuristics: Policies

Condition      Result
Condition 1    Heuristics Weights 1
Condition 2    Heuristics Weights 2
…
Condition n    Heuristics Weights n
Default        Heuristics Weights
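The policy table can be read as an ordered rule list: the first condition that holds selects the weight vector for the heuristic combination, and the default row applies otherwise. The sketch below assumes this first-match reading, and all names in it are illustrative.

```python
def policy_heuristic(rules, default_weights, heuristics):
    """rules: list of (condition, weights); condition(state) -> bool."""

    def h(state):
        weights = default_weights
        for condition, rule_weights in rules:
            if condition(state):  # first matching rule wins (assumption)
                weights = rule_weights
                break
        return sum(w * hi(state) for w, hi in zip(weights, heuristics))

    return h
```

A state here can be any object the conditions and heuristics understand, for instance a dictionary of precomputed features.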

Evolving Heuristics: Fitness Function

Rush Hour
GP-Rush [Hauptman et al., 2009]
Bronze Humie award

Domain-Specific Heuristics
Hand-crafted heuristics / guides:
• Blocker estimation: lower bound (admissible)
• Goal distance: Manhattan distance
• Hybrid blockers distance: combines the two above
• Is Move To Secluded: did the car enter a secluded area?
• Is Releasing Move

Policy “Ingredients”
Functions & Terminals:
• Terminals: IsMoveToSecluded, IsReleasingMove, g, PhaseByDistance, PhaseByBlockers, NumberOfSyblings, DifficultyLevel, BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, …, 0.9, 1
• Function sets:
  • Conditions: If, AND, OR, ≤, ≥
  • Results: +, *

Coevolving (Hard) 8x8 Boards
[Figure: an evolved 8x8 Rush Hour board with cars labeled G, F, S, H, I, P, K, M and the RED car]

Results
Average reduction of nodes required to solve test problems, with respect to the number of nodes scanned by a blind search:

Problem   ID     H1     H2     H3     Hc     Policy
6x6       100%   28%    6%     -2%    30%    60%
8x8       100%   31%    25%    30%    50%    90%

Results (cont’d)
Time (in seconds) required to solve problems JAM01 … JAM40:
[Table not reproduced]

FreeCell
• FreeCell remained relatively obscure until Windows 95
• There are 32,000 problems (known as Microsoft 32K), all solvable except for game #11982, which has been proven to be unsolvable
• Evolving hyper heuristic-based solvers for Rush-Hour and FreeCell [Hauptman et al., SOCS 2010]
• GA-FreeCell: Evolving Solvers for the Game of FreeCell [Elyasaf et al., GECCO 2011]

FreeCell (cont’d)
• As opposed to Rush Hour, blind search failed miserably
• The best published solver to date solves 96% of Microsoft 32K
• Reasons:
  • High branching factor
  • Hard to generate a good heuristic

Learning Methods: Random Deals
Which deals should we use for training?
First method tested: random deals
• This is what we did in Rush Hour
• Here it yielded poor results
• Very hard domain

Learning Methods: Gradual Difficulty
Second method tested: gradual difficulty
• Sort the problems by difficulty
• Each generation, test the solvers against 5 deals from the current difficulty level + 1 random deal
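The per-generation deal selection can be sketched as below. The slides do not say how difficulty is measured or when the level advances, so `difficulty` is a caller-supplied scoring function and the equal-size split into levels is an assumption; only the "5 deals from the current level + 1 random deal" rule comes from the slide.

```python
import random


def split_into_levels(deals, difficulty, n_levels):
    """Sort deals by a caller-supplied difficulty score and cut into levels."""
    ordered = sorted(deals, key=difficulty)
    size = -(-len(ordered) // n_levels)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def training_deals(levels, level, all_deals, rng, per_level=5, n_random=1):
    """Pick 5 deals from the current level plus 1 deal from the whole set."""
    current = rng.sample(levels[level], per_level)
    extra = rng.sample(all_deals, n_random)  # may repeat a current-level deal
    return current + extra
```

Passing a seeded `random.Random` as `rng` keeps a run reproducible.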

Learning Methods: Hillis-Style Coevolution
Third method tested: Hillis-style coevolution using a “Hall-of-Fame”:
• A deal population is composed of 40 deals (= 40 individuals) + 10 deals that represent a hall-of-fame
• Each hyper-heuristic is tested against 4 deal individuals and 2 hall-of-fame deals
Evolved hyper-heuristics failed to solve almost all of the Microsoft 32K deals! Why?
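The evaluation step of this scheme can be sketched as follows: each hyper-heuristic meets 4 deals drawn from the deal population and 2 drawn from the hall-of-fame. The fitness shown here (fraction of deals solved) is a stand-in assumption, since the slides do not give the fitness function, and `solver` is a hypothetical callable that returns True when it solves a deal.

```python
import random


def pick_opponents(deal_population, hall_of_fame, rng, n_population=4, n_hof=2):
    """Draw the deals one hyper-heuristic is tested against."""
    return rng.sample(deal_population, n_population) + rng.sample(hall_of_fame, n_hof)


def solver_fitness(solver, opponents):
    """Stand-in fitness: fraction of the drawn deals that the solver solves."""
    return sum(1 for deal in opponents if solver(deal)) / len(opponents)
```

With 40 population deals and 10 hall-of-fame deals, every solver is scored on a different six-deal sample.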

Learning Methods: Rosin-style Coevolution
Fourth method tested: Rosin-style coevolution:
• Each deal individual consists of 6 deals
• Mutation and crossover:
[Figure: mutation and crossover of deal individuals p1 and p2, shown as lists of deal numbers (e.g. 11897, 3042, 23845, 7364, 17987, 5984)]
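Genetic operators on six-deal individuals might look like the sketch below. The exact operators are not recoverable from the slide, so single-deal replacement and one-point crossover are assumptions, as is the deal numbering 1 to 32,000 for Microsoft 32K.

```python
import random

MAX_DEAL = 32000  # assumed numbering of the Microsoft 32K deals


def mutate(individual, rng):
    """Replace one randomly chosen deal with a random deal number."""
    child = list(individual)
    child[rng.randrange(len(child))] = rng.randint(1, MAX_DEAL)
    return child


def crossover(p1, p2, rng):
    """One-point crossover: swap the tails of two parents."""
    cut = rng.randrange(1, len(p1))  # keep at least one deal from each side
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
```

Both operators return new lists and leave the parents unchanged.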

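Results

Learning method           Run      Node reduction   Time reduction   Length reduction   Solved
(baseline)                HSD      100%                                                 96%
Gradual difficulty        GA-1     23%              31%              1%                 71%
Gradual difficulty        GA-2     27%              30%              -3%                70%
Gradual difficulty        GP       -                -
Gradual difficulty        Policy   28%              36%              6%                 36%
Rosin-style coevolution   GA       87%              93%              41%                98%
Rosin-style coevolution   Policy   89%              90%              40%                99%

[The HSD and GP rows are incomplete in the source]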

Thank you for listening. Any questions?