Compressing Pattern Databases
Ariel Felner, Bar-Ilan University. felner@cs.biu.ac.il
March 2004
Joint work with Ram Meshulam, Robert Holte and Richard E. Korf.
Submitted to AAAI-04. Available at: http://www.cs.biu.ac.il/~felner
A* and its variants
• A* (and IDA*) is a best-first search algorithm that uses f(n) = g(n) + h(n) as its cost function. Nodes are sorted in an open list according to their f-values.
• g(n) is the cost of the shortest known path from the initial node to the current node n.
• h(n) is an admissible (lower-bound) heuristic estimate from n to the goal node.
• Recently, attention has shifted towards creating more accurate heuristic functions.
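The cost function above can be illustrated with a minimal A* loop (an illustrative sketch, not the implementation used in this work; `neighbors` and `h` are assumed to be caller-supplied callables):

```python
import heapq

def astar(start, goal, neighbors, h):
    """Minimal A*: open list is a priority queue ordered by f(n) = g(n) + h(n).
    `neighbors(n)` yields (successor, edge_cost); `h` must be admissible."""
    open_list = [(h(start), 0, start)]  # entries are (f, g, node)
    g = {start: 0}
    while open_list:
        f, g_n, n = heapq.heappop(open_list)
        if n == goal:
            return g_n                  # with admissible h, this g is optimal
        if g_n > g.get(n, float("inf")):
            continue                    # stale queue entry; skip it
        for m, cost in neighbors(n):
            g_m = g_n + cost
            if g_m < g.get(m, float("inf")):
                g[m] = g_m
                heapq.heappush(open_list, (g_m + h(m), g_m, m))
    return None                         # goal unreachable
```
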
Pattern databases
• Many problems can be decomposed into subproblems (patterns) that must also be solved.
• The pattern space is a domain abstraction of the original space.
• The cost of a solution to a subproblem is a lower bound on the cost of the complete solution.
• Instead of calculating the lower bounds on the fly, we expand the whole pattern space and store the solution to each pattern configuration in a pattern database.
(Figure: a mapping function projects the search space onto the pattern space)
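Expanding the whole pattern space can be sketched as a backward breadth-first search from the abstract goal (a hypothetical unit-cost sketch; `predecessors` is an assumed abstraction-specific function, not part of the original slides):

```python
from collections import deque

def build_pdb(goal_pattern, predecessors):
    """Backward BFS over the pattern space: the distance from each pattern
    configuration to the abstract goal is an admissible heuristic for the
    full problem. `predecessors(p)` yields abstract states one move away
    (moves are assumed to cost 1)."""
    pdb = {goal_pattern: 0}
    frontier = deque([goal_pattern])
    while frontier:
        p = frontier.popleft()
        for q in predecessors(p):
            if q not in pdb:            # first visit = shortest distance in BFS
                pdb[q] = pdb[p] + 1
                frontier.append(q)
    return pdb
```
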
Non-additive pattern databases
• Fringe database for the 15 puzzle [Culberson and Schaeffer 1996]: stores the number of moves, including moves of tiles not in the pattern.
• Rubik's Cube [Korf 1997].
• The best way to combine different non-additive pattern databases is to take their maximum!
Additive pattern databases
• We can add values from different pattern databases if the patterns are disjoint (and each counts only its own moves).
• There are two ways to build additive databases:
– Statically-partitioned additive databases (also called disjoint pattern databases)
– Dynamically-partitioned additive databases
• Applications of additive pattern databases:
– Tile puzzles
– 4-peg Towers of Hanoi puzzle (TOH 4)
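Summing values over disjoint patterns can be sketched as follows (a toy illustration with hand-built PDB dictionaries; the projection by tile positions is an assumption for the sketch, not the actual data structure used):

```python
def additive_heuristic(state, partitions, pdbs):
    """Sum PDB values over disjoint tile sets. Each pdbs[i] maps the
    projected state (positions of the tiles in partitions[i]) to the cost
    of moving only those tiles home, counting only their own moves, so
    the sum is still admissible."""
    h = 0
    for tiles, pdb in zip(partitions, pdbs):
        key = tuple(state.index(t) for t in tiles)  # positions of pattern tiles
        h += pdb[key]
    return h
```
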
Statically-partitioned additive databases
• These were created for the 15 and 24 puzzles [Korf & Felner 2002].
• We statically partition the tiles into disjoint patterns and compute the cost of moving only these tiles into their goal states.
• For the 15 puzzle (7-8 partitioning): 36,710 nodes, 0.027 seconds, 575 MB.
• For the 24 puzzle (6-6-6-6 partitioning): 360,892,479,671 nodes, 2 days, 242 MB.
4-peg Towers of Hanoi (TOH 4)
• There is a conjecture about the length of the optimal path, but it has not been proven.
• Systematic search is the only way to solve this problem or to verify the conjecture.
• There are too many cycles. IDA*, as a depth-first search, will not prune these cycles. Therefore A* (actually frontier A* [Korf & Zhang 2000]) was used.
Additive PDBs for TOH 4
• Partition the disks into disjoint sets (patterns), e.g., 10 and 6 for the 16-disk problem.
• Store the cost of the complete pattern space of each set in a pattern database. (There are many enhancements.)
• The n-disk problem contains 4^n states, and 2n bits suffice to store each state.
• The largest database we stored was for 14 disks, which needed 4^14 = 256M entries (256 MB).
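The 2-bits-per-disk state encoding can be sketched as follows (an illustrative layout; the actual packing used may differ):

```python
def encode(pegs):
    """Pack a TOH4 configuration into an integer: 2 bits per disk, since
    each of the n disks sits on one of 4 pegs (hence 4^n states, 2n bits)."""
    code = 0
    for i, peg in enumerate(pegs):      # pegs[i] in {0, 1, 2, 3} for disk i
        code |= peg << (2 * i)
    return code

def decode(code, n):
    """Recover the peg of each of the n disks from the packed integer."""
    return [(code >> (2 * i)) & 3 for i in range(n)]
```
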
TOH 4: results

16 disks:
  Heuristic     Solution  h(s)  Avg h  Nodes        Seconds
  Static 13-3   161       102   75.78  134,653,232  48
  Static 14-2   161       114   89.10  36,479,151   14
  Dynamic 14-2  161       114   95.52  12,872,732   21

17 disks:
  Dynamic 14-3  183       116   97.05  238,561,590  2,501
How to best use the memory
• The speed of the search is directly related to the size of the pattern database.
• We usually omit the computation time of the PDBs, but we cannot ignore their memory requirements.
• [Holte, Newton, Felner, Meshulam and Furcy 2004] showed that it is better to use many small databases and take their maximum than to use one large database.
• We limit the discussion to 1 gigabyte of memory.
Compressing pattern databases
• Traditionally, each configuration of the pattern had a unique entry in the PDB.
• Our main claim: nearby entries in PDBs are highly correlated!!
• We propose to compress nearby entries by storing their minimum in one entry.
• We show that most of the knowledge is preserved.
• Consequences: memory is saved, larger patterns can be used, and a speedup in search is obtained.
Cliques in the pattern space
• The PDB values of the nodes of a clique are d or d+1.
• In permutation puzzles, cliques exist when only one object moves to another location.
• Usually they have nearby entries in the PDB.
(Figure: a clique whose nodes are at distance d or d+1 from the goal G)
Storing cliques
• Assume a clique of size K with values d or d+1.
• Lossy compression: store only one entry for the clique, holding the minimum d. We lose at most 1.
• Lossless compression: store the minimum d, plus K additional bits, one per entry.
(Figure: a clique in TOH 4)
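The lossy and lossless clique-compression schemes can be sketched directly (illustrative only):

```python
def compress_lossy(entries):
    """Store one value for the whole clique: the minimum. Since clique
    values are d or d+1, the heuristic loses at most 1 and stays admissible."""
    return min(entries)

def compress_lossless(entries):
    """Store min(d) plus one bit per entry marking whether it is d or d+1."""
    d = min(entries)
    bits = [v - d for v in entries]     # each bit is 0 or 1 for a true clique
    return d, bits

def lookup_lossless(d, bits, i):
    """Recover the exact original value of entry i."""
    return d + bits[i]
```
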
Compressing PDBs in TOH 4
• If we compress away the index of the smallest disk, then a PDB of P disks can be stored in only 4^(P-1) entries instead of 4^P.
• This can be generalized to a set of nodes with diameter D (for cliques, D = 1).
• For TOH 4, we fix the positions of the largest P-2 disks and compress all 4^2 = 16 entries of the smallest 2 disks.
• In general, compressing any block will work, not necessarily cliques.
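Compressing away the smallest disk's index can be sketched as follows (assuming, for illustration only, that the smallest disk occupies the low-order part of the index; the actual index layout may differ):

```python
def compress_last_disk(pdb, p):
    """Shrink a TOH4 PDB over p disks from 4**p to 4**(p-1) entries by
    taking the minimum over the 4 possible pegs of the smallest disk.
    Index layout assumed: code = rest * 4 + peg_of_smallest_disk."""
    return [min(pdb[base * 4 + peg] for peg in range(4))
            for base in range(4 ** (p - 1))]
```

The same idea generalizes: any block of entries with bounded diameter D can be collapsed into its minimum, losing at most D from the heuristic.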
TOH 4 results: 16 disks (14+2)

  PDB      h(s)  Avg h  D   Nodes       Time   Mem (MB)
  14/0+2   116   87.03  0   36,479,151  14.34  256
  14/1+2   115   86.48  1   37,964,227  14.69  64
  14/2+2   113   85.67  3   40,055,436  15.41  16
  14/3+2   111   84.44  5   44,996,743  16.94  4
  14/4+2   107   82.73  9   45,808,328  17.36  1
  14/5+2   103   80.84  13  61,132,726  23.78  0.256
  14/1s+2  116   87.03  0   36,479,151  15.87  96

• Memory was reduced by a factor of 1,000 at a cost of only a factor of ~2 in search effort!!!
• Lossless compression is not efficient in this domain.
TOH 4: larger versions

  Size  PDB     Type     Avg H  Nodes         Time   Mem
  17    14/0+3  static   81.5   >393,887,923  >421   256
  17    14/0+3  dynamic  87.0   238,561,590   2,501  256
  17    15/1+2  static   103.7  155,737,832   83     256
  17    16/2+1  static   123.8  17,293,603    7      256
  18    16/2+2  static   123.8  380,117,836   463    256

• For the 17-disk problem a speedup of three orders of magnitude is obtained!!!
• The 18-disk problem can be solved in a few minutes!!
Tile Puzzles
• We can take advantage of simple heuristics: we store only the addition above the Manhattan-distance heuristic.
(Figure: a 4x4 grid of the values stored above MD)

Storing PDBs for the tile puzzle:
• (Simple mapping) A multi-dimensional array A[16][16][16][16][16], size = 1.04 MB.
• (Packed mapping) A one-dimensional array of size 16*15*14*13*12 = 0.52 MB.
• The time and memory tradeoff is straightforward!!
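The packed mapping is a standard ranking of partial permutations; a sketch (illustrative, not the authors' code):

```python
def packed_index(positions, n=16):
    """Map distinct tile positions (each 0..n-1) to a dense index in
    range n*(n-1)*...*(n-k+1): mixed-radix ranking that skips cells
    already used by earlier tiles, so no array slots are wasted."""
    idx = 0
    used = []
    for i, p in enumerate(positions):
        rank = p - sum(1 for u in used if u < p)  # rank among unused cells
        idx = idx * (n - i) + rank
        used.append(p)
    return idx
```

The simple mapping (a plain 16^k array) trades the extra memory for a cheaper, direct index computation; the packed mapping trades lookup time for memory.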
15 puzzle results
• A clique in the tile puzzle is of size 2.
• We compressed the last index by two: A[16][16][8].

  PDB    Type    Compress  Lookups  Nodes    Time   Mem (KB)  Avg H
  7-8    packed  ----      1        136,288  0.081  576,575   44.75
  7-8    packed  ----      1+       36,710   0.034  576,575   45.63
  7-7-1  packed  ----      1        464,977  0.232  57,657    43.64
  7-7-1  simple  ----      1        464,977  0.058  536,870   43.64
  7-7-1  simple  lossy     1        565,881  0.069  268,435   43.02
  7-7-1  simple  lossless  1        487,430  0.070  268,435   43.59
  7-7-1  simple  lossy     2        147,336  0.021  536,870   43.98
  7-7-1  simple  lossy     2+       66,692   0.016  536,870   44.92
24 puzzle
• The same tendencies were obtained for the 24 puzzle.
• The 6-6-6-6 partitioning is so good that adding another set of 6-6-6-6 did not speed up the search.
• We also tried a 7-7-5-5 partitioning, but it did not speed up the search either.
Ongoing and future work
• An entry in the PDB of tiles (a, b, c, d) has the form <La, Lb, Lc, Ld> = value, where Lx is the location of tile x.
• Store the PDBs in a trie.
• A PDB of 5 tiles will have a level in the trie for each tile; the values are in the leaves of the trie.
• This data structure enables flexibility and saves memory, as subtrees of the trie can be pruned.
Trie pruning
• Simple (lossless) pruning: fold leaves with exactly the same values. No data is lost.
(Figure: identical leaves with value 2 folded into one)
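Simple lossless folding can be sketched as subtree deduplication (a toy representation in which a trie node is either an int leaf or a dict of children; this representation is an assumption for the sketch, not the actual data structure):

```python
def fold(node, cache=None):
    """Deduplicate identical subtrees bottom-up: equal leaves/subtrees are
    stored once and shared, so no heuristic values are lost."""
    if cache is None:
        cache = {}
    if isinstance(node, dict):
        # Fold children first so identical subtrees get identical reprs.
        node = {k: fold(v, cache) for k, v in node.items()}
    key = repr(node)                    # structural identity of the subtree
    if key not in cache:
        cache[key] = node
    return cache[key]                   # shared canonical copy
```
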
Trie pruning
• Intelligent (lossy) pruning: fold leaves/subtrees that are correlated with each other (many options for this!!). Some data is lost, but admissibility is still kept.
(Figure: correlated leaves folded into one entry holding their minimum)
Trie: initial results
A 5-5-5 partitioning stored in a trie with simple folding:

  PDB     MD     H(s)   Nodes      Time   Nodes/sec  Mem
  Simple  36.94  41.56  3,090,405  0.600  5,150,676  3,145,728
  Packed  36.94  41.56  3,090,405  3.126  988,613    1,572,480
  Trie    36.94  41.56  3,090,405  2.593  1,191,826  765,778
Neural Networks (NN)
• We can feed a PDB into a neural-network engine, especially the addition above MD.
• For each tile we focus on its dx and dy from its goal position (i.e., its MD components).
• Linear conflict example (tiles 1 and 2 in the same column): dx1 = dx2 = 0 and dy1 > dy2 + 1 (e.g., dy1 = 2, dy2 = 0).
• A NN can learn these rules.
Neural network
• We train the NN by feeding it the entire pattern space (or part of it).
• For example, for a pattern of 5 tiles we have 10 features, 2 for each tile.
• During the search, given the locations of the tiles, we query the NN.
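Extracting the 2-per-tile input features can be sketched as follows (assuming row-major positions on a width-4 board; illustrative only, not the authors' feature code):

```python
def features(state, tiles, goal, width=4):
    """Build the NN input: for each pattern tile, its (dx, dy) displacement
    from the goal position, i.e., the two Manhattan-distance components."""
    feats = []
    for t in tiles:
        pos, gpos = state.index(t), goal.index(t)
        feats.append(pos % width - gpos % width)    # dx (column offset)
        feats.append(pos // width - gpos // width)  # dy (row offset)
    return feats
```
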
Neural network example
(Figure: NN layout for the pattern of tiles 4, 5 and 6, with inputs dx4, dy4, dx5, dy5, dx6, dy6)
Neural Network: problems
• We face the problem of overestimating and will have to bias the results towards underestimating.
• We keep the overestimating values in a separate hash table.
• Results are encouraging!!

  PDB             H(s)   Nodes    Time   Mem
  Regular         31.00  243,290  0.49   1,572,480
  Neural Network  29.67  454,262  69.75  33,611 (+472 w)
Selective Pattern Database
• Only part of the pattern space is queried for a single problem instance.
• If we can identify that part in advance, we can generate only that part.