Using Personal Condor to Solve Quadratic Assignment Problems
Jeff Linderoth
Axioma, Inc.
jlinderoth@axiomainc.com
Partners in Crime
• Kurt Anstreicher, Nate Brixius (University of Iowa)
• Jean-Pierre Goux (MCS Division, ANL)
• LOTS of people in this room! (University of Wisconsin)
Our Mission
1. Find the best possible solution to large quadratic assignment problem (QAP) instances
2. Prove that the solution is indeed optimal
3. Show how to exploit the Computational Grid offered by Personal Condor to make it happen
What’s a QAP?
• Can be thought of as a facility location problem
• The QAP is NP-REALLY-Hard
  • TSP: Solve n=13509
  • QAP: Solve n=25
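To make the facility-location reading concrete, here is a minimal sketch of the QAP objective in its usual flow-times-distance form, with a brute-force solver that only works for tiny n. The names `qap_cost`, `brute_force_qap`, `flow`, and `dist`, and the 3x3 data, are illustrative assumptions and do not come from the slides.

```python
from itertools import permutations

def qap_cost(flow, dist, perm):
    """Cost of placing facility i at location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def brute_force_qap(flow, dist):
    """Try all n! assignments; hopeless beyond very small n."""
    n = len(flow)
    return min((qap_cost(flow, dist, p), p) for p in permutations(range(n)))

# Hypothetical data: three facilities, three locations on a line.
flow = [[0, 3, 1],
        [3, 0, 2],
        [1, 2, 0]]
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]

best_cost, best_perm = brute_force_qap(flow, dist)
```

The search space has n! assignments, which is why an instance with n=25 is already a serious challenge.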
Q: Why Is This Important?
• Answer #1: Practical applications
  • Facility Location
  • Hospital Design
  • Flight Instrument Layout
• Answer #2: Similarity
  • Comparable to other practically important combinatorial optimization problems
  • TSP, MIP
The REAL Answer – It’s NOT! “The Journey Is The Reward” What can we learn about solving complex numerical problems on Computational Grids?
The Perfect Marriage
While my wife likes this slide, really it’s the QAP and Condor that make the perfect marriage!
Making the Perfect Marriage
• Something Old
• Something New
• Something Borrowed
• Something Blue
Something Old: Branch-and-Bound
1. Bound
  • Solve “auxiliary” problem that gives a lower bound on the optimal solution to the problem
  • Any assignment of facilities to locations gives an upper bound on the optimal solution
  • What if lower bound < upper bound?
2. Branch
  • Divide-and-Conquer!
  • This is not “pleasantly parallel” computing!
  • Recursively make problem smaller by assigning each facility to a fixed location
  • Without the bounding, this is complete enumeration (n!)
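The bound and branch steps can be sketched as a small depth-first search. This is only an illustration: the lower bound used here is the cost of the facilities already placed, which is valid only because all flows and distances are assumed nonnegative, and it is far weaker than the bound used in the actual work. All names and data are hypothetical.

```python
def branch_and_bound_qap(flow, dist):
    """Depth-first branch-and-bound; returns (cost, assignment, nodes visited)."""
    n = len(flow)
    best = {"cost": float("inf"), "perm": None, "nodes": 0}

    def partial_cost(perm):
        # Cost among already-placed facilities. With nonnegative data this
        # can only grow as more facilities are placed, so it is a lower bound.
        k = len(perm)
        return sum(flow[i][j] * dist[perm[i]][perm[j]]
                   for i in range(k) for j in range(k))

    def search(perm, free):
        best["nodes"] += 1
        bound = partial_cost(perm)
        if bound >= best["cost"]:
            return                      # bound: prune this subtree
        if not free:
            best["cost"], best["perm"] = bound, tuple(perm)  # new upper bound
            return
        for loc in sorted(free):        # branch: fix the next facility's location
            search(perm + [loc], free - {loc})

    search([], frozenset(range(n)))
    return best["cost"], best["perm"], best["nodes"]

# Hypothetical data: three facilities, three locations on a line.
flow = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
cost, perm, nodes = branch_and_bound_qap(flow, dist)
```

Because each subtree depends on the best upper bound found so far, the subproblems are not independent, which is one way to read the remark that this is not “pleasantly parallel” computing.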
Something New:
• A convex quadratic programming relaxation
• Solved with the Frank-Wolfe Algorithm*
  • Each iteration is one linear assignment problem
* Something VERY old
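A minimal sketch of why each Frank-Wolfe iteration reduces to one linear assignment problem (LAP): the linearized subproblem minimizes a linear function over doubly stochastic matrices, and its optimum is attained at a permutation matrix. The objective below is a toy convex quadratic, 0.5 * ||X - T||^2 for a hypothetical target matrix T. It is not the QAP relaxation from the talk, and the LAP is solved by brute force rather than by a real LAP code.

```python
from itertools import permutations

def solve_lap(cost):
    """Brute-force LAP: permutation p minimizing sum(cost[i][p[i]]). Tiny n only."""
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))

def objective(x, target):
    n = len(x)
    return 0.5 * sum((x[i][j] - target[i][j]) ** 2
                     for i in range(n) for j in range(n))

def frank_wolfe(target, iters=200):
    n = len(target)
    x = [[1.0 / n] * n for _ in range(n)]     # start at the polytope's center
    lower = float("-inf")
    for k in range(iters):
        grad = [[x[i][j] - target[i][j] for j in range(n)] for i in range(n)]
        p = solve_lap(grad)                   # one LAP per iteration
        # Convexity gives f(Y) >= f(X) + <grad, S - X> for every feasible Y,
        # where S is the LAP solution: a lower bound on the optimal value.
        slack = sum(grad[i][p[i]] for i in range(n)) - sum(
            grad[i][j] * x[i][j] for i in range(n) for j in range(n))
        lower = max(lower, objective(x, target) + slack)
        gamma = 2.0 / (k + 2)                 # standard diminishing step size
        x = [[(1 - gamma) * x[i][j] + gamma * (1.0 if p[i] == j else 0.0)
              for j in range(n)] for i in range(n)]
    return x, objective(x, target), lower

# Hypothetical target that is itself doubly stochastic, so the optimum is 0.
target = [[0.5, 0.5, 0.0],
          [0.5, 0.0, 0.5],
          [0.0, 0.5, 0.5]]
x, value, lower_bound = frank_wolfe(target)
```

The `lower` value is the useful by-product for branch-and-bound: it is a valid lower bound at every iteration, so the method can be stopped early and still yield a bound.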
Something Borrowed:
• With Condor it is easy to “borrow” CPU cycles
1. Call your friends and colleagues and flock with their Condor pools
2. Write an NPACI proposal and Glide-In to supercomputer resources
3. If all else fails (Condor/Globus not installed), hobble in!
My Personal Grid
Number  Type           Location            Method
414     Intel/Linux    Argonne             Hobble-In
96      SGI/Irix       Argonne             Glide-In
1024    SGI/Irix       NCSA                Glide-In
16      Intel/Linux    NCSA                Flocked
45      SGI/Irix       NCSA                Flocked
246     Intel/Linux    Wisconsin           Flocked
146     Intel/Solaris  Wisconsin           Flocked
133     Sun/Solaris    Wisconsin           Flocked
190     Intel/Linux    Georgia Tech        Flocked
94      Intel/Solaris  Georgia Tech        Flocked
54      Intel/Linux    Italy (INFN)        Flocked
25      Intel/Linux    New Mexico (AHPCC)  Flocked
5       Intel/Linux    Columbia U.         Flocked
10      Sun/Solaris    Columbia U.         Flocked
12      Sun/Solaris    Northwestern        Flocked
Something Blue?
• You could work until you’re blue in the face and not solve QAP instances*
Instance  Arch.           Wall Time  Person    Date
Nug22     Ultra 360 MHz   56 hours   Hahn      1999
Nug24     Ultra 360 MHz   9 days     Hahn      1999
Nug25     Ultra 360 MHz   66 days    Hahn      1999
Nug22     48-96 Cenju-3   9 days     Marzetta  1998
Nug25     64-128 Paragon  30 days    Marzetta  1998
* My sincerest apologies for the terrible pun
The Holy Grail
• We want to solve nug30!
• Extrapolating results and using an idea of Knuth*, we conjecture that we will need roughly 10-15 years of CPU time
• How can we be sure to use 10-15 years of CPU time somewhat efficiently?
• We have the additional burden of working in Condor’s extremely dynamic environment!
* Something Old
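The slide does not say which idea of Knuth is meant. A plausible reading, assumed here, is his random-dive estimator for the size of a search tree: follow one random root-to-leaf path and multiply the branching factors seen along the way. The sketch below applies it to a full enumeration tree for n=4, where every dive gives the exact count. In a pruned branch-and-bound tree the dives would differ and one would average many of them. All function names are illustrative.

```python
import random

def knuth_estimate(root, children, rng):
    """One random dive; an unbiased estimate of the number of tree nodes."""
    estimate, weight, node = 1, 1, root
    while True:
        kids = children(node)
        if not kids:
            return estimate
        weight *= len(kids)      # estimated number of nodes at the next depth
        estimate += weight
        node = rng.choice(kids)  # continue down a uniformly random child

def mean_estimate(root, children, dives, seed=0):
    """Average several independent dives to reduce variance."""
    rng = random.Random(seed)
    return sum(knuth_estimate(root, children, rng) for _ in range(dives)) / dives

def enumeration_children(n):
    """Children of a partial assignment: place the next facility anywhere free."""
    def children(prefix):
        return [prefix + (loc,) for loc in range(n) if loc not in prefix]
    return children

# Full enumeration tree for n=4 has 1 + 4 + 12 + 24 + 24 nodes.
tree_size = mean_estimate((), enumeration_children(4), dives=10)
```

Multiplying an estimated node count by a measured time per node gives the kind of CPU-time extrapolation the slide describes.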
Making the Marriage Work
• The MW runtime support library helps us cope with the dynamic nature of our platform
• MW – Master Worker paradigm
• Must deal with contention at the master
• Search/ordering strategies at both master and worker are important!
• Parallel Efficiency improves from 50% to 90%
• Lots more details! Paper available at www.optimization-online.org
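As a rough picture of what coping with a dynamic platform means in a master-worker setting, the sketch below simulates a master that requeues a task whenever the worker holding it disappears before reporting back. It is a single-process toy under stated assumptions, not the MW library's API: `run_master`, `worker_fails`, and the squaring "work" are all hypothetical.

```python
from collections import deque

def run_master(tasks, worker_fails):
    """Hand out tasks one at a time; requeue any task whose worker vanishes.

    worker_fails(attempt) returns True if the worker used for that attempt
    is reclaimed before it can report a result (a stand-in for a machine
    leaving the pool).
    """
    todo = deque(tasks)
    results = {}
    attempts = 0
    while todo:
        task = todo.popleft()
        attempts += 1
        if worker_fails(attempts):
            todo.append(task)         # lost worker: the task goes back in the pool
            continue
        results[task] = task * task   # stand-in for solving a subproblem
    return results, attempts

# Hypothetical run: every third attempt lands on a worker that disappears.
results, attempts = run_master(range(5), lambda attempt: attempt % 3 == 0)
```

Every task still completes exactly once, at the cost of two wasted attempts. Lost work of this kind is one of the things that lowers parallel efficiency.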
Mission Accomplished!
Solution Characteristics
Wall Clock Time      6:22:04:31
Avg. # Machines      653
Max. # Machines      1007
CPU Time             Approx. 11 years
Nodes                11,892,208,412
LAPs                 574,254,156,532
Parallel Efficiency  92%
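The figures in this table can be cross-checked with a little arithmetic. The sketch assumes the wall clock time reads as days:hours:minutes:seconds and that parallel efficiency is roughly useful CPU time divided by (wall time x average machines). Neither assumption is stated on the slide.

```python
# Wall clock time 6:22:04:31, read as days:hours:minutes:seconds (assumption).
wall_seconds = 6 * 86400 + 22 * 3600 + 4 * 60 + 31

avg_machines = 653
seconds_per_year = 365.25 * 86400

# Total machine time that was available to the run.
machine_years = wall_seconds * avg_machines / seconds_per_year

# With 92% parallel efficiency, the useful share of that time.
busy_years = 0.92 * machine_years

# Average number of LAPs solved per branch-and-bound node.
nodes = 11_892_208_412
laps = 574_254_156_532
laps_per_node = laps / nodes

print(f"{machine_years:.1f} machine-years available, {busy_years:.1f} used")
print(f"{laps_per_node:.1f} LAPs per node")
```

Under these assumptions the run had about 12.4 machine-years available, of which about 11.4 were useful, which agrees with the reported CPU time of approximately 11 years.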
Number of Workers
The Ups & Downs
1. Human (read Jeff) error
  • Master compiled for <= 1000 workers
2. Condor schedd bug (Gasp!!!!)
3. Master shut down to fix NFS problems
4. Condor schedd bug
5. Human (read Jeff) error
  • Incorrect editing of configuration files resulting in many incorrect submissions
Number of Workers on June 12
Number of Workers at Three Biggest Contributors
Number of Workers at Three Next Largest Contributors
KLAPS
The Moral of the Story
• A good wedding/marriage requires four key ingredients
• There were also four key ingredients to solving nug30
1. Powerful mathematics for producing a lower bound
2. Innovative branching techniques
3. An EXTREMELY powerful computing platform
4. “Marrying” the algorithm to the platform in an appropriate manner
The TRUE Moral
• It is possible to do complex numerical calculations on the Computational Grid using Condor!
• It opens the doors to attacking heretofore unsolved problems!
• http://www.mcs.anl.gov/metaneos