Applying a Genetic Algorithm to Reconfigurable Hardware – a Case Study
B. Earl Wells*, Clint Patrick, Luis Trevino, John Weir and Jim Steincamp
NASA Marshall Space Flight Center, Huntsville, Alabama
*University of Alabama in Huntsville, Alabama
Wells, E 169/MAPLD 2004

Project Motivation & Objectives
• To evaluate the technology of reconfigurable computing: determine its level of maturity and suitability for use in future NASA applications
• To implement a nontrivial test-bed application on a Star Bridge Hypercomputer Model 36
• Chosen application: a simple genetic algorithm

Targeted Hardware Platform
• Star Bridge HC-36 Hypercomputer System
• Employs Xilinx Virtex-II 6000 series FPGAs

Development Environment
• Development environment: VIVA™
• Graphical user interface
• Structural design philosophy with behavioral attributes:
  – Polymorphism
  – Object overloading
  – Recursion
• Data-flow, data-driven synchronization between objects (Go, Done, Busy, Wait protocol)
• Large library of high-end objects
• Environment falls somewhere between hardware description languages and schematic capture packages

Polymorphism, Overloading, Recursion, and Synchronization
Example: object to determine the number of 1’s in a binary number
[Figure: terminal case and recursive case of the object]

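The terminal-case/recursive-case structure of the 1's-counting object can be sketched in software. This is only an illustration in Python, not the Viva object itself; splitting the word into halves is an assumption, since the slide figure is not reproduced here, and `count_ones` and `to_bits` are hypothetical names.

```python
def count_ones(bits):
    """Count the 1s in a list of bits by recursive halving."""
    if not bits:                 # guard: an empty word contains no 1s
        return 0
    if len(bits) == 1:           # terminal case: a single bit is its own count
        return bits[0]
    mid = len(bits) // 2         # recursive case: split, count each half, add
    return count_ones(bits[:mid]) + count_ones(bits[mid:])


def to_bits(value, width):
    """Expand an integer into a list of `width` bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


print(count_ones(to_bits(0b10110110, 8)))  # 5
```

In hardware the two recursive calls would correspond to two sub-objects operating side by side, so the recursion depth grows with the logarithm of the word width.
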
Genetic Algorithms
• Biologically inspired search techniques
• Employ selection, replication (crossover), mutation, and replacement
• Iterative method; very time intensive
• Regularly structured
• Large amounts of concurrency present that can be exploited

Genetic Algorithm Implementation
[Figures: top-level view; run-time environment]

GA Characteristics
• 2-way tournament selection
• No elitism
• Single-point crossover with bit-wise mutation
• Weight-encoded chromosome (weights translated into a rank ordering of cities)
• Adjustable parameters: population size (2 to 512, powers of 2), number of generations, probability of mutation, probability of crossover

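As a rough software illustration of 2-way tournament selection: two chromosomes are drawn at random and the better one is kept. The sketch assumes sampling with replacement and a cost to be minimized (as in the TSP); neither detail is stated on the slide, and `tournament_select` is a hypothetical name.

```python
import random


def tournament_select(population, cost, rng=random):
    """Draw two individuals at random and return the one with the lower cost."""
    a = rng.choice(population)
    b = rng.choice(population)
    return a if cost(a) <= cost(b) else b


# Toy population where each "chromosome" is just a number and cost is its value.
rng = random.Random(0)
picks = [tournament_select([1, 2, 3, 4], lambda x: x, rng) for _ in range(1000)]
print(picks.count(1), picks.count(4))  # low-cost individuals win far more often
```

With no elitism, the best chromosome of a generation survives only if it keeps winning tournaments and is not disrupted by crossover or mutation.
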
Block Diagram Level View of Genetic Algorithm Implementation

Replacement & Chromosome Storage

Selection

Standard Single Point Crossover Operation (Weighted Chromosomes)
Crossover Point = 4
Chromosome 1:          {25, 17, 10, 20, 33, 14, 7, 29}
Chromosome 2:          {44, 12, 17, 38, 20, 5, 70, 13}
Offspring Chromosome:  {25, 17, 10, 20, 20, 5, 70, 13}

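A minimal Python sketch of the single-point crossover shown here, assuming the offspring takes the elements before the crossover point from chromosome 1 and the rest from chromosome 2 (`single_point_crossover` is a hypothetical name):

```python
def single_point_crossover(parent1, parent2, point):
    """Head of parent1 (first `point` elements) followed by the tail of parent2."""
    return parent1[:point] + parent2[point:]


chromosome1 = [25, 17, 10, 20, 33, 14, 7, 29]
chromosome2 = [44, 12, 17, 38, 20, 5, 70, 13]
print(single_point_crossover(chromosome1, chromosome2, 4))
# [25, 17, 10, 20, 20, 5, 70, 13]
```

Because the elements are weights rather than city numbers, a repeated value such as the two 20s is still a valid chromosome; only the tie-breaking rule used when ranking decides which city is visited first.
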
Standard Single Point Crossover Operation (Weighted Chromosomes)

Single Point Mutation (Weighted Chromosomes)
Original Chromosome:  {25, 17, 10, 20, 20, 5, 70, 13}
Mutated Element = 5
Mutated Chromosome:   {25, 17, 10, 20, 55, 5, 70, 13}

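The mutation can be sketched as overwriting one weight. The example assumes an eight-element original chromosome (the crossover offspring, matching the eight-element mutated result) and that "Mutated Element = 5" counts from 1. How the new weight is generated, for example by flipping bits, is not modelled; the value 55 is simply passed in, and `mutate_element` is a hypothetical name.

```python
def mutate_element(chromosome, element, new_weight):
    """Return a copy with the given element (counted from 1) replaced."""
    mutated = list(chromosome)
    mutated[element - 1] = new_weight
    return mutated


original = [25, 17, 10, 20, 20, 5, 70, 13]
print(mutate_element(original, 5, 55))
# [25, 17, 10, 20, 55, 5, 70, 13]
```
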
Traveling Salesman Problem (TSP)
• Given a specified number of “cities” along with the cost of travel between each pair of them, find the cheapest way of visiting all the cities and returning to the first city visited
• Asymmetric case: the direction traveled between any two cities matters (i.e., the cost is different)
• Possible solutions: (n-1)!, where n is the number of cities

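To make the asymmetric case concrete, the sketch below evaluates one tour in both directions on a small cost matrix. The 4-city matrix is invented for illustration and `tour_cost` is a hypothetical name.

```python
from math import factorial


def tour_cost(cost, tour):
    """Sum cost[a][b] along the tour, including the return to the first city."""
    return sum(cost[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


# Hypothetical asymmetric costs: cost[i][j] generally differs from cost[j][i].
cost = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
forward = [0, 1, 2, 3]
print(tour_cost(cost, forward))        # 22
print(tour_cost(cost, forward[::-1]))  # 30, same cities in the opposite direction
print(factorial(len(cost) - 1))        # 6 distinct tours for n = 4
```

For the 65-city problem used in this study, (n-1)! is 64!, which rules out exhaustive search.
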
Traveling Salesman Problem (TSP)
• Well-understood NP-hard optimization problem (its decision version is NP-complete)
• Academic literature contains many test problems
• Chose for test purposes an asymmetric TSP with 65 cities (TSP 65)*
• Used a modified weight-encoded chromosome representation
*University of Heidelberg, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95

Equivalent TSP Chromosome Representations
Weighted Chromosome
City No.:            0   1   2   3   4   5   6   7
Weights:          { 25, 17, 10, 20, 55,  5, 70, 13}
Rank Ordering:    [  5,  3,  1,  4,  6,  0,  7,  2]
Visit Order Permutation Chromosome
City Visit Order:  1st 2nd 3rd 4th 5th 6th 7th 8th
City Numbers:     {  5,  2,  7,  1,  3,  0,  4,  6}

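The relation between the two representations can be checked with a short Python sketch. Ties between equal weights are broken here in favour of the lower city number, which is an assumption; `rank_ordering` and `visit_order` are hypothetical names.

```python
def visit_order(weights):
    """Cities listed in order of increasing weight (first visited first)."""
    return sorted(range(len(weights)), key=lambda city: weights[city])


def rank_ordering(weights):
    """rank[city] = position of that city in the visit order."""
    ranks = [0] * len(weights)
    for position, city in enumerate(visit_order(weights)):
        ranks[city] = position
    return ranks


weights = [25, 17, 10, 20, 55, 5, 70, 13]
print(rank_ordering(weights))  # [5, 3, 1, 4, 6, 0, 7, 2]
print(visit_order(weights))    # [5, 2, 7, 1, 3, 0, 4, 6]
```

Any assignment of weights decodes to a valid tour, which is why ordinary crossover and mutation can be applied to the weighted form without producing illegal offspring.
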
TSP Objective Function
• Systolic sort of chromosome weights
• Summation of segments
• Replacement of weights with rank orderings

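The slide does not say which systolic sorting scheme was used. As one possibility, the sketch below uses odd-even transposition sort, a common choice for linear systolic arrays, followed by the summation of tour segments. The cost matrix and the names `odd_even_transposition_sort` and `tsp_objective` are assumptions for illustration.

```python
def odd_even_transposition_sort(items):
    """Sort with n phases of neighbour compare-exchange (systolic-array style)."""
    cells = list(items)
    n = len(cells)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        # In hardware every pair within a phase would be compared concurrently.
        for i in range(phase % 2, n - 1, 2):
            if cells[i] > cells[i + 1]:
                cells[i], cells[i + 1] = cells[i + 1], cells[i]
    return cells


def tsp_objective(weights, cost):
    """Decode a weighted chromosome into a tour and sum its segment costs."""
    pairs = [(w, city) for city, w in enumerate(weights)]
    tour = [city for _, city in odd_even_transposition_sort(pairs)]
    total = sum(cost[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))
    return tour, total


# Hypothetical asymmetric cost matrix, for illustration only.
cost_matrix = [[(i - j) % 8 for j in range(8)] for i in range(8)]
tour, total = tsp_objective([25, 17, 10, 20, 55, 5, 70, 13], cost_matrix)
print(tour, total)
```

Sorting (weight, city) pairs carries the city number along with each weight, so the sorted sequence is directly the visit order.
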
Single Point Permutation Preserving Crossover Operation
Crossover Point = 4
Chromosome 1:          {1, 7, 3, 2, 5, 6, 0, 4}
Chromosome 2:          {0, 2, 4, 1, 6, 5, 7, 3}
Offspring Chromosome:  {1, 7, 3, 2, 0, 4, 6, 5}

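The offspring above is consistent with the following rule, sketched in Python: keep the elements of chromosome 1 before the crossover point, then append the cities not yet used in the order they appear in chromosome 2. The rule is inferred from the example values, and `permutation_crossover` is a hypothetical name.

```python
def permutation_crossover(parent1, parent2, point):
    """Head of parent1, then the remaining cities in parent2's order."""
    head = parent1[:point]
    used = set(head)
    tail = [city for city in parent2 if city not in used]
    return head + tail


chromosome1 = [1, 7, 3, 2, 5, 6, 0, 4]
chromosome2 = [0, 2, 4, 1, 6, 5, 7, 3]
print(permutation_crossover(chromosome1, chromosome2, 4))
# [1, 7, 3, 2, 0, 4, 6, 5]
```

The result is always a permutation, since every city appears exactly once in either the head or the filtered tail.
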
Modified Crossover Operator

Permutation Altering Mutation
Original Chromosome:  {1, 7, 3, 2, 0, 4, 6, 5}
Mutation Removal Point = 6, Insertion Point = 3
Mutated Chromosome:   {1, 7, 4, 3, 2, 0, 6, 5}
Note: No change in mutation operator needed

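A small Python sketch of the remove-and-reinsert mutation, assuming both points are counted from 1 as the example values suggest (`remove_and_insert` is a hypothetical name):

```python
def remove_and_insert(chromosome, removal_point, insertion_point):
    """Take the city at removal_point and reinsert it at insertion_point."""
    mutated = list(chromosome)
    city = mutated.pop(removal_point - 1)      # points are counted from 1
    mutated.insert(insertion_point - 1, city)
    return mutated


print(remove_and_insert([1, 7, 3, 2, 0, 4, 6, 5], 6, 3))
# [1, 7, 4, 3, 2, 0, 6, 5]
```

Moving a city within the list cannot duplicate or drop one, so the permutation property is preserved.
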
Comparison with Instruction Set Processor (ISP) Implementations
• Implemented the TSP on a high-end 3.2 GHz Intel Xeon processor with a 3-level cache
• Coded the problem in C using pointers for maximum efficiency
• OS: Red Hat Enterprise Linux v3 (kernel 2.4.21 SMP), single user
• Basic methodology required ~1.6 ms per generation (population size 512)
• Optimized version required ~0.8 ms per generation (population size 512)

Parallelization Strategies
• Initial basic reconfigurable implementation on the Star Bridge system required ~1.1 ms per generation, slower than the optimized ISP implementation (population size = 512, clock speed 66 MHz)
• More parallelization was needed!

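The reported per-generation times can be put side by side with a few lines of arithmetic. The sketch uses only the figures quoted on the slides (1.6 ms, 0.8 ms, 1.1 ms, 66 MHz, population 512); the variable names are illustrative.

```python
CLOCK_HZ = 66e6        # clock speed of the reconfigurable implementation
POPULATION = 512
times_ms = {"ISP basic": 1.6, "ISP optimized": 0.8, "HC-36 initial": 1.1}

# Clock cycles the initial hardware design spends per generation and per chromosome.
cycles_per_generation = times_ms["HC-36 initial"] * 1e-3 * CLOCK_HZ
cycles_per_chromosome = cycles_per_generation / POPULATION

for name, t in times_ms.items():
    ratio = times_ms["ISP optimized"] / t
    print(f"{name}: {ratio:.2f}x the speed of the optimized ISP version")
print(f"{cycles_per_generation:.0f} cycles/generation, "
      f"{cycles_per_chromosome:.1f} cycles/chromosome")
```

At roughly 72,600 cycles per generation, the initial design spends about 142 clock cycles per chromosome, which is the budget that pipelining and replication aim to reduce.
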
Parallelization Strategies
• Exploiting concurrency in a common population
  – Temporal parallelism via pipelining
  – Spatial parallelism via replicating functional units
• Processing isolated subpopulations
  – With chromosome migration (very promising for the Star Bridge system but not yet completed)

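The isolated-subpopulation approach was not completed in this work, so the following is only a generic island-model sketch, not the authors' design. It assumes a ring of subpopulations in which each island's best chromosome replaces the worst chromosome on the next island; the migration policy and the name `migrate_ring` are hypothetical.

```python
def migrate_ring(islands, cost):
    """Copy each island's best chromosome over the worst one on the next island."""
    best = [min(population, key=cost) for population in islands]
    for i, population in enumerate(islands):
        incoming = best[i - 1]  # best of the previous island (wraps around the ring)
        worst = max(range(len(population)), key=lambda k: cost(population[k]))
        population[worst] = incoming
    return islands


# Toy example: each "chromosome" is a number and its cost is its value.
islands = [[3, 9], [5, 1], [7, 8]]
print(migrate_ring(islands, cost=lambda x: x))
# [[3, 7], [3, 1], [7, 1]]
```

Between migrations each island would evolve independently, which is what makes the scheme attractive for hardware with several separate FPGAs.
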
Applying Temporal Parallelism

Applying Spatial Parallelism

Resource Requirements
Implementation          Slices (of 33792)   Block RAMs (of 144)   Total equivalent gate count
Non-pipelined, 1 TSP    10910 (32%)         40 (27%)              2,767,231
Pipelined, 1 TSP        10957 (32%)         40 (27%)              2,770,741

Resource Requirements
Implementation          Slices (of 33792)   Block RAMs (of 144)   Total equivalent gate count
Pipelined, 2 TSP        13738 (40%)         45 (31%)              3,149,966
Pipelined, 4 TSP        19685 (58%)         55 (38%)              3,908,362
Pipelined, 6 TSP        25728 (76%)         65 (45%)              4,664,262

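The utilization figures and the incremental cost of replication can be recomputed from the reported slice and block RAM counts. The sketch assumes that the number in each implementation name counts replicated TSP units; the dictionary keys are illustrative labels.

```python
TOTAL_SLICES, TOTAL_BRAMS = 33792, 144
designs = {  # label: (slices, block RAMs), values taken from the slides
    "non-pipelined x1": (10910, 40),
    "pipelined x1": (10957, 40),
    "pipelined x2": (13738, 45),
    "pipelined x4": (19685, 55),
    "pipelined x6": (25728, 65),
}
for label, (slices, brams) in designs.items():
    print(f"{label}: {slices / TOTAL_SLICES:.1%} of slices, "
          f"{brams / TOTAL_BRAMS:.1%} of block RAMs")

# Average cost of each TSP unit added beyond the first pipelined one.
extra_units = 6 - 1
slices_per_extra_unit = (designs["pipelined x6"][0] - designs["pipelined x1"][0]) / extra_units
brams_per_extra_unit = (designs["pipelined x6"][1] - designs["pipelined x1"][1]) / extra_units
print(slices_per_extra_unit, brams_per_extra_unit)
```

On these numbers each additional unit costs roughly 2,950 slices and 5 block RAMs, while pipelining the single-unit design adds only 47 slices.
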
Problems Encountered
• Synthesis time issues (within Viva and within Xilinx)
• Maturity/robustness of CAD tools
• Learning curve
• Timing issues
• I/O pin limitations

Summary & Conclusion
• A simple genetic algorithm was implemented on reconfigurable hardware using the Viva paradigm
• Significant but not spectacular speedups have been obtained for the TSP using a combination of temporal and spatial parallel processing methods
• Many other opportunities exist to improve processing throughput
• The concept of isolated subpopulations is a very promising method to further improve performance