Performance Analysis of a Computational Biology Code on CMP Clusters

Timothy Campbell, Dr. Valerie Taylor, and Dr. Xingfu Wu
Department of Computer Science, Texas A&M University
College Station, TX 77843-3112

Motivation

Understand parallel environments, focusing on the performance of a computational biology code on two supercomputing systems.

Research Goals

• Collect performance results from the MrBayes computational biology code on two systems using PAIDE
• Analyze performance results on both systems by processor partitioning and by analyzing overhead
• Model results using the online Prophesy modeling system

MrBayes Computational Biology Code

• Phylogenetic analysis combining information from different data partitions
• Evaluates trees and reconstructs DNA structure
• Estimates how much confidence one can have in different branches of the tree
• Leads to more efficient drug development by understanding optimal combinations
• Steps of a run:
  • Read the Nexus data file
  • Set the evolutionary model
  • Run the analysis
  • Summarize the samples
• Includes 6 data sets whose problem sizes are determined by the taxonomy

Data file            Taxonomy
ADH                  54
ANOLIS               30
AVIAN OVOMUCOIDS     89
CYNMIX               32
PRIMATES             12
REPLICASE            9

Test Platforms

Node Type               DataStar P655     TAMU Hydra
Description             IBM Power4+       IBM Cluster Power5+
# Nodes                 272               40
Total Processors        2176              640
Memory per Node         16, 32 GB
CPU per Node            8                 16
CPU Speed               1.5, 1.7 GHz      1.9 GHz
Power                   POWER4+           POWER5+
CPU Peak Performance    6.8 GFlops        7.6 GFlops

Chip Architecture

• In the Power5+ chip architecture, the L3 cache is on chip, whereas in the Power4+ system it is off chip
• The L3 cache serves as a victim cache

[Figure: chip architecture diagram, http://www.research.ibm.com/journal/rd/494/sinha1.gif]

Scalability Analysis

[Figure: Execution Time on DataStar P655. x-axis: Number of Processors (4 to 1024); y-axis: Runtime in Seconds (0 to 4000); series: adh, anolis, avian ovomucoids, cynmix, primates, replicase]

Communication Time

                256 Processors             512 Processors             1024 Processors
Data set        Exec. Time (sec)  % Comm.  Exec. Time (sec)  % Comm.  Exec. Time (sec)  % Comm.
primates        631.07            93       941.17            97.4     1359.67           98.9
replicase       585.95            93.4     906.43            97.5     1302.89           99

Processor Partitioning

DataStar P655, 32 processors (execution time in seconds; percentage relative to 32 x 1)

Partitioning    Primates           Replicase
4 x 8           622.05 (3.6%)      611.7 (0.17%)
8 x 4           658.08 (9.6%)      652.2 (6.8%)
16 x 2          693.89 (15.6%)     702.88 (15.1%)
32 x 1          600.11             610.63

Hydra, 32 processors (execution time in seconds; percentage relative to 32 x 1)

Partitioning    Primates           Replicase
2 x 16          458.17 (4.25%)     466.94 (6.4%)
4 x 8           464.94 (5.8%)      462.71 (5.4%)
8 x 4           491.12 (11.75%)    493.68 (12.5%)
16 x 2          518.84 (18.06%)    523.67 (19.36%)
32 x 1          439.47             438.72

Modeling

http://prophesy.cs.tamu.edu/

Summary and Future Work

• Parallel processing and multicore systems
• Computational biology applications
  • Better drug development
  • More efficient drugs
• Performance results
  • Chip architecture
  • Resource contention
  • Communication affecting execution time
• Future work
  • Fine-tune the application to improve scalability

Acknowledgements

I would like to thank the CRA-W DMP Program, which funded this research, and Dr. Valerie Taylor, with special thanks to Dr. Xingfu Wu.
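
The parenthesized percentages in the processor-partitioning results appear to be slowdowns relative to the 32 x 1 run, and the % Comm. figures imply an absolute communication time. The Python sketch below reproduces both calculations from the DataStar P655 numbers reported on the poster. The function names are hypothetical, and reading "A x B" as nodes x processors per node is an assumption; neither comes from the original analysis.

```python
# Execution times in seconds on 32 processors of DataStar P655, copied from the poster.
# Assumption: a key such as "4x8" means 4 nodes with 8 processors used per node.
DATASTAR_PRIMATES = {"4x8": 622.05, "8x4": 658.08, "16x2": 693.89, "32x1": 600.11}
DATASTAR_REPLICASE = {"4x8": 611.7, "8x4": 652.2, "16x2": 702.88, "32x1": 610.63}


def overhead_pct(times, baseline="32x1"):
    """Percentage slowdown of each partitioning relative to the baseline run."""
    base = times[baseline]
    return {
        name: 100.0 * (t - base) / base
        for name, t in times.items()
        if name != baseline
    }


def comm_seconds(exec_time, comm_pct):
    """Absolute communication time implied by a total time and a % Comm. value."""
    return exec_time * comm_pct / 100.0


if __name__ == "__main__":
    for label, times in (("primates", DATASTAR_PRIMATES),
                         ("replicase", DATASTAR_REPLICASE)):
        for name, pct in overhead_pct(times).items():
            print(f"{label:10s} {name:5s} {pct:6.2f}% slower than 32x1")

    # Communication time for primates on 256 and 1024 processors (DataStar P655).
    print(f"primates  256 procs: {comm_seconds(631.07, 93.0):.1f} s communicating")
    print(f"primates 1024 procs: {comm_seconds(1359.67, 98.9):.1f} s communicating")
```

The computed slowdowns agree with the poster's percentages to within rounding, which supports reading them as overhead relative to the 32 x 1 configuration.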