Cray XD1: Reconfigurable Computing for Application Acceleration
Cray XD1 System Architecture
• 50 GFLOPS to 2+ TFLOPS
• 12 to 1024+ processors
• Entry/mid-range system optimized for sustained performance
• With reconfigurable computing capability
Compute (per chassis)
• 12 AMD Opteron 32/64-bit x86 processors
• High-performance Linux
RapidArray Interconnect
• 12 communications processors
• 1 Tb/s switch fabric
Active Management
• Dedicated processor
Application Acceleration
• 6 co-processors
Processors directly connected via integrated switch fabric
Naval Research Lab
• 24 chassis / 2.5 TFLOPS system
• 288 AMD dual-core processors
• 144 Xilinx Virtex-II Pro FPGAs
CINECA, Italy's national supercomputer center
• 24 affiliated universities and the National Research Council
• 634 GFLOPS Cray XD1 system (144 processors) with Application Acceleration FPGAs
Air Force Maui Optical and Supercomputing Site
• 1.3 TFLOPS Cray XD1 system (288 processors)
• Part of a $23M contract awarded to Cray by the DoD HPC Modernization Program
• Located at the Maui Space Surveillance Site
The Barriers to Reconfigurable Computing
0. Choosing the right application!
• Core functionality, increasing complexity, increasing demand on resources
1. Starving the FPGA
• Bandwidth and latency to the FPGA limited by the PCI bus
2. FPGA/processor interaction
• Job scheduling, Linux integration, memory mapping
3. Programming tools
• Programming hardware requires special tools and special expertise
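To make "starving the FPGA" concrete, here is a back-of-envelope sketch in C. It assumes a 64-bit, 66 MHz PCI bus with one transfer per clock and no protocol overhead (an optimistic theoretical peak), and reuses the ~2 GB/s pipelined DES figure quoted on the Mobius slide as the rate at which an FPGA core could consume data. The function names are hypothetical, chosen for illustration only.

```c
/* Theoretical peak of a parallel bus: (width in bytes) x (clock rate).
   Assumption: one transfer per clock; protocol overhead and latency ignored,
   so real sustained bandwidth will be lower. */
double bus_peak_bytes_per_s(int width_bits, double clock_hz) {
    return (width_bits / 8.0) * clock_hz;
}

/* Fraction of time a streaming FPGA core can be kept busy when every input
   byte must cross the bus: 1.0 if the bus keeps up, otherwise bus/core. */
double fpga_busy_fraction(double core_bytes_per_s, double bus_bytes_per_s) {
    if (bus_bytes_per_s >= core_bytes_per_s)
        return 1.0;
    return bus_bytes_per_s / core_bytes_per_s;
}
```

Under these assumptions the bus peaks at 528 MB/s, so a core able to process ~2 GB/s would sit idle roughly three quarters of the time.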
The Right Application…
Industries
• Life Sciences
• Electronic Design Automation
• Manufacturing
• Energy & Natural Resources
• Media
• Government
Example Functions
• Searching & Sorting
• Signal & Image Processing
• Encryption/Decryption
• Error Correction
• Coding/Decoding
• Packet Processing
• Random-number Generation
• Bit Manipulation
• And many more
Example Applications
• Seismic Processing
• Astrophysics / Adaptive Optics
• Graphics Acceleration
• Quantum Physics
• Bioinformatics
• Cheminformatics
• Vehicular Traffic Simulations
• Financial Modeling
• And many more
Why FPGAs in a Cray product?
1. Performance!
2. Performance!
3. Performance!
4. Lower power density
5. Deployments in the field, mobile platforms, …
Celoxica C->FPGA Compiler
• Handel-C compiler allows FPGA development in C (with extensions)
• It does not eliminate the need for hardware awareness!
• Enables a single programming model for logic and PowerPCs
• Example: Mersenne Twister RNG
• 3 days to port onto FPGA (performance << Opteron!!)

    index = 0;
    found = -1;                      /* -1 means "key not found" */
    while (index < length) {
        if (table[index] == key) {   /* compare (==), not assign (=) */
            found = index;
            break;                   /* stop at the first match */
        }
        index = index + 1;
    }

• 2-3 weeks to optimize
• End result roughly comparable to VHDL development
• Beta test underway with OSC
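A port like the Mersenne Twister example starts from a software reference. The sketch below is the standard MT19937 generator in plain C, not the Handel-C code from the slide; the type and function names (mt19937_t, mt_seed, mt_twist, mt_next) are hypothetical. The tempering in mt_next is nothing but shifts, XORs and masks, the kind of bit manipulation that maps naturally onto FPGA logic.

```c
#include <stdint.h>

#define MT_N 624
#define MT_M 397
#define MT_MATRIX_A 0x9908b0dfu
#define MT_UPPER_MASK 0x80000000u
#define MT_LOWER_MASK 0x7fffffffu

typedef struct {
    uint32_t state[MT_N];
    int index;              /* next word to output; MT_N means "twist first" */
} mt19937_t;

/* Standard MT19937 linear seeding of the 624-word state. */
void mt_seed(mt19937_t *mt, uint32_t seed) {
    mt->state[0] = seed;
    for (int i = 1; i < MT_N; i++) {
        uint32_t prev = mt->state[i - 1];
        mt->state[i] = 1812433253u * (prev ^ (prev >> 30)) + (uint32_t)i;
    }
    mt->index = MT_N;
}

/* Regenerate all 624 words in place (the "twist"). */
static void mt_twist(mt19937_t *mt) {
    for (int i = 0; i < MT_N; i++) {
        uint32_t y = (mt->state[i] & MT_UPPER_MASK)
                   | (mt->state[(i + 1) % MT_N] & MT_LOWER_MASK);
        uint32_t next = mt->state[(i + MT_M) % MT_N] ^ (y >> 1);
        if (y & 1u)
            next ^= MT_MATRIX_A;
        mt->state[i] = next;
    }
    mt->index = 0;
}

/* Return the next 32-bit output, applying MT19937 tempering. */
uint32_t mt_next(mt19937_t *mt) {
    if (mt->index >= MT_N)
        mt_twist(mt);
    uint32_t y = mt->state[mt->index++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}
```

Seeded with 5489 (the conventional default), the first output is 3499211612, which gives a simple check that a hardware port reproduces the software reference bit for bit.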
Mitrion
The application: Thin Plate Splines, image analysis of protein gels
• Image morphing based on natural logarithm computations
• Essential for comparing protein content
• Speedup per FPGA: 10-30x, reducing analysis runtime from days to hours
• 180 lines of Mitrion-C code generate 150,000 lines of VHDL code
• Pure software programming: easy to learn for an HPC programmer
• No hardware design considerations
Higher-Level Abstractions: Mobius (www.codetronix.com)
• Pascal-like, CSP-based language (types, records, arrays, floating-point arithmetic)
• Synchronization and communication by handshaking over channels
• Generates hardware, software, or hardware/software code
• General-purpose and dataflow algorithms
• Pipelined DES:
  • ~120 lines of Mobius
  • ~2 GB/s throughput
  • ~2200 slices
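"Handshaking over channels" means a send does not complete until the receiver has taken the value. The sketch below illustrates that rendezvous semantics in C with POSIX threads; it is not Mobius syntax, and chan_t, chan_send, chan_recv and producer are hypothetical names. It assumes exactly one sending and one receiving thread per channel.

```c
#include <pthread.h>
#include <stddef.h>

/* Unbuffered (rendezvous) channel: ONE sender, ONE receiver assumed. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int value;
    int full;               /* 1 while a sent value awaits the receiver */
} chan_t;

void chan_init(chan_t *ch) {
    pthread_mutex_init(&ch->mu, NULL);
    pthread_cond_init(&ch->cv, NULL);
    ch->full = 0;
}

/* Offer a value, then block until the receiver acknowledges it. */
void chan_send(chan_t *ch, int v) {
    pthread_mutex_lock(&ch->mu);
    ch->value = v;
    ch->full = 1;
    pthread_cond_broadcast(&ch->cv);        /* "request" */
    while (ch->full)
        pthread_cond_wait(&ch->cv, &ch->mu);
    pthread_mutex_unlock(&ch->mu);
}

/* Block until a value is offered, take it, and acknowledge. */
int chan_recv(chan_t *ch) {
    pthread_mutex_lock(&ch->mu);
    while (!ch->full)
        pthread_cond_wait(&ch->cv, &ch->mu);
    int v = ch->value;
    ch->full = 0;
    pthread_cond_broadcast(&ch->cv);        /* "acknowledge" */
    pthread_mutex_unlock(&ch->mu);
    return v;
}

/* Example process: a pipeline stage that sends 1..n over a channel. */
typedef struct { chan_t *ch; int n; } producer_args;

void *producer(void *arg) {
    producer_args *pa = arg;
    for (int i = 1; i <= pa->n; i++)
        chan_send(pa->ch, i);
    return NULL;
}
```

Because every transfer synchronizes both sides, processes connected this way need no shared clock or explicit scheduling, which is what lets a CSP description be mapped to hardware, software, or a mix of the two.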
Easing FPGA Adoption…
1. Traditional programming model
• VHDL, Verilog
2. Off-the-shelf libraries
• Cray and third-party acceleration libraries
• Prepackaged, turnkey applications
3. High-level compilers
• C, graphical, MATLAB
Working to Create an Open Source FPGA Community