Accelerating a climate physics model with OpenCL

CMSC 601 Spring 11 – Research Skills
Dibyajyoti Ghosh

What is a climate physics model?
• Global weather is controlled by many interconnected events, including changes in the atmosphere and oceans and the ebb and flow of sea ice.
• The world’s most powerful supercomputers can simulate these events.
• The CCSM-2 model simulates Earth’s climate patterns in considerable detail, performing 700 billion calculations to recreate a single day of the world’s climate.
• Scientists use these data to understand ocean currents, predict weather patterns, and study the ozone (O3) layer, among other things.
http://www.ucar.edu/communications/CCSM/overview.html

Background
• The solar radiation component of NASA’s GEOS-v5 takes ~20% of the model computation time.
• NASA is interested in analyzing the performance and cost benefit of non-traditional computing systems.
• GEOS-v5 is 20+ years old, written mostly in Fortran, and still evolving.
• It cannot be entirely rewritten due to production constraints.
http://www.ucar.edu/communications/CCSM/overview.html

Related Work
• Accelerating Climate Models with the IBM Cell Processor - Shujia Zhou et al., 2008
• GPU Computing for Atmospheric Modeling - Rory Kelly, NCAR, Boulder, July-Aug. 2010
• Accelerating Atmospheric Modeling Through Emerging Multi-core Technologies - John Christian Linford, Virginia Tech, 2010
• Exploiting Array Syntax in Fortran for Accelerator Programming - Matthew J. Sottile, Craig E. Rasmussen, Wayne N. Weseloh, Robert W. Robey, Los Alamos National Laboratory

Motivation
• OpenCL was created with the goal of unifying hybrid systems.
• There is no literature on OpenCL portability among architectures.
• There is no data on how OpenCL fares against GCC in vectorization.
http://www.cc.gatech.edu/~bader/AFRL-GT-Workshop2009/AFRL-GT-Bader.pdf

What is vectorization?
• Data elements in memory (a, b, c, d, ..., p) are packed into vectors and loaded into vector registers (VR1 to VR5 in the figure).
• Vector length: the Vectorization Factor (VF); in the figure, VF = 4.
• Vector operation: instead of OP(a), OP(b), OP(c), OP(d) one at a time, a single VOP(a, b, c, d) operates on a whole vector register.
• Original serial loop:
    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }
• Loop in vector notation:
    for (i = 0; i < N; i += VF) { a[i:i+VF-1] = a[i:i+VF-1] + b[i:i+VF-1]; }
[Figure: data in memory packed into vector registers, with scalar OP versus vector VOP]
Thanks to Dorit Nuzman, IBM (www.hipeac.net/system/files/4_Nuzman.ppt) for this wonderful slide.
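The two loops on this slide can be sketched in plain C. This is only an illustration, not code from the talk: the function names add_serial and add_strip_mined are hypothetical, the inner "lane" loop stands in for a single vector operation, and the scalar epilogue for leftover elements is an assumption added here because the slide's vector notation implicitly assumes N is a multiple of VF.

```c
#include <stddef.h>

#define VF 4  /* vectorization factor: elements handled per vector step */

/* Original serial loop: one element per iteration. */
void add_serial(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = a[i] + b[i];
    }
}

/* Strip-mined form of the same loop (hypothetical helper).
 * Each outer iteration covers a[i : i+VF-1]; the inner lane loop
 * stands in for one vector add on VF packed elements. */
void add_strip_mined(float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        for (size_t lane = 0; lane < VF; lane++) {
            a[i + lane] = a[i + lane] + b[i + lane];
        }
    }
    /* Scalar epilogue (an assumption, not on the slide):
     * handles the n % VF elements that do not fill a vector. */
    for (; i < n; i++) {
        a[i] = a[i] + b[i];
    }
}
```

Both functions should produce the same result for any n; a vectorizing compiler performs essentially this transformation and replaces the lane loop with a single SIMD instruction.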

OpenCL trivia
• A framework for programming heterogeneous computing resources, originally developed by Apple Inc. and now supported by all major vendors.
• Its kernel language is a subset of the C language with additional features to facilitate parallel processing.
http://www.khronos.org/opencl/

How does data parallelism work in OpenCL?
• A kernel is the code for a work item that is executed on a device (a CPU, a GPU, or another device).
• Imagine an N x N grid with one kernel invocation (one work item) per grid point.
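The grid picture above can be emulated in plain C. This is a sketch under stated assumptions, not OpenCL host code: scale_kernel and launch_over_grid are hypothetical names, and the serial loops only stand in for the runtime, which is free to execute work items in parallel and in any order. In real OpenCL C the kernel would read its grid coordinates from the built-in get_global_id(0) and get_global_id(1) instead of receiving them as parameters.

```c
#include <stddef.h>

/* Hypothetical kernel body: the work done by ONE work item at grid
 * point (row, col). Each invocation touches only its own element. */
static void scale_kernel(size_t row, size_t col, size_t n,
                         const float *in, float *out, float factor)
{
    out[row * n + col] = factor * in[row * n + col];
}

/* Emulated launch over an n x n grid: one kernel invocation per
 * grid point. The nested loops are a serial stand-in for the
 * parallel scheduling an OpenCL device would perform. */
void launch_over_grid(size_t n, const float *in, float *out, float factor)
{
    for (size_t row = 0; row < n; row++) {
        for (size_t col = 0; col < n; col++) {
            scale_kernel(row, col, n, in, out, factor);
        }
    }
}
```

Because no invocation depends on another, the n x n invocations can be distributed freely across the device's compute units, which is what makes this pattern data parallel.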

Our Approach
We used code from the production version of the NASA GEOS-v5 climate model.
• Step #1 – Identify the computation-intensive sections of the weather model.
• Step #2 – Port these sections to OpenCL on the IBM Cell B.E., and then to Mac OS X to test on an Intel CPU.
• Step #3 – Analyze the performance and explain the results.

Findings - I
[Figure: Speedup on IBM Cell B.E. with OpenCL]
[Figure: Speedup on Mac OS X with OpenCL]
[Figure: Serial vs. parallel speedup of a code section analyzed on Mac OS X]

Findings - II
1. A speedup of ~40x was achieved on both IBM and Intel CPUs.
2. The code is NOT portable among architectures: sections of the code do not function on the Intel-based Mac OS X architecture because of its incomplete OpenCL implementation.
3. GCC vectorization fails in certain cases where OpenCL succeeds. We compiled the serial code with gcc -O2 -ftree-vectorize.
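The slides do not show which loops GCC failed to vectorize. As a purely illustrative sketch (both functions below are hypothetical and not taken from the GEOS-v5 code), the following contrasts a loop whose iterations are independent with one that carries a dependence from each iteration to the next, a common reason for an auto-vectorizer to give up on a loop.

```c
#include <stddef.h>

/* Independent iterations: each dst[i] depends only on dst[i] and
 * src[i], so this loop is a natural candidate for vectorization. */
void scale_add(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = dst[i] + 2.0f * src[i];
    }
}

/* Loop-carried dependence: iteration i reads the value written by
 * iteration i-1, so consecutive iterations cannot simply be executed
 * side by side in one vector operation. */
void running_sum(float *x, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        x[i] = x[i] + x[i - 1];
    }
}
```

Compiling a file like this with gcc -O2 -ftree-vectorize and, on recent GCC versions, adding -fopt-info-vec-missed prints a note for each loop the vectorizer skipped, which is one way to reproduce this kind of analysis.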

Road Ahead
• Make the appropriate changes to the solar radiation code for the Intel CPU based Mac OS X architecture. Recall that some parts of the code base are non-functional on Intel CPUs.
• Modify the OpenCL code to run on GPUs and determine whether performance, in addition to code, is portable.

Summary
• OpenCL’s goal of portability in high-performance computing still has a long road ahead.
• GCC vectorization fails in certain cases where OpenCL succeeds.

Acknowledgements
• Dr. Shujia Zhou, MC2 Lab
• Fahad Zafar, MC2 Lab
• Center for Hybrid Multicore Productivity Research, UMBC
• CMSC 601 folks

http://www.asianjobportal.com/wp-content/uploads/2010/11/25_questions_interview.jpg

Vectorization Analysis - I
[Figure: a part of the serial code, with the gcc vectorization error output]

Vectorization Analysis - II
[Figure: a part of the OpenCL code, with the vectorized instruction set for the loop construct on the previous slide]