Programming with CUDA and Parallel Algorithms
Waqar Saleem, Jens Müller

Organization
• People
  • Waqar Saleem, waqar.saleem@uni-jena.de
  • Jens Mueller, jkm@informatik.uni-jena.de
  • Room 3335, Ernst-Abbe-Platz 2
• The course will be conducted in English
• 6 points
  • Elective / compulsory elective (Wahl/Wahlpflicht)
  • Theoretical/Practical

Organization
• Meetings, before winter break
  • Tue 12-14, CZ 129
  • Thu 16-18, CZ 129
    • Every second week
    • Starting next week
  • Exercises: Wed 8-10, CZ 125
    • Starting tomorrow in the pool

The course
• 2 parts
• Before winter break: Lectures and assignments
  • Need at least 50% in assignments to qualify for...
• After the break: Group projects
  • Project chosen by or assigned to each group
  • Regular meetings
  • Presentation of each project at the end of the semester

Assignments
• Build up a minimal ray tracer on GPU
  • Implement basic ray tracer on CPU
  • Port to GPU
  • Make ray tracer more interesting/efficient
• Utilize CUDA concepts
• Basic framework will be provided
  • Scene format and scenes
  • Introduction to ray tracing concepts

Requirements
• Strong background in C programming
• Familiarity with your OS
• Modifying default settings
• Writing/understanding Makefiles
• Compiler flags and options

Course content
• Parallel programming models and platforms
• GPGPU
• GPGPU on NVIDIA cards: CUDA
• Architecture and programming model
• OpenCL

Today
• Organization
• Brief introduction to parallel programming and CUDA
• Short introduction to ray tracing

Growth of Compute Capability
• Moore’s law: the number of transistors that can be placed ... on an integrated circuit [doubles] approximately every two years (source: wikipedia)

Growth of Compute Capability
• Moore’s law [figure, source: wikipedia]

Need for increasing compute capability
• Problems are getting more complex
  • e.g. text editing to image editing to video editing
• Current hardware complexity is never enough
• Impractical to stop development at current state of the art

Barriers to growth
• Natural limit on transistor size: the size of an atom
• More transistors per unit area lead to higher power consumption and heat dissipation

Solution: Parallel architectures

Parallel architectures
• Multiple Instructions Multiple Data (MIMD)
  • multi-threaded, multi-core architectures, clusters, grids
• Single Instruction Multiple Data (SIMD)
  • Cell processor, GPUs, clusters, grids
  • GPU: Graphics Processing Unit
• Parallel programming lets us write programs for parallel architectures

GPU
• Simpler architecture than MIMD
• Little overhead for instruction scheduling, branch prediction etc.
Subsequent figures are from the NVIDIA CUDA Programming Guide 2.3.1 unless mentioned otherwise.

GPU architecture
• Simpler architecture leads to higher performance (compared to CPUs)

General Purpose computing on GPU, GPGPU
• Attractive because of raw GPU power
• Traditionally hard because GPU programming was closely tied to graphics
• Simplicity of GPU architecture limits the kind of problems suitable for GPGPU
  • or at least requires some problems to be reformulated

GPGPU for the masses*
• Freeing the GPU from graphics: Nvidia CUDA, ATI Stream
• C-like programming interface to the GPU
* Knowledge of the underlying architecture is required to achieve peak performance

Freeing Parallel Programming
• OpenCL: code once, run anywhere
  • single core, multi core, GPU, ...
  • platform details transparent to the user
• supported by major vendors: Apple, Intel, AMD, Nvidia, ...
• OpenCL drivers made available by ATI and Nvidia for their cards

This course
• chiefly CUDA: Nvidia specific, mature, well documented, easily available literature
• some OpenCL: open standard, very new, limited documentation available, very similar concepts to CUDA
• no ATI Stream

CUDA, Compute Unified Device Architecture
• Software: C-like programming interface to the GPU
• Hardware: the hardware that supports the above programming model

CUDA hardware model

CUDA programming model
• CPU=host, GPU=device, work unit=thread
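As a rough illustration of "work unit = thread", the plain C sketch below emulates on the CPU what a kernel launch does on the device: one small function body runs once per thread index, and each thread handles one array element. The names `add_one_element` and `launch_emulated` are hypothetical, and the serial loop only stands in for the parallel launch a GPU would perform. This is not CUDA code.

```c
/* Work done by ONE thread: it handles the single element selected by its index.
   In CUDA this body would live in a kernel that runs on the device. */
static void add_one_element(int thread_id, const float *a, const float *b,
                            float *out, int n)
{
    /* Guard: launching more threads than elements must be harmless. */
    if (thread_id < n)
        out[thread_id] = a[thread_id] + b[thread_id];
}

/* Host-side stand-in for a kernel launch (hypothetical helper).
   It runs every thread index in turn; on a GPU these iterations would
   execute in parallel and in no guaranteed order. */
static void launch_emulated(int num_threads, const float *a, const float *b,
                            float *out, int n)
{
    for (int tid = 0; tid < num_threads; ++tid)
        add_one_element(tid, a, b, out, n);
}
```

Calling `launch_emulated(8, a, b, out, 3)` on three-element arrays fills `out` with the element-wise sums. The five surplus thread indices do nothing because of the bounds guard.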


Ray tracing
• A method to render a given scene
• Cast rays from a camera into the scene
• Compute ray intersections with scene geometry
• Render pixel image
source: wikipedia

Ray tracer complexity
• A ray tracer can be arbitrarily complex
• Recursively compute intersections for reflected, refracted and shadow rays
• Account for diffuse lighting
• Consider multiple light sources
• Consider light sources other than point lights
• Account for textures: object materials

Coding a ray tracer
• Relatively easy to code on the CPU
• Call the same intersection function recursively on secondary rays
• CPU code is not so complex
• Tricky to code on the GPU as recursion is not yet supported in GPGPU models
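One way to reformulate the recursion for a target without recursion is sketched below in plain C. It is a simplified model under stated assumptions: each hit spawns only one secondary (reflected) ray, and the surfaces hit along the path are given up front as an array. The `Surface` type and both function names are hypothetical. The loop replaces the recursion by carrying a running weight, the product of the reflectivities seen so far. A tracer that spawns both reflected and refracted rays would need an explicit stack instead.

```c
/* One surface hit along a ray path (hypothetical type):
   its own shaded brightness and the fraction of light it reflects. */
typedef struct { double local; double reflectivity; } Surface;

/* Recursive form, natural on the CPU:
   color = local + reflectivity * color of the reflected ray. */
static double trace_recursive(const Surface *path, int n, int depth, int max_depth)
{
    if (depth >= n || depth >= max_depth)
        return 0.0;                       /* ray escaped, or depth limit reached */
    return path[depth].local
         + path[depth].reflectivity
           * trace_recursive(path, n, depth + 1, max_depth);
}

/* Equivalent iterative form: no call stack is needed, only two scalars. */
static double trace_iterative(const Surface *path, int n, int max_depth)
{
    double color = 0.0;
    double weight = 1.0;                  /* product of reflectivities so far */
    for (int depth = 0; depth < n && depth < max_depth; ++depth) {
        color += weight * path[depth].local;
        weight *= path[depth].reflectivity;
    }
    return color;
}
```

Both functions should return the same value for any path and depth limit; they differ only in floating-point rounding order.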

This course
• Build a trivial ray tracer on the CPU
  • compute view rays only
  • part of tomorrow’s exercise
• Port to GPU
• Add complexity to your GPU ray tracer

Reminders
• Exercise session tomorrow
• Register on CAJ

See you next time!