CUDA Overview: A Fast Introduction
- Slides: 65
João Gabriel Felipe Machado Gazolla
Advisor: Dr. Esteban Clua
Topics
- GPUs
- What is CUDA?
- Where to download?
- How to install
- Architecture
- Performance
- Visual Studio integration
- Examples
- How to learn more about CUDA?
- Study plan
- References
- Discussion
Goal
- “...Explain the basics of CUDA...”
What is CUDA?
- Compute Unified Device Architecture
- CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through industry-standard programming languages
CUDA Performance
Specific CPU Scenario
- Code example: a population of 1024 soldiers, each scored by the fitness function soldier.Score(x)
- One processor evaluates one soldier at a time: soldier.Score(soldier[i]) gives 12387 unit points
- Time for Soldier[0...1023]: (1024/1) * time(soldier.Score())
Specific GPU Scenario
- Code example: the same population of 1024 soldiers and the same fitness function soldier.Score(x)
- A GeForce XXXX++ with 256 processors evaluates Soldier[i]...Soldier[i+n] at once: soldier.Score(soldier[i]) gives 12387, ..., 12494, ..., 15912 unit points
- Time for Soldier[0...1023]: (1024/256) * time(soldier.Score())
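The two scenarios can be compared with a little arithmetic. This is an idealized sketch: it assumes each of the 256 GPU processors scores one soldier in the same time t = time(soldier.Score()) as the CPU does, and it ignores memory transfers and kernel launch overhead, which the slides do not quantify.

```latex
\[
T_{\text{CPU}} = \frac{1024}{1}\,t = 1024\,t, \qquad
T_{\text{GPU}} = \frac{1024}{256}\,t = 4\,t, \qquad
\frac{T_{\text{CPU}}}{T_{\text{GPU}}} = \frac{1024\,t}{4\,t} = 256 .
\]
```

Under these assumptions 256 is an upper bound on the speedup; a real measurement would most likely come out lower.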
What do I need to run CUDA?
Where to Download CUDA?
What to Download?
Is It Worth It?
- 5% faster? 20% faster? 300% faster? 900% faster?
Unified Architecture - CUDA
- Low cost, supercomputing for the masses
Is It Worth It? Speedups
- At a 100x speedup: 1 year becomes 3 days, 1 day becomes 15 minutes, 2 minutes become 1.2 seconds
Unified Architecture - CUDA
Unified Architecture - CUDA
- Low cost, supercomputing for the masses
Example: Crowd Simulation
- 1,000 bodies
Architecture
- CPUs vs GPUs
GPU – The Evolution
- Fixed-function GPUs
- Programmable GPUs
- Unified architecture
GPU – The Evolution: Fixed-Function GPUs
- Not a programmable architecture
- No access to the processor
- Only APIs
GPU – The Evolution: Programmable GPUs
- Architecture oriented to computer graphics
Unified Architecture - CUDA
Getting VS 2008 for Free
VS 2008 Integration
VS 2008 Integration
Command line:
    $(CUDA_BIN_PATH)\nvcc.exe -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Od,/Zi,/RTC1,/MDd -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\kernel.obj kernel.cu
Outputs:
    $(ConfigurationName)\kernel.obj
CUDA VS Wizard
CUDA and Linux
CUDA and Eclipse
Software Architecture
CUDA and Threads
- Why program with threads?
CUDA and Threads
- How many threads have you ever created?
- CUDA allows thousands and thousands of threads = a cluster of threads
Threads – Management Costs
- CPU: few threads, so spending 1000 instructions to switch threads is OK
- GPU: if we need thousands of threads, 1000 instructions per switch is NOT OK
CUDA – Synchronization
- Must be explicit
- “…synchronization is accomplished using the function syncthreads, which acts as a barrier or memory fence…” (in CUDA C the call is written __syncthreads())
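A minimal sketch of explicit synchronization, assuming a single block of 256 threads. The kernel name reverseBlock, the helper reverseOnDevice and the size N are made up for this illustration. Each thread writes one element to shared memory, waits at the barrier, and only then reads an element written by another thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256  // hypothetical block size chosen for this sketch

// Reverses N integers in place using one block of N threads.
__global__ void reverseBlock(int* data) {
    __shared__ int tile[N];
    int t = threadIdx.x;
    tile[t] = data[t];
    __syncthreads();            // barrier: every write to tile is visible past this point
    data[t] = tile[N - 1 - t];  // safe: reads a value written by a different thread
}

// Host helper (hypothetical): copies h to the device, runs the kernel, copies back.
void reverseOnDevice(int* h) {
    int* d = nullptr;
    cudaMalloc(&d, N * sizeof(int));
    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);
    reverseBlock<<<1, N>>>(d);
    cudaMemcpy(h, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

int main() {
    int h[N];
    for (int i = 0; i < N; ++i) h[i] = i;
    reverseOnDevice(h);
    printf("first = %d, last = %d\n", h[0], h[N - 1]);
    return 0;
}
```

Without the barrier, a thread could read tile[N - 1 - t] before the thread responsible for that slot had written it.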
CUDA – Important Definitions
- CUDA extends the C language through kernels
- *.cu: CUDA files
- Each kernel is a function that will be executed N times on the device, once per thread
Conventions
Functions in CUDA
- Function qualifiers state where a function is executed and where it is called from
- Combinations are also possible
- No recursion on the device (GPU), at least on the hardware covered here
- No static variables
- cudaMalloc(), cudaFree()
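The function rules above can be sketched as follows. The function names (square, addOne, transform, transformOnDevice) are invented for this illustration. The qualifiers, cudaMalloc(), cudaMemcpy() and cudaFree() are the standard CUDA runtime names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__: executed on the device, callable only from device code.
__device__ float square(float x) { return x * x; }

// __host__ __device__: one possible combination, compiled for both sides.
__host__ __device__ float addOne(float x) { return x + 1.0f; }

// __global__: a kernel, executed on the device and called from the host.
__global__ void transform(float* v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = addOne(square(v[i]));
}

// Host helper (hypothetical): allocate, copy in, launch, copy out, free.
void transformOnDevice(float* h, int n) {
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    transform<<<(n + 127) / 128, 128>>>(d, n);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

int main() {
    float h[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    transformOnDevice(h, 4);            // each element becomes x * x + 1
    for (int i = 0; i < 4; ++i) printf("%g ", h[i]);
    printf("\n");
    return 0;
}
```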
CUDA and the Limits of Memory Bandwidth
- Reuse your data!
Architecture
- Hide implementation details
- HW evolution
Threads, Blocks and Grids
- One kernel: one grid
- Each block: many threads
- All threads inside a block share the same memory area
- Threads in different blocks do not share their local memory with each other
- Threads in different blocks cannot cooperate
Threads, Blocks and Grids
- Each block: up to 512 threads (the limit on the GPUs of this presentation; later architectures raise it to 1024)
Threads, Blocks and Grids
Variable    Type    Description
gridDim     dim3    Dimensions of the grid
blockIdx    uint3   Index of the block in the grid
blockDim    dim3    Dimensions of the block
threadIdx   uint3   Index of the thread in the block
    __global__ void KernelFunction(...);
    dim3 DimGrid(100, 10);        // Grid of 1000 blocks
    dim3 DimBlock(4, 8, 8);       // Each block has 256 threads
    size_t SharedMemBytes = 32;
    KernelFunction<<<DimGrid, DimBlock, SharedMemBytes>>>(...);
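As a sketch of how the built-in variables combine, the kernel below gives every thread of a (100, 10) grid of (4, 8, 8) blocks a unique linear index. The names writeIds and lastThreadId are hypothetical, and the optional shared-memory launch argument is left out because this kernel uses no dynamic shared memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread derives a unique linear index from the built-in variables.
__global__ void writeIds(int* out) {
    int block = blockIdx.y * gridDim.x + blockIdx.x;   // block index in the 2D grid
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int thread = (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x + threadIdx.x;
    int id = block * threadsPerBlock + thread;
    out[id] = id;
}

// Host helper (hypothetical): launches the kernel and returns the id
// written by the very last thread.
int lastThreadId() {
    dim3 grid(100, 10);     // 1000 blocks
    dim3 block(4, 8, 8);    // 256 threads per block
    const int total = 100 * 10 * 4 * 8 * 8;

    int* d = nullptr;
    cudaMalloc(&d, total * sizeof(int));
    writeIds<<<grid, block>>>(d);

    int last = -1;
    cudaMemcpy(&last, d + total - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return last;
}

int main() {
    printf("last thread id: %d\n", lastThreadId());
    return 0;
}
```

With 1000 blocks of 256 threads there are 256000 threads in total, so the last index should be 255999.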
Some code...
    // Kernel definition
    __global__ void vecAdd(float* A, float* B, float* C) { ... }
    int main() {
        // Kernel invocation
        vecAdd<<<1, N>>>(A, B, C);
    }
- __global__ defines that it's a kernel: called on the host, executed on the device
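One way the elided parts of vecAdd might be filled in is sketched below. It assumes a single block of N = 256 threads, as in the `<<<1, N>>>` launch above. The helper addOnDevice and the host/device buffer names are made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256   // assumed size; must not exceed the per-block thread limit

// Kernel definition: thread i adds element i.
__global__ void vecAdd(const float* A, const float* B, float* C) {
    int i = threadIdx.x;   // one block, so the thread index is the element index
    C[i] = A[i] + B[i];
}

// Host helper (hypothetical): moves the inputs to the device and the result back.
void addOnDevice(const float* hA, const float* hB, float* hC) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytes = N * sizeof(float);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<1, N>>>(dA, dB, dC);   // kernel invocation

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}

int main() {
    float hA[N], hB[N], hC[N];
    for (int i = 0; i < N; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }
    addOnDevice(hA, hB, hC);
    printf("C[10] = %g\n", hC[10]);
    return 0;
}
```

For larger vectors the launch would need several blocks and an index of the form blockIdx.x * blockDim.x + threadIdx.x, with a bounds check.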
GPUs and the Travelling Salesman Problem: Kernel TSP Score
    calcScore<<<blocks, threadsPerBlock>>>(scoreSolD, vSolD, matD, dim);
Solution vector (one tour after another):
    1|2|3|4|5|6|3|1|2|4|5|6|2|6|5|1|3|4|5|6|2|4|3|1|2|6|3|1|5|4
    __global__ void calcScore(float* score, int* sol, float* mat, int dim) {
        // Thread ID in x
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        // First position of the tour this thread is going to work on
        int pos = idx * dim;
        int temp;
        score[idx] = 0;
        // Part of the vector this thread is going to work on
        for (int i = pos; i < (pos + dim) - 1; ++i) {
            temp = sol[i] * dim + sol[i + 1];
            score[idx] += mat[temp];
        }
        // From the last city back to the first
        temp = sol[pos + dim - 1] * dim + sol[pos];
        score[idx] += mat[temp];
    }
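A possible host-side driver for the scoring kernel is sketched below. Several things are assumptions rather than part of the slides: cities are numbered from 0, mat is a row-major dim x dim distance matrix, the toy distance is |i - j|, and exactly one thread is launched per tour because the kernel has no bounds check. The kernel is repeated with a local accumulator, a small variation on the version above, so that the block is self-contained. scoreOnDevice is a hypothetical helper.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One thread scores one tour of dim cities stored at sol[idx * dim ...].
__global__ void calcScore(float* score, int* sol, float* mat, int dim) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int pos = idx * dim;
    float total = 0.0f;
    for (int i = pos; i < pos + dim - 1; ++i)
        total += mat[sol[i] * dim + sol[i + 1]];
    total += mat[sol[pos + dim - 1] * dim + sol[pos]];   // close the tour
    score[idx] = total;
}

// Host helper (hypothetical): scores nSol tours, one thread per tour.
void scoreOnDevice(const int* hSol, const float* hMat, float* hScore, int nSol, int dim) {
    int* dSol = nullptr;
    float *dMat = nullptr, *dScore = nullptr;
    cudaMalloc(&dSol, nSol * dim * sizeof(int));
    cudaMalloc(&dMat, dim * dim * sizeof(float));
    cudaMalloc(&dScore, nSol * sizeof(float));
    cudaMemcpy(dSol, hSol, nSol * dim * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dMat, hMat, dim * dim * sizeof(float), cudaMemcpyHostToDevice);

    calcScore<<<1, nSol>>>(dScore, dSol, dMat, dim);

    cudaMemcpy(hScore, dScore, nSol * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dSol);
    cudaFree(dMat);
    cudaFree(dScore);
}

int main() {
    const int dim = 4, nSol = 2;
    int hSol[nSol * dim] = {0, 1, 2, 3,   0, 2, 1, 3};   // two tours, 0-based cities
    float hMat[dim * dim];
    for (int i = 0; i < dim; ++i)
        for (int j = 0; j < dim; ++j)
            hMat[i * dim + j] = (float)abs(i - j);        // toy distance |i - j|

    float hScore[nSol];
    scoreOnDevice(hSol, hMat, hScore, nSol, dim);
    printf("tour 0: %g, tour 1: %g\n", hScore[0], hScore[1]);
    return 0;
}
```

With the toy distances, tour 0-1-2-3 has length 1 + 1 + 1 + 3 = 6 and tour 0-2-1-3 has length 2 + 1 + 2 + 3 = 8.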
Suggested Exercise...
- Sum two matrices...
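One possible approach to the exercise is sketched here; it is not the presenter's solution. It assumes row-major matrices stored as flat arrays and a 2D grid of 16 x 16 blocks. The names matAdd and addMatrices are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per matrix element, with a bounds check for partial blocks.
__global__ void matAdd(const float* A, const float* B, float* C, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        int i = row * cols + col;   // row-major linear index
        C[i] = A[i] + B[i];
    }
}

// Host helper (hypothetical): returns A + B for two rows x cols matrices.
std::vector<float> addMatrices(const std::vector<float>& A,
                               const std::vector<float>& B,
                               int rows, int cols) {
    size_t bytes = (size_t)rows * cols * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    matAdd<<<grid, block>>>(dA, dB, dC, rows, cols);

    std::vector<float> C((size_t)rows * cols);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}

int main() {
    const int rows = 30, cols = 50;
    std::vector<float> A(rows * cols, 1.0f), B(rows * cols, 2.0f);
    std::vector<float> C = addMatrices(A, B, rows, cols);
    printf("C[0] = %g, C[last] = %g\n", C[0], C[rows * cols - 1]);
    return 0;
}
```

The bounds check matters because 30 and 50 are not multiples of 16, so the last blocks in each direction contain threads that fall outside the matrix.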
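- CUDA does not generate random numbers (true of the toolkit presented here; NVIDIA's later cuRAND library adds this)
- CUDA has no sorting methods (likewise; the later Thrust library provides sorting)
- In CUDA everything is vectors (arrays)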
Trends...
- GPUs will probably disappear...
Scalability
Nvidia - CUDA Education
Nvidia - CUDA Reference
Learning CUDA - Dr. Dobb’s
Study Plan
- Study the CUDA Reference
- Get hold of a CUDA-supported device, or emulate one
- Watch the Nvidia CUDA casts
- Make it work on Linux and/or Windows
- Watch David Kirk's (Illinois Univ.) CUDA casts
- Read the Dr. Dobb’s articles
References
- http://www.nvidia.com/object/cuda_develop.html
- Quickstart guide
- Programming guide
- Reference manual
- Toolkit release notes
- SDK release notes (Windows)
Thanks
- Esteban Clua - http://www.ic.uff.br/~esteban/
- Bruno Cardoso Lopes - http://www.brunocardoso.cc/
- Rodolfo Jardim de Azevedo - http://www.ic.unicamp.br/~rodolfo
- Marcelo Zamith - http://www.ic.uff.br/~mzamith/
Download of the presentation: www.tinyurl.com/mjpktf
Cache Tuning – Global Cyber Bridges
Questions? Comments? Extras?