CUDA: CPU vs. GPU (ALU, Control, Cache, DRAM)

Internal structure comparison: CPU vs. GPU
[Figure: CPU and GPU block diagrams, labeled ALU, Control, Cache, DRAM]

Typical structure of a CUDA-capable GPU
- Organized as an array of SMs (streaming multiprocessors)
- Each SM consists of multiple streaming processors (SPs)
- Global memory: GDDR DRAM
[Example] G80: 128 SPs (16 SMs, 8 SPs per SM)
[Figure: G80 block diagram, labeled Host, Input Assembler, Thread Execution Manager, Parallel Data Cache, Texture, Load/store, Global Memory]
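The SM count described above can be read back from a real device through the CUDA runtime. The following is a minimal sketch, not part of the original slides: `smCount` and `totalSPs` are hypothetical helper names, and the SPs-per-SM figure has to be supplied by the caller because the runtime reports only the number of SMs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of SMs on the given device, or -1 on error.
int smCount(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.multiProcessorCount;
}

// Hypothetical helper: total SPs, given an SPs-per-SM figure taken from the
// hardware documentation (8 for the G80 example on the slide).
int totalSPs(int numSMs, int spPerSM) {
    return numSMs * spPerSM;
}

// Prints the device name, SM count, and global memory size.
void printDeviceSummary(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        printf("could not query device %d\n", device);
        return;
    }
    printf("%s: %d SMs, %zu bytes of global memory\n",
           prop.name, prop.multiProcessorCount, prop.totalGlobalMem);
}
```

With the slide's G80 figures, `totalSPs(16, 8)` gives the 128 SPs quoted in the example.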

CUDA program execution model
Serial Code (host)
Parallel Kernel (device): KernelA<<< nBlk, nTid >>>(args); ...
Serial Code (host)
Parallel Kernel (device): KernelB<<< nBlk, nTid >>>(args); ...
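The alternation between serial host code and parallel kernels might look roughly like the sketch below. The kernel bodies (add one, then double) and the helper name `runPipeline` are assumptions made for illustration; only the names `KernelA`, `KernelB`, `nBlk`, and `nTid` come from the slide.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel body (assumption): each thread increments one element.
__global__ void KernelA(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Illustrative kernel body (assumption): each thread doubles one element.
__global__ void KernelB(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical host routine: serial code, KernelA, serial code, KernelB.
// Returns true if the result was copied back successfully.
bool runPipeline(float* h, int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float* d = nullptr;

    // Serial code (host): allocate device memory and upload the input.
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    int nTid = 256;                      // threads per block
    int nBlk = (n + nTid - 1) / nTid;    // enough blocks to cover n elements

    // Parallel kernel (device).
    KernelA<<<nBlk, nTid>>>(d, n);

    // Serial code (host) could run here; launches return immediately.

    // Parallel kernel (device); runs after KernelA on the same default stream.
    KernelB<<<nBlk, nTid>>>(d, n);

    // Serial code (host): this copy waits for both kernels to finish.
    cudaError_t err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess;
}
```

Calling `runPipeline` on an array replaces each element `x` with `(x + 1) * 2`, since the two kernels run in launch order.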

Example of CUDA program structure (matrix multiplication)
void MatrixMultiplication(float* M, float* N, float* P, int Width)
{
    int size = Width * Width * sizeof(float);
    float *Md, *Nd, *Pd;
    …
    // Allocate device memory for M, N and P
    // and copy M and N to allocated device memory location
    // Kernel invocation code to let the device perform the actual multiplication
    …
    // Read P from the device
    // Free device matrices
}
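One way the commented steps in this outline could be filled in is sketched below. It is not the slides' own implementation: the kernel name `MatrixMulKernel`, the one-thread-per-output-element scheme, the 16x16 block size, and the row-major square-matrix layout are all assumptions, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Assumed kernel: each thread computes one element of P = M * N.
// Matrices are Width x Width, stored row-major.
__global__ void MatrixMulKernel(const float* Md, const float* Nd,
                                float* Pd, int Width) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < Width && col < Width) {
        float sum = 0.0f;
        for (int k = 0; k < Width; ++k)
            sum += Md[row * Width + k] * Nd[k * Width + col];
        Pd[row * Width + col] = sum;
    }
}

void MatrixMultiplication(float* M, float* N, float* P, int Width) {
    size_t size = (size_t)Width * Width * sizeof(float);
    float *Md, *Nd, *Pd;

    // Allocate device memory for M, N and P,
    // and copy M and N to the allocated device memory.
    cudaMalloc(&Md, size);
    cudaMalloc(&Nd, size);
    cudaMalloc(&Pd, size);
    cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);
    cudaMemcpy(Nd, N, size, cudaMemcpyHostToDevice);

    // Kernel invocation: a 2D grid of 16x16 blocks covering the output.
    dim3 dimBlock(16, 16);
    dim3 dimGrid((Width + 15) / 16, (Width + 15) / 16);
    MatrixMulKernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width);

    // Read P from the device (this copy waits for the kernel to finish).
    cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost);

    // Free device matrices.
    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
}
```

The host arrays `M`, `N`, and `P` must each hold `Width * Width` floats. A production version would check the return value of every runtime call.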