TEXTURE MEMORY - IN CUDA PERSPECTIVE
VINAY MANCHIRAJU

TEXTURE MEMORY
• Read-only memory used by programs in CUDA.
• Used in general-purpose computing for accuracy and efficiency.
• Originally designed for DirectX and OpenGL rendering pipelines.

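WHY USE TEXTURES?
• Texture caches can serve reads from non-consecutive memory locations, unlike typical CPU caching schemes.
• Designed to accelerate access patterns with spatial locality, where neighboring threads read neighboring locations.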

• Texture memory is cached on chip.
• Provides higher effective bandwidth.
• Reduces memory requests to the off-chip DRAM.
• Improves performance of graphics applications whose memory access patterns exhibit a great deal of spatial locality.

PARALLELIZING PHYSICAL SIMULATIONS
• Results are more accurate, with reduced computational complexity and less time to solve.
• Textures play a significant role in simulation problems.

HEAT SIMULATION EXAMPLE
• A rectangular room is divided into a grid of cells.
• Heaters with fixed temperatures are scattered among the cells of the grid.

FLOW OF HEAT
Warmer cells tend to cool as their heat dissipates to cooler regions, and cooler cells warm up as they absorb heat from warmer neighbors.

AS A FUNCTION OF HEAT LOSS/GAIN
• Imagine that there are 4 neighbors for a given cell.
• k -> rate of heat flow from one cell to another.
• A large value of k will drive the system to a constant temperature quickly, while a small value will allow the solution to retain large temperature gradients longer.
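
The update rule that these bullets describe can be written as follows. The equation image from the slide did not survive, so the symbol names (T_old, T_new, and the neighbor set N) are notation chosen here rather than taken from the slides.

```latex
T_{\text{new}} = T_{\text{old}} + k \sum_{n \in N} \left( T_{n} - T_{\text{old}} \right),
\qquad N = \{\text{top},\ \text{bottom},\ \text{left},\ \text{right}\}

% With four neighbors the sum expands to:
T_{\text{new}} = T_{\text{old}} + k \left( T_{\text{top}} + T_{\text{bottom}} + T_{\text{left}} + T_{\text{right}} - 4\,T_{\text{old}} \right)
```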

THREE STEPS TO COMPUTE TEMPERATURE UPDATES
Ø copy_const_kernel():
   Ø Copy the heater temperatures into the corresponding cells of the input grid.
   Ø This enforces the restriction that the temperatures of cells with heaters stay constant.
Ø blend_kernel():
   Ø Output temperatures are calculated from the input temperatures of the grid using the update equation.
Ø Swap the input and output buffers so that the output becomes the input of the next step.

copy_const_kernel()
Ø Convert threadIdx and blockIdx into an x and y coordinate.
Ø Compute a linear offset into the constant and input buffers.
Ø If the cell in the constant grid is nonzero, copy the heater temperature in cptr[] to the input grid in iptr[].
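
A minimal sketch of the three bullets above, using plain global-memory pointers. The kernel assumes the launch covers the grid exactly. The host helper apply_heaters(), the 16 x 16 block shape, and the requirement that dim is a multiple of 16 are assumptions added for illustration, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// iptr: input temperature grid, cptr: constant heater grid (0 means "no heater").
__global__ void copy_const_kernel(float *iptr, const float *cptr) {
    // Convert threadIdx and blockIdx into an x and y coordinate.
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // Linear offset shared by the constant and input buffers.
    int offset = x + y * blockDim.x * gridDim.x;

    // Only heater cells overwrite the input grid.
    if (cptr[offset] != 0.0f) iptr[offset] = cptr[offset];
}

// Hypothetical host helper: applies the heaters to a dim x dim grid
// (dim is assumed to be a multiple of 16) and returns the result.
std::vector<float> apply_heaters(std::vector<float> in,
                                 const std::vector<float> &heaters, int dim) {
    size_t bytes = in.size() * sizeof(float);
    float *d_in = nullptr, *d_const = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_const, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_const, heaters.data(), bytes, cudaMemcpyHostToDevice);

    dim3 blocks(dim / 16, dim / 16), threads(16, 16);
    copy_const_kernel<<<blocks, threads>>>(d_in, d_const);

    cudaMemcpy(in.data(), d_in, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_const);
    return in;
}
```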

blend_kernel()
• One thread for every cell.
• Offsets of the neighbors in all 4 directions are computed to read the temperatures of those cells.
• Each thread reads its cell's temperature and the temperatures of its neighboring cells, performs the update computation, and then writes the new value for its cell.
• The updated temperature is the old temperature plus the scaled differences between the neighboring cell temperatures and the old temperature.
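
The kernel described above might look like the following sketch. The SPEED constant stands in for k and its value is an assumption, as is the helper blend() that isolates the arithmetic so it can be checked on the host. Edge cells reuse their own temperature for a missing neighbor, and the launch is assumed to cover the grid exactly.

```cuda
#include <cuda_runtime.h>

#define SPEED 0.25f  // plays the role of k; the value is an assumption

// Update rule for one cell: old temperature plus k times the sum of
// (neighbor - old) over the four neighbors.
__host__ __device__ inline float blend(float c, float t, float b,
                                       float l, float r, float k) {
    return c + k * (t + b + l + r - 4.0f * c);
}

// One thread per cell; the grid size is derived from the launch configuration.
__global__ void blend_kernel(float *outSrc, const float *inSrc) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int w = blockDim.x * gridDim.x;
    int h = blockDim.y * gridDim.y;
    int offset = x + y * w;

    // Neighbor offsets; at a border the cell stands in for the missing neighbor.
    int left   = (x > 0)     ? offset - 1 : offset;
    int right  = (x < w - 1) ? offset + 1 : offset;
    int top    = (y > 0)     ? offset - w : offset;
    int bottom = (y < h - 1) ? offset + w : offset;

    outSrc[offset] = blend(inSrc[offset], inSrc[top], inSrc[bottom],
                           inSrc[left], inSrc[right], SPEED);
}
```

A uniform grid is a fixed point of this rule: when all five temperatures are equal, the bracketed term is zero and the cell keeps its value.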

anim_kernel()
Ø The DataBlock structure contains the constant buffer of heaters and the buffers of updated temperatures.
Ø Arguments: a pointer to a DataBlock and the number of animation ticks that have elapsed (not used).
Ø We use a 16 x 16 grid and blocks of 256 threads.

anim_kernel()
Ø After each iteration we swap the input and output buffers, so the output of one step becomes the input of the next and the final temperatures end up in the input buffer.
Ø The temperatures are converted into colors and the bitmap image is transferred from GPU to CPU.
Ø The program.
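
The copy, blend, and swap cycle can be sketched as below. The DataBlock field names, the function name anim_steps(), STEPS_PER_FRAME, SPEED, and the 16 x 16 block shape are assumptions for illustration. The two kernels are compact versions included only to keep the sketch self-contained, and the color conversion and copy back to the CPU are left out.

```cuda
#include <cuda_runtime.h>
#include <utility>  // std::swap

#define SPEED 0.25f          // plays the role of k; assumed value
#define STEPS_PER_FRAME 90   // assumed number of time steps per frame

struct DataBlock {           // field names are assumptions
    float *dev_inSrc;        // current temperatures
    float *dev_outSrc;       // temperatures after the next step
    float *dev_constSrc;     // heater grid, 0 = no heater
};

__global__ void copy_const_kernel(float *iptr, const float *cptr) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;
    if (cptr[offset] != 0.0f) iptr[offset] = cptr[offset];
}

__global__ void blend_kernel(float *outSrc, const float *inSrc) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int w = blockDim.x * gridDim.x, h = blockDim.y * gridDim.y;
    int offset = x + y * w;
    int left   = (x > 0)     ? offset - 1 : offset;
    int right  = (x < w - 1) ? offset + 1 : offset;
    int top    = (y > 0)     ? offset - w : offset;
    int bottom = (y < h - 1) ? offset + w : offset;
    outSrc[offset] = inSrc[offset] + SPEED * (inSrc[top] + inSrc[bottom] +
                     inSrc[left] + inSrc[right] - 4.0f * inSrc[offset]);
}

// Runs STEPS_PER_FRAME time steps on a dim x dim grid (dim a multiple of 16).
// Because of the swap at the end of each step, d->dev_inSrc holds the most
// recent temperatures when the function returns.
void anim_steps(DataBlock *d, int dim) {
    dim3 blocks(dim / 16, dim / 16), threads(16, 16);
    for (int i = 0; i < STEPS_PER_FRAME; i++) {
        copy_const_kernel<<<blocks, threads>>>(d->dev_inSrc, d->dev_constSrc);
        blend_kernel<<<blocks, threads>>>(d->dev_outSrc, d->dev_inSrc);
        std::swap(d->dev_inSrc, d->dev_outSrc);  // output feeds the next step
    }
    cudaDeviceSynchronize();
}
```

Swapping the two pointers avoids copying the whole grid between steps; only the roles of the buffers change.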

USING TEXTURES
• Declare inputs as texture references.
• Use references to floating-point textures.
• Allocate GPU memory for these textures and then bind the references to the buffers using cudaBindTexture().

cudaBindTexture()
• Uses the specified buffer as a texture and the texture reference as the texture name.
• See the cudaBindTexture() documentation listed under REFERENCES.
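
A minimal sketch of declaring a floating-point texture reference, binding a buffer to it, reading through it, and unbinding it again. Texture references (texture<float>, cudaBindTexture(), cudaUnbindTexture()) are deprecated and were removed in CUDA 12, so this assumes a CUDA 11.x or earlier toolkit. The kernel read_through_texture() and the helper roundtrip() are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Texture references must be declared at file scope.
texture<float> texIn;

// Each thread fetches one element through the texture and writes it out.
__global__ void read_through_texture(float *out, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) out[i] = tex1Dfetch(texIn, i);
}

// Hypothetical helper: copies data to the GPU, reads it back via the texture.
std::vector<float> roundtrip(const std::vector<float> &host) {
    int n = static_cast<int>(host.size());
    size_t bytes = n * sizeof(float);
    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc(&dev_in, bytes);
    cudaMalloc(&dev_out, bytes);
    cudaMemcpy(dev_in, host.data(), bytes, cudaMemcpyHostToDevice);

    // Use dev_in as the texture and texIn as its name.
    cudaBindTexture(NULL, texIn, dev_in, bytes);

    read_through_texture<<<(n + 255) / 256, 256>>>(dev_out, n);

    std::vector<float> result(n);
    cudaMemcpy(result.data(), dev_out, bytes, cudaMemcpyDeviceToHost);

    // Unbind before freeing the underlying buffer.
    cudaUnbindTexture(texIn);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return result;
}
```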

tex1Dfetch()
• A compiler intrinsic function.
• Used in the blend method to read the texIn, texOut, and texConstSrc textures.
• It fetches the texture value into a floating-point variable.
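
A sketch of a blend kernel that reads its inputs with tex1Dfetch(). Because texture references live at file scope and cannot be passed as kernel arguments, a flag selects which one to read; the flag name dstOut, the SPEED value, and the helper blend_once() are assumptions. As with any texture-reference code, this assumes a CUDA 11.x or earlier toolkit, since the API was removed in CUDA 12.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define SPEED 0.25f  // plays the role of k; assumed value

texture<float> texIn;
texture<float> texOut;

// dstOut == true: read from texIn; otherwise read from texOut.
__global__ void blend_kernel(float *dst, bool dstOut) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int w = blockDim.x * gridDim.x, h = blockDim.y * gridDim.y;
    int offset = x + y * w;

    int left   = (x > 0)     ? offset - 1 : offset;
    int right  = (x < w - 1) ? offset + 1 : offset;
    int top    = (y > 0)     ? offset - w : offset;
    int bottom = (y < h - 1) ? offset + w : offset;

    float t, l, c, r, b;
    if (dstOut) {
        t = tex1Dfetch(texIn, top);    l = tex1Dfetch(texIn, left);
        c = tex1Dfetch(texIn, offset); r = tex1Dfetch(texIn, right);
        b = tex1Dfetch(texIn, bottom);
    } else {
        t = tex1Dfetch(texOut, top);    l = tex1Dfetch(texOut, left);
        c = tex1Dfetch(texOut, offset); r = tex1Dfetch(texOut, right);
        b = tex1Dfetch(texOut, bottom);
    }
    dst[offset] = c + SPEED * (t + b + r + l - 4.0f * c);
}

// Hypothetical helper: one blend step on a dim x dim grid (dim a multiple of 16).
std::vector<float> blend_once(const std::vector<float> &in, int dim) {
    size_t bytes = in.size() * sizeof(float);
    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc(&dev_in, bytes);
    cudaMalloc(&dev_out, bytes);
    cudaMemcpy(dev_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaBindTexture(NULL, texIn, dev_in, bytes);
    cudaBindTexture(NULL, texOut, dev_out, bytes);

    dim3 blocks(dim / 16, dim / 16), threads(16, 16);
    blend_kernel<<<blocks, threads>>>(dev_out, true);  // read texIn, write dev_out

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), dev_out, bytes, cudaMemcpyDeviceToHost);
    cudaUnbindTexture(texIn);
    cudaUnbindTexture(texOut);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return out;
}
```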

copy_const_kernel()

cudaUnbindTexture()

USING 2D TEXTURES
• Reference declaration:
• Instead of using an offset to calculate left, right, top and bottom, we directly use x and y to access the texture.

USING 2D TEXTURES
• Bounds overflow over the grid is taken care of.
• If one of x or y is less than zero, tex2D() will return the value at zero.
• If one of these values is greater than the width, tex2D() will return the value at width - 1.

tex2D()

cudaBindTexture2D()
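
The two-dimensional variant might be sketched as follows: the reference is declared with dimension 2, bound with cudaBindTexture2D(), and read with tex2D() using x and y directly. This again assumes a CUDA 11.x or earlier toolkit (texture references were removed in CUDA 12) and relies on the default clamp addressing of texture references for out-of-range coordinates. The names blend_kernel_2d() and blend_once_2d(), the SPEED value, and the row pitch of dim * sizeof(float) (assumed to satisfy the device's texture pitch alignment because dim is a multiple of 16) are assumptions.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define SPEED 0.25f  // plays the role of k; assumed value

texture<float, 2> texIn;  // two-dimensional texture reference

__global__ void blend_kernel_2d(float *dst) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    // No manual border handling: out-of-range coordinates are clamped.
    float t = tex2D(texIn, x, y - 1);
    float l = tex2D(texIn, x - 1, y);
    float c = tex2D(texIn, x, y);
    float r = tex2D(texIn, x + 1, y);
    float b = tex2D(texIn, x, y + 1);

    dst[offset] = c + SPEED * (t + b + r + l - 4.0f * c);
}

// Hypothetical helper: one blend step on a dim x dim grid (dim a multiple of 16).
std::vector<float> blend_once_2d(const std::vector<float> &in, int dim) {
    size_t bytes = in.size() * sizeof(float);
    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc(&dev_in, bytes);
    cudaMalloc(&dev_out, bytes);
    cudaMemcpy(dev_in, in.data(), bytes, cudaMemcpyHostToDevice);

    // Describe the element type, then bind with width, height, and row pitch.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaBindTexture2D(NULL, texIn, dev_in, desc, dim, dim, sizeof(float) * dim);

    dim3 blocks(dim / 16, dim / 16), threads(16, 16);
    blend_kernel_2d<<<blocks, threads>>>(dev_out);

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), dev_out, bytes, cudaMemcpyDeviceToHost);
    cudaUnbindTexture(texIn);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return out;
}
```

For a corner cell, the reads at x - 1 and y - 1 fall outside the grid and return the corner cell's own value, which matches the clamping behavior described in the bullets above.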

TRADEOFFS 1D VS 2D
• From a performance standpoint, the decision between one- and two-dimensional textures is likely to be inconsequential.
• For our particular application, the code is a little simpler when using two-dimensional textures because we happen to be simulating a two-dimensional domain. But in general, since this is not always the case, we suggest you make the decision between one- and two-dimensional textures on a case-by-case basis.

REFERENCES
• http://http.developer.nvidia.com/Cg/tex1Dfetch.html
• http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__HIGHLEVEL_g2aeb95eab6b9d90bb00b26406a27c515.html
• http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__HIGHLEVEL_g67660ae3e9a1ff520575394f78087bea.html#g67660ae3e9a1ff520575394f78087bea

THANK YOU…