CUDA: Compile, Run, and Debug
First, run the setup script to set the environment variables.
› Navigate to Labs
› Run the script with source setup_cuda.sh
› Use echo to confirm that the environment variables are set
Compile and Run Programs
› Navigate to Labs/original/lab0
› Compile cuda_test.cu with the nvcc command
› Run the resulting executable with ./ like a normal C program
Compile and Run (For Debugger)
› Compile your code with the -g and -G flags (-g adds host debug information, -G adds device debug information)

    [username@scc-hd1 lab7]$ nvcc -g -G -o cuda_test cuda_test.cu

› Use the command cuda-gdb <executable> to run the debugger

    [username@scc-hd1]$ cuda-gdb cuda_test
    NVIDIA (R) CUDA Debugger
    8.0 release
    Portions Copyright (C) 2007-2016 NVIDIA Corporation
    GNU gdb (GDB) 7.6.2
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from <filepath>/EC527/lab7/cuda_test...done.
    (cuda-gdb)
Add Breakpoints
› Add a breakpoint at the start of the kernel_add kernel

    (cuda-gdb) break kernel_add
    Breakpoint 1 at 0x40329b: file cuda_test.cu, line 26.
    (cuda-gdb)

› Note: breakpoints can be set on function names, individual lines, addresses, etc. They can also be conditional. For details, see the source below.
Source: http://docs.nvidia.com/cuda-gdb/#breakpoints
Run Program
› Start code execution using the run command
› The program runs until the breakpoint is reached

    (cuda-gdb) run
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Length of the array = 50000
    [New Thread 0x2aaaacc72700 (LWP 3287)]
    [New Thread 0x2aaaace9b700 (LWP 3288)]
    Initializing the arrays ... done
    [Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

    Breakpoint 1, kernel_add<<<(16,1,1),(256,1,1)>>> (arrLen=50000, x=0xf02d00000, y=0xf02d30e00, result=0xf02d61c00) at cuda_test.cu:27
    27      const int tid = IMUL(blockDim.x, blockIdx.x) + threadIdx.x;
    (cuda-gdb)

› By default, the debugger focuses on block 0, thread 0
› The debugger focus can be changed (see Switch Debugger Focus)
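The breakpoint line above reports a launch of 16 blocks of 256 threads for an array of 50000 elements, so there are fewer threads than elements and each thread must process several of them. The following is a minimal host-side C sketch of that arithmetic, not code from cuda_test.cu. The function names total_threads and iterations_for_thread are hypothetical, and the loop form assumed is the one shown later in the debugger, for(i = tid; i < arrLen; i += threadN).

```c
#include <assert.h>

/* Total number of GPU threads in a 1-D launch: gridDim.x * blockDim.x. */
int total_threads(int grid_dim_x, int block_dim_x)
{
    assert(grid_dim_x > 0 && block_dim_x > 0);
    return grid_dim_x * block_dim_x;
}

/* Number of iterations executed by the thread with global index tid in
 * a loop of the assumed form: for (i = tid; i < arr_len; i += stride). */
int iterations_for_thread(int tid, int arr_len, int stride)
{
    assert(tid >= 0 && stride > 0);
    if (tid >= arr_len)
        return 0;               /* this thread has no element to process */
    return (arr_len - 1 - tid) / stride + 1;
}
```

With the values from the session, total_threads(16, 256) is 4096, and iterations_for_thread(0, 50000, 4096) is 13, so stepping with next through one thread revisits the loop body many times before the kernel finishes.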
Step Through Program
› Step through the kernel using the next command

    (cuda-gdb) next
    28      const int threadN = IMUL(blockDim.x, gridDim.x);
    (cuda-gdb) next
    32      for(i = tid; i < arrLen; i += threadN) {
    (cuda-gdb) next
    33      result[i] = (1e-6 * x[i]) + (1e-7 * y[i]) + 0.25;
    (cuda-gdb) next
    32      for(i = tid; i < arrLen; i += threadN) {
    (cuda-gdb)

› Resume execution and run to the end of the program using the continue command

    (cuda-gdb) continue
    Continuing.
    GPU time: 338853.750000 (msec)
    TEST PASSED: All results matched
    [Thread 0x2aaaabe2d7a0 (LWP 3278) exited]
    [Thread 0x2aaaacc72700 (LWP 3287) exited]
    [Inferior 1 (process 3278) exited normally]
    (cuda-gdb)
Print Local and Global Variables
› View variable values using the print command
› Local variables can be printed only for the thread currently in focus

    (cuda-gdb) next
    28      const int threadN = IMUL(blockDim.x, gridDim.x);
    (cuda-gdb) next
    32      for(i = tid; i < arrLen; i += threadN) {
    (cuda-gdb) next
    33      result[i] = (1e-6 * x[i]) + (1e-7 * y[i]) + 0.25;
    (cuda-gdb) next
    32      for(i = tid; i < arrLen; i += threadN) {
    (cuda-gdb) print blockDim.x
    $1 = 256
    (cuda-gdb) print i
    $2 = 0
    (cuda-gdb) print result[i]
    $3 = 1420.2738
    (cuda-gdb) print x[1]
    $4 = 547869760
    (cuda-gdb) print (1e-6 * x[20]) + (1e-7 * y[20]) + 0.25
    $5 = 82.079659199999981
    (cuda-gdb) next
    33      result[i] = (1e-6 * x[i]) + (1e-7 * y[i]) + 0.25;
    (cuda-gdb) print i
    $6 = 4096
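In the session above, print i returns 0 on the first pass through the loop and 4096 on the second, which is the stride threadN = 256 * 16. The sketch below emulates the work of one GPU thread on the CPU in plain C to make that sequence visible. It is an illustration under stated assumptions, not the source of cuda_test.cu: the name emulate_thread, the seen and max_seen parameters, and the float element type of x, y, and result are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* CPU emulation of the loop run by ONE GPU thread (assumed float arrays).
 * tid      : global thread index of the emulated thread
 * thread_n : stride, i.e. blockDim.x * gridDim.x on the GPU
 * seen     : optional buffer (may be NULL) that records visited indices
 * Returns the number of loop iterations performed. */
int emulate_thread(int tid, int thread_n, int arr_len,
                   const float *x, const float *y, float *result,
                   int *seen, int max_seen)
{
    int count = 0;
    assert(tid >= 0 && thread_n > 0);
    for (int i = tid; i < arr_len; i += thread_n) {
        /* Same expression as line 33 shown in the debugger. */
        result[i] = (float)((1e-6 * x[i]) + (1e-7 * y[i]) + 0.25);
        if (seen != NULL && count < max_seen)
            seen[count] = i;    /* remember which element was touched */
        count++;
    }
    return count;
}
```

Called with tid = 0 and thread_n = 4096, the first two recorded indices are 0 and 4096, the same values the debugger printed for i.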
Switch Debugger Focus
› Switch the thread or block in focus using the cuda thread or cuda block command

    (cuda-gdb) cuda block 3
    [Switching focus to CUDA kernel 0, grid 1, block (3,0,0), thread (0,0,0), device 0, sm 12, warp 0, lane 0]
    27      const int tid = IMUL(blockDim.x, blockIdx.x) + threadIdx.x;
    (cuda-gdb) cuda thread 10
    [Switching focus to CUDA kernel 0, grid 1, block (3,0,0), thread (10,0,0), device 0, sm 12, warp 0, lane 10]
    27      const int tid = IMUL(blockDim.x, blockIdx.x) + threadIdx.x;
    (cuda-gdb) next
    28      const int threadN = IMUL(blockDim.x, gridDim.x);
    (cuda-gdb) print tid
    $7 = 778
    (cuda-gdb)
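The value tid = 778 follows from line 27 of the kernel: 256 * 3 + 10. The reverse mapping is useful when deciding which block and thread to focus on in order to watch a particular array element. The C sketch below shows both directions. It assumes that IMUL is ordinary integer multiplication and that the launch is one-dimensional; global_tid and focus_for_index are hypothetical helper names, not part of the lab code or of cuda-gdb.

```c
#include <assert.h>

/* Global thread index as computed on line 27 of the kernel,
 * assuming IMUL(a, b) behaves like a * b. */
int global_tid(int block_dim_x, int block_idx_x, int thread_idx_x)
{
    assert(block_dim_x > 0 && block_idx_x >= 0);
    assert(thread_idx_x >= 0 && thread_idx_x < block_dim_x);
    return block_dim_x * block_idx_x + thread_idx_x;
}

/* Hypothetical helper: find the block and thread that process element i
 * when every thread strides by block_dim_x * grid_dim_x. */
void focus_for_index(int i, int block_dim_x, int grid_dim_x,
                     int *block_idx_x, int *thread_idx_x)
{
    int tid;
    assert(i >= 0 && block_dim_x > 0 && grid_dim_x > 0);
    tid = i % (block_dim_x * grid_dim_x);  /* owning global thread index */
    *block_idx_x = tid / block_dim_x;
    *thread_idx_x = tid % block_dim_x;
}
```

For example, element 4874 maps to block 3, thread 10 under the launch shown earlier (16 blocks of 256 threads), so cuda block 3 followed by cuda thread 10 would put the thread that handles it in focus.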
Quit
› Terminate the debugging session using the quit command

    (cuda-gdb) quit
    A debugging session is active.
    Inferior 1 [process 3449] will be killed.
    Quit anyway? (y or n) y
    [username@scc-hd1 lab7]$