cuda-gdb - dynamic parallelism support

How can cuda-gdb support dynamic parallelism?
My problem is that I am trying to track down a cuda-memcheck error in CUDA code that uses dynamic parallelism. The error happens in a second kernel that is launched from inside a first kernel, and neither cuda-memcheck nor cuda-gdb gives me useful information about it: they simply report the line in the first kernel where the second kernel is launched as the error location, which is not accurate.

Is there any way to get some more accurate information?

I have the same question.

cuda-gdb claims to fully support CUDA dynamic parallelism.

You might want to read the manual, and note all references to “dynamic”:

https://docs.nvidia.com/cuda/cuda-gdb/index.html

When I perform a fairly simple test, I can set breakpoints, single-step, and inspect variables in a child kernel, so I’m not sure what the problem is.

It is true that you cannot step from a parent kernel directly into a child kernel launch. But you can set breakpoints in the child kernel, and begin stepping and inspecting variables there.

Here’s a simple, fully worked example:

$ cat t291.cu
#include <stdio.h>

__global__ void child(int *data){

  int val = *data;
  val *= 2;
  *data = val;
}

__global__ void parent(int *data){

  child<<<1,1>>>(data);     // device-side (dynamic parallelism) launch
  cudaDeviceSynchronize();  // wait for the child kernel to complete
}

int main(){

  int *d_data, h_data = 2;
  cudaMalloc(&d_data, sizeof(int));
  cudaMemcpy(d_data, &h_data, sizeof(int), cudaMemcpyHostToDevice);
  parent<<<1,1>>>(d_data);  // launch the parent kernel from the host
  cudaMemcpy(&h_data, d_data, sizeof(int), cudaMemcpyDeviceToHost);
}
$ nvcc -arch=sm_35 -o t291 t291.cu -g -G -rdc=true -lcudadevrt
$ cuda-gdb ./t291
NVIDIA (R) CUDA Debugger
9.2 release
Portions Copyright (C) 2007-2018 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./t291...done.
(cuda-gdb) break t291.cu:5
Breakpoint 1 at 0x4033d1: file t291.cu, line 5.
(cuda-gdb) run
Starting program: /home/user2/misc/t291
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffefa08700 (LWP 6418)]
[New Thread 0x7fffef207700 (LWP 6419)]
[Switching focus to CUDA kernel 1, grid -5, block (0,0,0), thread (0,0,0), device 0, sm 2, warp 1, lane 0]

Thread 1 "t291" hit Breakpoint 1, child<<<(1,1,1),(1,1,1)>>> (data=0x7fffbb600000) at t291.cu:5
5         int val = *data;
(cuda-gdb) print val
$1 = <optimized out>
(cuda-gdb) step
6         val *= 2;
(cuda-gdb) print val
$2 = 2
(cuda-gdb) step
7         *data = val;
(cuda-gdb) print val
$3 = 4
(cuda-gdb) continue
Continuing.
[Thread 0x7fffef207700 (LWP 6419) exited]
[Thread 0x7ffff7fcc740 (LWP 6393) exited]
[Inferior 1 (process 6393) exited normally]
(cuda-gdb) quit
$

Note the negative ID on the grid:

[Switching focus to CUDA kernel 1, grid -5, block (0,0,0), thread (0,0,0), device 0, sm 2, warp 1, lane 0]
                                   ^^^^^^^

The negative grid ID indicates that the kernel in which the breakpoint was hit was launched from device code, i.e. it is a child kernel.
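
As an aside (this is not part of the session above): if you want the parent/child relationship shown explicitly, rather than inferred from the sign of the grid ID, cuda-gdb also has an info cuda kernels command, which lists the kernels currently resident on the device along with their parent grid. The exact column layout varies by cuda-gdb version:

(cuda-gdb) info cuda kernels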

Hey txbob, thanks for the reply. Apparently the problem is not in cuda-gdb itself: my child kernel does not launch when the code is compiled with --device-debug (a printf in the child works only when compiled without --device-debug). I checked the PTX, and the function call appears to be there. Does something like the maximum block size for child kernels somehow change with --device-debug?

No, but any number of other code characteristics can change, such as register usage, which could effectively limit the maximum block size a kernel can be launched with.
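
One way to check (a minimal sketch I'm adding for illustration; the child kernel here is a stand-in, not your actual code) is to compare the two builds with cudaFuncGetAttributes(), which reports the register count and the largest block the compiled kernel can be launched with. Passing -Xptxas -v to nvcc prints similar information at compile time.

#include <cstdio>

__global__ void child(int *data){ *data *= 2; }

int main(){

  cudaFuncAttributes attr;
  cudaFuncGetAttributes(&attr, child);  // query properties of the compiled kernel
  printf("registers per thread: %d, max threads per block: %d\n",
         attr.numRegs, attr.maxThreadsPerBlock);
}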

It sounds like your issue has nothing at all to do with cuda-gdb or what it is capable of.

If you use rigorous CUDA error checking in both your host code and your device code (you use it in device code exactly the same way you would in host code), it should spell out why your child kernels are not launching.
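
For example, here is a minimal sketch of device-side error checking in the parent kernel from my example above (my illustration, not code from your post). The first check catches a failed launch (e.g. a block size that is no longer achievable under --device-debug), and the second catches errors that occur while the child executes. Note that device-side cudaDeviceSynchronize() matches the CUDA 9.2 vintage of this thread; it was removed from device code in CUDA 12.

#include <cstdio>

__global__ void child(int *data){ *data *= 2; }

__global__ void parent(int *data){

  child<<<1,1>>>(data);
  cudaError_t err = cudaGetLastError();  // did the device-side launch fail?
  if (err != cudaSuccess)
    printf("child launch failed: %s\n", cudaGetErrorString(err));
  err = cudaDeviceSynchronize();         // did the child hit an error while running?
  if (err != cudaSuccess)
    printf("child execution failed: %s\n", cudaGetErrorString(err));
}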