cuda-gdb cannot break in device code

I am having some trouble getting cuda-gdb to work properly.

I have a project consisting of several files, which I compile with the -g -G flag combination in a single step. When I run the program under cuda-gdb, I am unable to set breakpoints on my device functions: they are invisible to tab completion, and the program will not break when I set breakpoints on the relevant line numbers of the .cu file. I am able to set a breakpoint on the global kernel function, but the debugger does not actually break there.
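For reference, the workflow I expect (and which worked on my simpler projects) looks like this; the kernel and file names below are placeholders, not from my actual project:

```
(cuda-gdb) break my_kernel       # break on the kernel symbol
(cuda-gdb) break my_file.cu:42   # break on a line of device code
(cuda-gdb) run
```

On this project, neither form actually stops execution in device code.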

I have used cuda-gdb on other projects without this problem, but they were less complicated. So, I wonder if I have inadvertently built some problem into this current project that is causing this behaviour. What should I look for?
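In case it matters, by "a single step" I mean everything on one nvcc line, roughly like this (file names are placeholders; the real project has more files):

```
nvcc -g -G file1.cu file2.cu file3.cu -o myprog
```

My understanding is that if the build were split into separate compile and link steps, -g -G would need to appear on every compile line, or the objects built without them would carry no device debug info; but with a one-line build that shouldn't be an issue.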

A particularly unsettling phenomenon is that when I compile the code with -g -G, the program runs in a fraction of the time and then gives the wrong answer! So, it appears the debugging flags are causing the program to do an entirely different calculation. What might be the cause of this?
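One thing I plan to check is whether the kernel is failing to launch at all under -g -G, since a silent launch failure would explain both the speed-up and the wrong answer. A sketch of the check, with a placeholder kernel name (this is a fragment to drop in after the launch, not a complete program):

```
my_kernel<<<grid, block>>>(d_a, d_b, n);
cudaError_t err = cudaGetLastError();        // catches launch failures
if (err != cudaSuccess)
    printf("launch failed: %s\n", cudaGetErrorString(err));
cudaThreadSynchronize();                     // wait for the kernel to finish
err = cudaGetLastError();                    // catches execution errors
if (err != cudaSuccess)
    printf("kernel failed: %s\n", cudaGetErrorString(err));
```

If this reports an error, the "different calculation" is really just the kernel never running and the output buffers holding stale or uninitialized data.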

I have seen this thread, but none of the advice in it has helped me:

Thanks in advance!

I don’t know if this will help or not:

I have seen this once. When I use texture fetches, I cannot step into the device code with cuda-gdb. It may be that the feature is not supported (correct me if I am wrong).

The other symptom seems to indicate a bug.

Hi guys,

I've also hit a strange bug with cuda-gdb on my GPU cluster based on Tesla S1070 nodes.

We have CUDA 3.2 installed (I think we have had this problem since CUDA 3.0) with the 64-bit Linux driver 260.19.21.

The OS is "Red Hat Enterprise Linux Server release 5.3", slightly modified by the cluster vendor.

We can’t break in CUDA kernels.

After reading this post I tried a simple reproducer, following all the advice given here, but it still failed:

I compile my code in one step:

nvcc -G -g -o matmul_debug

At execution, I try to set the focus without success, and cuda-gdb finally crashes and dumps a core when trying to step into the kernel:

Program exited normally.

(cuda-gdb) run

Starting program: ./matmul_debug 

[Thread debugging using libthread_db enabled]

[New process 23054]

Matrice réelle NxN: 1.05 Mo

[New Thread 47728650801024 (LWP 23054)]

[Switching to Thread 47728650801024 (LWP 23054)]

Breakpoint 1, kernel_mulmat (__cuda_0=0x100000, __cuda_1=0x200000, __cuda_2=0x300000, __cuda_3=512) at

6       __global__ void kernel_mulmat(real *A, real *B,real *C, int n){

(cuda-gdb) info cuda device

Focus not set on any running CUDA kernel.

(cuda-gdb) cuda device 0

No CUDA kernel is currently running.

(cuda-gdb) cuda device 1

No CUDA kernel is currently running.

(cuda-gdb) info cuda kernels

No active kernel on CUDA devices.

(cuda-gdb) step

Breakpoint 1, kernel_mulmat (__cuda_0=0x100000, __cuda_1=0x200000, __cuda_2=0x300000, __cuda_3=512) at

6       __global__ void kernel_mulmat(real *A, real *B,real *C, int n){

(cuda-gdb) info cuda threads

Focus not set on any running CUDA kernel.

(cuda-gdb) step

Breakpoint 1, kernel_mulmat (__cuda_0=0x100000, __cuda_1=0x200000, __cuda_2=0x300000, __cuda_3=512) at

6       __global__ void kernel_mulmat(real *A, real *B,real *C, int n){

(cuda-gdb) step

__device_stub__Z13kernel_mulmatPfS_S_i (__par0=0x100000, __par1=0x200000, __par2=0x300000, __par3=512) at /tmp/tmpxft_00005985_00000000-1_matmul.cudafe1.stub.c:6

6       /tmp/tmpxft_00005985_00000000-1_matmul.cudafe1.stub.c: No such file or directory.

        in /tmp/tmpxft_00005985_00000000-1_matmul.cudafe1.stub.c

(cuda-gdb) step

7       in /tmp/tmpxft_00005985_00000000-1_matmul.cudafe1.stub.c

(cuda-gdb) step

cudaLaunch<char> (

    entry=0x401062 "UH\211�\203�H\211}�\211u�\211U�211M�213M�\213U�\213u�\213}�\031����\220UH\211�\203�020�017\021E�\213E�\211E�\017\020E�\t��f\017(�\017\020\005�s") at /applications/cuda-3.2/bin/../include/cuda_runtime.h:935

935       return cudaLaunch((const char*)entry);

(cuda-gdb) step

BACKTRACE (9 frames):
If I put the breakpoint on a line inside the kernel, it also crashes.

The strange thing is that the same code, with the same CUDA version and the same driver, works on a machine running CentOS 5.2 with a Quadro FX5800.

We are waiting for the CUDA 4 final release to see if anything has changed…