Attaching cuda-gdb to running process doesn't allow kernel debug?

I’m writing multi-GPU code driven by a Fortran MPI codebase (which calls C code that launches the kernels), with 2 MPI processes per node and 2 cards per node (i.e., 1 process per GPU). Cuda-gdb works fine if I run the program under it normally (not attaching). I’ve tried both cuda-3.2 and the newest cuda-4.0 release. I’m running on a C2070 or something similar with no X11 running.

I would like to use cuda-gdb to debug kernels, but because the code runs via MPI, one of the only ways to do this is to have the code “pause” before the kernel in question:

// put program to sleep until I attach cuda-gdb
if (1) {
    int i = 0;
    char hostname[256];
    gethostname(hostname, sizeof(hostname));
    printf("PID %d on %s ready for attach\n", getpid(), hostname);
    while (0 == i) sleep(5);
}

This outputs something like

PID 2737 on $MACHINE ready for attach

From there, I can start up cuda-gdb, attach to the process, move up 2 stack frames, and then change the variable so that the while loop stops sleeping.

(cuda-gdb) attach 2737

(cuda-gdb) up 2

(cuda-gdb) set var i = 7

(cuda-gdb) break (breakpoint inside kernel)

(cuda-gdb) continue

If I had started my code from cuda-gdb directly, this breaks just fine inside the kernel on line 12 and everything is great. However, after attaching, execution just runs past the kernel and nothing happens (or at least I don’t get to debug my kernel). I’ve tried typing ‘info cuda kernels’, but it says there are no kernels, etc.

Anyone have experience with this?

My test code for reference:

I compiled with: nvcc -g -G -o go

cuda-gdb cannot attach to a running CUDA program yet. However, it is a feature that’s being developed actively, and you should see it soon in a future release (I’m on the cuda-gdb team).

Well that’s good news, both that I’m not an idiot for not getting it to work, and that it is something that will be out soon!