I’m writing multi-GPU code driven by a Fortran MPI codebase (which calls C code that launches the kernels), with 2 MPI processes per node and 2 cards per node (i.e., one process per GPU). cuda-gdb works fine when I run under it directly (not attaching). I’ve tried both CUDA 3.2 and the newest CUDA 4.0 release. I’m running on a C2070 or something similar, with no X11 running.
I would like to use cuda-gdb to debug my kernels, but because the code is launched via MPI, about the only way to do this is to have the code pause before the kernel in question:
// put program to sleep until I attach cuda-gdb
if (1) {
    int i = 0;
    char hostname[256];
    gethostname(hostname, sizeof(hostname));
    printf("PID %d on %s ready for attach\n", getpid(), hostname);
    fflush(stdout);
    while (0 == i) sleep(5);
}
This outputs something like:
PID 2737 on $MACHINE ready for attach
From there, I can start cuda-gdb, attach to the process, move up two stack frames, and change the variable so the while loop stops sleeping:
(cuda-gdb) attach 2737
(cuda-gdb) up 2
(cuda-gdb) set var i = 7
(cuda-gdb) break test.cu:12    # breakpoint inside the kernel
(cuda-gdb) continue
If I start my code from cuda-gdb directly, this breaks just fine inside the kernel on line 12 and everything is great. After attaching, however, execution just runs past the kernel and nothing happens (or at least I don’t get to debug my kernel). I’ve tried `info cuda kernels`, but it reports that there are no kernels.
Anyone have experience with this?
My test code for reference: http://pastebin.com/0cs777iE
I compiled with: nvcc test.cu -g -G -o go
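For completeness, the interactive attach session above can also be collected into a command file and fed to cuda-gdb in one shot. This is only a sketch: the PID (2737) and breakpoint line are taken from the example output above and will differ on every run.

```shell
# Write the attach steps to a cuda-gdb command file.
cat > attach.gdb <<'EOF'
attach 2737
up 2
set var i = 7
break test.cu:12
continue
EOF

# Show what will be fed to the debugger.
cat attach.gdb

# On a machine with cuda-gdb installed, this would drive the session:
# cuda-gdb -x attach.gdb ./go
```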