Breakpoint 1 at 0x805725a: file regressionKernel.cu, line 17.
(cuda-gdb) run
Starting program: /home/letharion/cuda/a.out
[Thread debugging using libthread_db enabled]
[New Thread -1211300144 (LWP 12837)]
Program exited normally.
The compilation was done as instructed on page 5 of the cuda-gdb manual.
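For reference, a minimal sketch of the build step being described, assuming a single source file (the file and output names are taken from the session above):

nvcc -g -G regressionKernel.cu -o a.out

Here -g adds host-side debug information and -G adds device-side debug information, both of which cuda-gdb needs.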
When I request a break, my interpretation of the response is that my kernel has been found: line 17 is the first line of code inside “KernelName”.
When I then run my program, I would expect it to break inside the kernel, but as you can see, that doesn’t happen.
I can also break at main and “next” my way to the kernel invocation. From there I try to step my way inside, but fail: I’m only shown two lines of dim3(), and then execution proceeds past the kernel.
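For context, the launch site presumably looks something like the following sketch (KernelName is taken from the post above; the grid, block, and d_data names are hypothetical stand-ins, since the original code isn’t shown):

// Hypothetical launch site matching the description above.
dim3 grid(1, 1);                      // stepping shows these dim3() lines...
dim3 block(256, 1, 1);
KernelName<<<grid, block>>>(d_data);  // ...but the debugger never enters the kernel body
cudaThreadSynchronize();              // execution resumes here on the host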
Do you have a GPU with debugging capabilities? I think that if you don’t, you get a message that says something like “GPU with no debugging capabilities”, but I am just checking.
“Breakpoint 1 at 0x805725a: file regressionKernel.cu, line 17” implies that you’re compiling things at different times. This doesn’t work: all of your CUDA files need to be compiled together in a single step for the debugger to work.
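To illustrate the difference, a hedged sketch assuming a hypothetical second file main.cu:

# Separate compilation and linking (the problematic pattern described above):
nvcc -g -G -c regressionKernel.cu -o regressionKernel.o
nvcc -g -G -c main.cu -o main.o
nvcc regressionKernel.o main.o -o a.out

# Everything compiled together in a single step:
nvcc -g -G regressionKernel.cu main.cu -o a.out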
I do get the warning below, which I override. X has been stopped via “/etc/init.d/gdm stop”, and I’m connecting via an NX remote screen.
ERROR: Debugging a GPU attached to a Window system is not supported and may hang the system.
We have detected that DISPLAY is set which indicates a window system is running.
To override this detection, set the environment variable CUDA_GDB_DISPLAY_OVERRIDE to any value.
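The override itself is just a matter of setting the variable before launching the debugger, for example:

export CUDA_GDB_DISPLAY_OVERRIDE=1
cuda-gdb ./a.out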
After much trial and error, it turned out to be the ‘if’ statement in the kernel. I remember reading something about divergence/convergence of conditionals, but does that mean you can’t debug a kernel (on the device, not in emulation) that contains conditionals?
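For illustration, a minimal sketch of the kind of divergent conditional in question (a hypothetical kernel, not the original code):

__global__ void divergentKernel(int *data)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx % 2 == 0)      // threads in the same warp take different paths,
        data[idx] *= 2;    // so the warp diverges here...
    else
        data[idx] += 1;    // ...and reconverges after the conditional
}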
Just wondering: was this issue ever resolved properly?
I had this same problem with nvcc from “Cuda compilation tools, release 3.1, V0.2.1221” (and a Fermi card), though I never had it with nvcc from v2.3 (and Tesla), using Red Hat Linux. No matter how I built the executable or set the breakpoints, cuda-gdb just would not step into a kernel, but jumped right over it (as if I had typed “next” instead of “step”, and as if no breakpoints were set in there). cuda-gdb was happy enough to accept breakpoints based on the kernel name and on line numbers inside the kernel. There were no “if” statements in the kernel either; it never made much sense that an “if” statement would throw off a debugger so completely.
I solved the problem (for my case at least) by reducing the block size from 1024 threads (32x32) to 256 (16x16). That would explain everything, except that 1024 threads per block is supposed to be allowed…
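In code terms, the change was presumably something like this (the variable name is hypothetical):

dim3 block(16, 16);  // was dim3 block(32, 32): 1024 threads per block,
                     // the Fermi hardware maximum, yet stepping only
                     // worked after reducing it to 256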
I’m suffering from the same problem, both with my own code and with simplified kernels. I’ve tried the walkthrough example from the cuda-gdb manual, but debugging kernels just does not work.
When I try to replicate the steps from the manual (page 20), breaking in the kernel at step 4 gives this output:
(cuda-gdb) c
Continuing.
Breakpoint 2, bitreverse (__cuda_0=0xfc00000000) at bitreverse.cu:8
8 __global__ void bitreverse(void *data) {
which obviously shows that something is not right (note the mangled __cuda_0 argument: the breakpoint appears to hit the host-side launch stub rather than device code). At the same time, if I try any of the CUDA-specific gdb info features, I get nothing but this message:
(cuda-gdb) info cuda threads
Focus not set on any running CUDA kernel.
I desperately need debugging to work, so could somebody explain what’s going on?
Make sure your app is compiled with the -g -G options.
When the breakpoint is hit, the cuda-gdb focus (active stack) can be on the host thread. In that case you need to switch to the device (GPU) by first viewing the list of all kernels and then explicitly switching to the device, as follows. Then you can do info cuda threads to see all the device threads.
Switch to CUDA kernel:
(cuda-gdb) info cuda kernels
  0 Device 0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
(cuda-gdb) cuda kernel 0
[Switching to CUDA Kernel 0 (<<<(0,0),(0,0,0)>>>)]
#0  bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
9           unsigned int *idata = (unsigned int *)data;
(cuda-gdb) bt
#0  bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
I’ve been having the same problem. I’ve tried using it on Linux, with the X server switched off, on a GTX-480. cuda-gdb seems to detect the CUDA device, and I can set breakpoints at the beginning of the kernel, but execution then continues right to the end of the kernel and on into the host code. It is not possible to switch to kernel code at all using the gdb commands.
Stepping into the code did seem to work once, though perhaps deceptively, at which point it delved into unfathomable depths, as I’m using a lot of library calls.
What I recall is that I then set a breakpoint in the middle of the kernel and absolutely nothing happened.
Could this be a problem with a single-GPU setup?
This is the latest 3.2 cuda-gdb that I’m using. Any recommendations for what to do once the breakpoint at the kernel function is reached (i.e. after “break test_kernel” and “run”)?
Yes, I’ve used those flags. I can try it out again in a bit; I’m using the latest Ubuntu 10.10 distribution, and this is about the only major toolkit bug I’ve seen. Interestingly, similar behavior showed up on OS X while I was running Aqua, which is why I thought it might not be detecting the GPU properly, but the info command did show a single GPU. I could never see any threads with the info commands, though. Let me try this once more on Linux, just to make sure.
OK, I’ve reinstalled everything on another 64-bit Ubuntu 10.10 system, and no, it’s not possible to step into kernels. Although the CUDA program runs correctly, cuda-gdb cannot detect any kernels or CUDA threads, or set focus on anything. The only command that works is info cuda system, which correctly reports my GF100. I’ve compiled with -arch=sm_20 -g -G.
I still can’t debug at all, and now that I have to track down some fairly tricky bugs, printf-ing has proved to consume a lot of time and help very little. Has anyone figured out a workaround or fix?
Right, that was my hope as well; I forgot to mention that I just tried it today and it does not work. It happily steps through the CPU code generated by nvcc, but it does not step into the kernel. Really strange…
I am suffering from the same problem too. Have you filed a bug report about it?
What is your platform and exact hardware? There may be troublesome configurations that the developers haven’t had the chance to test. I was last using the OS X and Linux cuda-gdb from CUDA 3.2 on a GTX-480, with no success: exactly the same problem. You can’t step into the GPU code and can’t trace it; only the CPU threads worked, although the debugger itself seems to fire up all right.