Alright, it is indeed the presence of textures in a compilation unit that is the cause. I’ve got a minimal repro case:
For the record, I’m running this on
Ubuntu 9.04, 32-bit (I know, 32-bit: I didn’t install the OS on this thing)
CUDA 3.0
Driver: 195.36.15
I built a simple test program from two .cu files and one .cc file. add.cu has two kernels in it: one uses a texture and one does not. sub.cu has only a non-texture kernel. My main aim is to test the debuggability of an app built with CMake, but I get the exact same behavior if it is compiled with just: $ nvcc -g -G add.cu sub.cu test.cc -o test
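For reference (the attached tarball has the actual source), here is a sketch of what add.cu looks like. The name and signature of add_kernel, the statements at add.cu:11-12, and the <<<1,256>>> launch config are taken from the gdb output below; the texture reference name (tex_ref) and the body of add_tex_kernel are my assumptions about a minimal texture-using kernel, and line numbers will not match the real file exactly. sub.cu just contains a copy of the non-texture kernel.

```cuda
// add.cu -- sketch only; see the attached tarball for the real source.

// Assumed texture reference; the real name in the tarball may differ.
texture<unsigned int, 1, cudaReadModeElementType> tex_ref;

// Non-texture kernel: the two statements below appear verbatim in the
// cuda-gdb sessions (add.cu:11 and add.cu:12).
__global__ void add_kernel(unsigned int *data, unsigned int n)
    {
    unsigned int idx = threadIdx.x;
    if (idx < n)
        {
        unsigned int cur = data[idx];
        data[idx] = cur + idx;
        }
    }

// Texture-using kernel: same shape as add_kernel, but reading through
// the texture reference (body is an assumption).
__global__ void add_tex_kernel(unsigned int *data, unsigned int n)
    {
    unsigned int idx = threadIdx.x;
    if (idx < n)
        {
        unsigned int cur = tex1Dfetch(tex_ref, idx);
        data[idx] = cur + idx;
        }
    }
```

Both kernels are launched as kernel<<<1, 256>>>(d_data, 256), matching the <<<(1,1),(256,1,1)>>> configuration shown in the sessions.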
Here are abbreviated cuda-gdb sessions showing the behavior I get:
The first one places a breakpoint in the non-texture using kernel in add.cu.
$ cuda-gdb ./test
NVIDIA (R) CUDA Debugger
3.0 release
........
(cuda-gdb) break add.cu:11
Breakpoint 1 at 0x8053eb1: file add.cu, line 11.
(cuda-gdb) run
Starting program: /home/joaander/test/source/test
[Thread debugging using libthread_db enabled]
[New process 8497]
[New Thread -1210673456 (LWP 8497)]
Breakpoint 1 at 0x857fd18: file add.cu, line 11.
Breakpoint 1 at 0x8053eb1: file add.cu, line 11.
[Switching to Thread -1210673456 (LWP 8497)]
Breakpoint 1, add_kernel (__cuda_0=0x110000, __cuda_1=256) at add.cu:14
14 }
(cuda-gdb) quit
The program is running. Exit anyway? (y or n) y
Note how it incorrectly broke at line 14, on the CPU, after the kernel had completed. Not shown here, but at this point I can run the continue command and execution continues correctly.
Next up: a breakpoint in sub.cu (the compilation unit without any textures) works perfectly:
$ cuda-gdb ./test
NVIDIA (R) CUDA Debugger
3.0 release
......
(cuda-gdb) break sub.cu:11
Breakpoint 1 at 0x8064256: file sub.cu, line 11.
(cuda-gdb) run
Starting program: /home/joaander/test/source/test
[Thread debugging using libthread_db enabled]
[New process 8504]
[New Thread -1211652400 (LWP 8504)]
Breakpoint 1 at 0x858dd18: file sub.cu, line 11.
Breakpoint 1 at 0x8064256: file sub.cu, line 11.
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread -1211652400 (LWP 8504)]
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
add_kernel <<<(1,1),(256,1,1)>>> (data=0x110000, n=256) at add.cu:11
11 unsigned int cur = data[idx];
(cuda-gdb) step
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
add_kernel <<<(1,1),(256,1,1)>>> (data=0x110000, n=256) at add.cu:12
12 data[idx] = cur+idx;
(cuda-gdb) print cur
$1 = 0
(cuda-gdb) cuda thread (10,0,0)
New CUDA focus: device 0, sm 0, warp 0, lane 10, grid 60000, block (0,0), thread (10,0,0).
(cuda-gdb) print cur
$2 = 10
(cuda-gdb) quit
The program is running. Exit anyway? (y or n) y
Lastly, placing a breakpoint inside the kernel that uses a texture produces interesting results.
$ cuda-gdb ./test
NVIDIA (R) CUDA Debugger
3.0 release
....
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(cuda-gdb) break add.cu:27
Breakpoint 1 at 0x8053e26: file add.cu, line 27.
(cuda-gdb) run
Starting program: /home/joaander/test/source/test
[Thread debugging using libthread_db enabled]
[New process 16158]
[New Thread -1211156784 (LWP 16158)]
Breakpoint 1 at 0xa8: file add.cu, line 27.
Breakpoint 1 at 0x8053e26: file add.cu, line 27.
[Switching to Thread -1211156784 (LWP 16158)]
Breakpoint 1, add_tex_kernel (__cuda_0=0x110000, __cuda_1=256) at add.cu:30
30 }
(cuda-gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xb75cbcd0 in ?? () from /usr/lib/libcuda.so
(cuda-gdb) quit
The program is running. Exit anyway? (y or n) y
^C^C
It again broke on the CPU at the end of the kernel in which the breakpoint was placed. Note also that the system froze up for several minutes at the “Continuing.” line, with ./test spinning at 100% CPU, forcing me to hit Ctrl-C.
Attaching a tarball with the source code.