I have a CUDA kernel that works fine the first time it’s called, but fails to launch once it has been called enough times. I’ve attached a testcase. I’m compiling with:
/opt/cuda32/cuda/bin/nvcc -arch sm_20 -o test32 test.cu
I’m running on a GTX 580 with:
LD_LIBRARY_PATH=/opt/cuda32/cuda/lib/ ./test32 -blockSize 512 -nThreads 8192
which will output something like:
512 x 16 = 8192
CUDA error: unspecified launch failure
It often dies before the 10th launch and almost always dies before the 100th. The testcase runs fine if:
- I compile with v3.1.9 of the compiler. v3.2.9 and 3.2.16 fail. Compiling with 3.1.9 and using the 3.2.16 runtime libraries also runs fine.
- I run with 7168 threads instead of 8192.
- I compile with -arch sm_13 (and run with -blockSize 64 -nThreads 7680).
- I change almost anything in the code. Most of the code does nothing useful, but it appears to be necessary to trigger the bug.
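For reference, the launch geometry behind the configurations above can be worked out with a small host-side sketch. It is plain C++ that also compiles as part of a .cu file. The helper name blocksFor is hypothetical and is not taken from the attached testcase. It assumes the grid size is simply nThreads / blockSize, which is what the “512 x 16 = 8192” output suggests, and that the 7168-thread run keeps the block size at 512.

```cpp
#include <cassert>

// Hypothetical helper (not from the attached testcase): number of blocks in
// a 1D launch such that blockSize * blocks == nThreads.
// Assumption: nThreads is an exact multiple of blockSize, which holds for
// every configuration mentioned in this post.
inline int blocksFor(int nThreads, int blockSize) {
    assert(nThreads > 0 && blockSize > 0);
    assert(nThreads % blockSize == 0);
    return nThreads / blockSize;
}

// Configurations from this post:
//   blocksFor(8192, 512)  -> failing run on sm_20
//   blocksFor(7168, 512)  -> passing run on sm_20 (block size assumed unchanged)
//   blocksFor(7680, 64)   -> passing run on sm_13
```

Under those assumptions the failing run launches 16 blocks of 512 threads, the 7168-thread run launches 14 blocks, and the sm_13 run launches 120 blocks of 64 threads.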
If I run under cuda-gdb, I get:
Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[Switching to CUDA Kernel 11 (<<<(9,0),(352,0,0)>>>)]
0x095e95c8 in doCalc<<<(16,1),(512,1,1)>>> ()
If I do “set cuda memcheck on”, I get:
Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching to CUDA Kernel 0 (<<<(10,0),(91,0,0)>>>)]
0x0a06a938 in doCalc ()
#0 0x0a06a938 in doCalc ()
#1 0x0a06a938 in doCalc<<<(16,1),(512,1,1)>>> ()
Compiling with -G makes the problem go away.
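To relate the cuda-gdb coordinates above to a position in the data, the block and thread indices can be flattened into a global thread index. This is only a sketch: it assumes the kernel uses the usual 1D indexing (blockIdx.x * blockDim.x + threadIdx.x), and that the two tuples cuda-gdb prints are the block and thread coordinates. The helper name globalThreadIndex is hypothetical and does not come from the testcase.

```cpp
#include <cassert>

// Hypothetical helper: the value blockIdx.x * blockDim.x + threadIdx.x would
// have inside a kernel launched with a 1D grid of 1D blocks.
inline int globalThreadIndex(int blockIdxX, int blockDimX, int threadIdxX) {
    assert(blockDimX > 0);
    assert(blockIdxX >= 0);
    assert(threadIdxX >= 0 && threadIdxX < blockDimX);
    return blockIdxX * blockDimX + threadIdxX;
}

// Coordinates reported in this post, with blockDim.x = 512:
//   globalThreadIndex(9, 512, 352)  -> run without memcheck
//   globalThreadIndex(10, 512, 91)  -> run with memcheck on
```

With a block size of 512 this gives global indices 4960 and 5211, both well inside the 8192 launched threads, so under these assumptions the faulting threads are not ones that run past the end of the grid.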
test.cu (3.57 KB)