cuda-memcheck error on cuda sample program cdpLUDecomposition


I’m trying to debug a program for a research project that uses CUDA Dynamic Parallelism. My program ran without errors both in non-debug mode and in debug mode without memcheck. However, when I turned CUDA memcheck on in cuda-gdb, the program failed with the following error:

Error: Failed to get warp state (dev=0, sm=0, wp=1), error=CUDBG_ERROR_UNKNOWN_FUNCTION(0x3)

After spending quite a while trying to figure this out, I decided to try integrated memcheck with cuda-gdb on some sample programs provided.
I observed exactly the same error on 6_Advanced/cdpLUDecomposition.

The error was not observed on 6_Advanced/cdpBezierTessellation, 6_Advanced/cdpAdvancedQuicksort or 6_Advanced/cdpQuadtree, so it’s hard to imagine this has anything to do with dynamic parallelism in general.

I’m on a CentOS machine with a GeForce GTX Titan X device.


Hi, tnybny

Thanks for raising this.

Would you please paste the sequence of commands within cuda-gdb that reproduces the error?

Also, please tell us which toolkit version and driver version you used.
Thanks !

Hi Veraj,

This is the sequence I took:
From 6_Advanced/cdpLUDecomposition/

$ cuda-gdb cdpLUDecomposition
(cuda-gdb) set cuda memcheck on
(cuda-gdb) run

Entire output is:
Starting program: /home/bramach2/NVIDIA_CUDA-8.0_Samples/6_Advanced/cdpLUDecomposition/cdpLUDecomposition
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/usr/lib64/”.
warning: File “/opt/ohpc/pub/compiler/gcc/5.4.0/lib64/” auto-loading has been declined by your `auto-load safe-path’ set to “$debugdir:$datadir/auto-load”.
To enable execution of this file add
add-auto-load-safe-path /opt/ohpc/pub/compiler/gcc/5.4.0/lib64/
line to your configuration file “/home/bramach2/.cuda-gdbinit”.
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file “/home/bramach2/.cuda-gdbinit”.
For more information about this security protection see the
“Auto-loading safe path” section in the GDB manual. E.g., run from the shell:
info “(gdb)Auto-loading safe path”
Starting LU Decomposition (CUDA Dynamic Parallelism)
[New Thread 0x7ffff3485700 (LWP 11883)]
GPU Device 0: “GeForce GTX TITAN X” with compute capability 5.2

GPU device GeForce GTX TITAN X has compute capabilities (SM 5.2)
Compute LU decomposition of a random 1024x1024 matrix using CUDA Dynamic Parallelism
Launching single task from device…
[New Thread 0x7ffff1c82700 (LWP 11884)]
[New Thread 0x7ffff1481700 (LWP 11885)]
Error: Failed to get warp state (dev=0, sm=0, wp=2), error=CUDBG_ERROR_UNKNOWN_FUNCTION(0x3).

Toolkit version: Cuda compilation tools, release 8.0, V8.0.61
Driver version: Driver Version: 375.26
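
For anyone else trying to reproduce this, the version details and the failing run can be collected in one go. The following is only a sketch: the `report` helper is a hypothetical name, it assumes `nvcc`, `nvidia-smi` and `cuda-gdb` are on the PATH, and the batch run assumes cuda-gdb accepts the standard GDB `-batch` and `-ex` options and that the script is started from the sample directory.

```shell
#!/bin/sh
# Hypothetical helper: run a tool and label its output,
# or say so if the tool is not installed / not on PATH.
report() {
    label=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $label =="
        "$@" || echo "(exited with status $?)"
    else
        echo "== $label: $1 not found on PATH =="
    fi
}

# Version information requested in the thread.
report "CUDA toolkit" nvcc --version
report "Driver" nvidia-smi
report "Debugger" cuda-gdb --version

# Non-interactive version of the session above (assumption: cuda-gdb
# honours GDB's -batch and -ex flags). Skipped if the sample binary
# or the debugger is missing.
if [ -x ./cdpLUDecomposition ] && command -v cuda-gdb >/dev/null 2>&1; then
    cuda-gdb -batch -ex "set cuda memcheck on" -ex "run" ./cdpLUDecomposition
fi
```

Redirecting the script's output to a file gives a single log that can be attached to the report.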


I cannot reproduce the error on a GM200 (the same core as the GeForce GTX TITAN X), maybe because I didn’t use the exact same GPU.
However, the sample takes a long time to finish in my case.

Anyway, I have reported the issue to the developers so they can check it.
I will update here if I get any information.

Thanks a lot.