cuda-gdb doesn't stop the program

95A31 · January 13, 2020, 7:47pm

Hello CUDA experts,

Recently I have been running into odd behavior from the CUDA debugging tools.

If I run my code with cuda-gdb without memcheck it terminates successfully, but if I enable memcheck I get the following errors:

Error: Failed to get warp state (dev=0, sm=0, wp=2), error=CUDBG_ERROR_UNKNOWN_FUNCTION(0x3).

(cuda-gdb) backtrace

Selected thread is running.

So I don’t know where to start debugging. If I use cuda-memcheck, the error is the following:

[Error] an illegal memory access was encountered

========= Program hit cudaErrorIllegalAddress (error 77) due to "an illegal memory access was encountered" on CUDA API call to cudaDeviceSynchronize.

In this case, too, I don’t know where to start debugging.

Do you have any suggestions for dealing with this situation?

Robert_Crovella · January 24, 2020, 5:43pm

It’s entirely plausible that an error does not become evident until the code is run with cuda-memcheck or with the memory checker built into cuda-gdb. This is the reason cuda-memcheck was created, and in some ways it is similar to tools like valgrind, which find “hidden” or “latent” errors.

An example of such an error would be reading one element beyond the end of allocated memory. Ordinary host code or device code won’t throw an error in such a situation, even though it is illegal behavior. However, host code run under valgrind, or CUDA code run under cuda-memcheck, will identify such a situation. I’m not saying this is exactly your situation, just giving an example of the plausibility of your situation.
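To make that kind of latent error concrete, here is a minimal sketch. The kernel name, file name, and sizes are made up for illustration and have nothing to do with the original poster's code. The last thread reads one element past the end of `in`. A plain run will often report no error, whereas cuda-memcheck would be expected to flag an invalid global read.

```cuda
// latent_oob.cu (hypothetical file name)
// Build: nvcc -o latent_oob latent_oob.cu
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds its element to its right-hand neighbour.
// Bug: the thread with i == n - 1 reads in[n], one element past the allocation.
__global__ void sum_neighbours(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] + in[i + 1];   // out of bounds when i == n - 1
}

int main()
{
    const int n = 256;
    int *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));
    cudaMemset(in, 0, n * sizeof(int));

    sum_neighbours<<<1, n>>>(in, out, n);

    // Report whatever status the runtime gives us after the kernel finishes.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Running the same binary as `cuda-memcheck ./latent_oob` is the experiment that would be expected to expose the stray read.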

With respect to how to debug such an issue, I would start by running your code in an ordinary fashion (not in cuda-gdb) but with cuda-memcheck.

Follow the instructions here to localize the illegal memory access to a specific line of kernel source code:

https://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218
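The linked answer is not reproduced here. As a rough sketch of the general idea, assuming it comes down to building with nvcc's `-lineinfo` option so that cuda-memcheck can attribute the invalid access to a source line, a small test case might look like this. The file name, kernel, and sizes are hypothetical.

```cuda
// overrun.cu (hypothetical file name)
// Build with line information:  nvcc -lineinfo -o overrun overrun.cu
// Run under the checker:        cuda-memcheck ./overrun
#include <cstdio>
#include <cuda_runtime.h>

// Bug: no bounds check, so threads with i >= n write past the end of data.
__global__ void fill(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = i;   // the checker should attribute the invalid write to this line
}

int main()
{
    const int n = 1000;                               // not a multiple of the block size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // 4 blocks, i.e. 1024 threads

    int *data = nullptr;
    cudaMalloc(&data, n * sizeof(int));
    fill<<<blocks, threads>>>(data, n);

    // Kernel launches are asynchronous, so an execution error surfaces at the
    // next synchronizing call rather than at the launch itself.
    cudaError_t err = cudaDeviceSynchronize();
    printf("cudaDeviceSynchronize: %s\n", cudaGetErrorString(err));

    cudaFree(data);
    return 0;
}
```

The asynchronous launch is also why an error like the one in the question is reported against `cudaDeviceSynchronize` rather than against the kernel that caused it.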

Such errors often come about due to erroneous indexing. Once you’ve identified the specific line of source code that is causing the error, you may immediately spot the issue or may be able to use printf statements in kernel code to identify what is happening.
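As an illustration of the printf approach, the sketch below guards an indexed read and reports the offending thread instead of performing the access. The kernel, the helper `run_gather_demo`, and the input data are all hypothetical. The counter is only there so the host can tell how many threads hit the guard.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Gathers src[idx[i]] into dst[i]. Out-of-range indices are reported with
// printf and counted instead of being dereferenced.
__global__ void gather(const float *src, const int *idx, float *dst,
                       int n_src, int n_dst, int *bad_count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_dst) return;

    int j = idx[i];
    if (j < 0 || j >= n_src) {
        printf("bad index: block %d thread %d i=%d j=%d n_src=%d\n",
               blockIdx.x, threadIdx.x, i, j, n_src);
        atomicAdd(bad_count, 1);
        return;
    }
    dst[i] = src[j];
}

// Runs the kernel on a small made-up input containing one bad index and
// returns how many threads reported a problem.
int run_gather_demo()
{
    const int n = 8;
    const int h_idx[n] = {0, 1, 2, 3, 4, 5, 6, 42};   // 42 is out of range

    float *src = nullptr, *dst = nullptr;
    int *idx = nullptr, *bad = nullptr;
    cudaMalloc(&src, n * sizeof(float));
    cudaMalloc(&dst, n * sizeof(float));
    cudaMalloc(&idx, n * sizeof(int));
    cudaMalloc(&bad, sizeof(int));
    cudaMemset(src, 0, n * sizeof(float));
    cudaMemset(bad, 0, sizeof(int));
    cudaMemcpy(idx, h_idx, sizeof(h_idx), cudaMemcpyHostToDevice);

    gather<<<1, n>>>(src, idx, dst, n, n, bad);
    cudaDeviceSynchronize();   // also flushes device-side printf output

    int h_bad = 0;
    cudaMemcpy(&h_bad, bad, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(src);
    cudaFree(dst);
    cudaFree(idx);
    cudaFree(bad);
    return h_bad;
}

int main()
{
    printf("threads with a bad index: %d\n", run_gather_demo());
    return 0;
}
```

In real code the guard would go around the line that cuda-memcheck identified, and it can be removed again once the faulty index computation has been found.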

Barring that, you can use that line of source code to focus your effort with cuda-gdb. Set a breakpoint immediately before that line of source code, and inspect variables, indices, etc. If need be, work backward, in typical debugging fashion (at this point the debugging concepts are no different than host code debugging: set breakpoints, inspect variables, single-step, etc.)

If you can ascertain the actual condition (e.g. index out of range) that is causing the illegal memory access from the cuda-memcheck experiment, you could use that information to further focus your effort in cuda-gdb by setting a conditional breakpoint, based on the index value, for example. This will cause the breakpoint to occur on the thread/warp that was actually about to make the illegal access. Be advised that using conditional breakpoints can have a large effect on the speed of debugging (speed of code execution in debug mode under cuda-gdb).
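A conditional breakpoint of that kind could look roughly like the session sketched in the comments below. The file name, kernel, and variable names are hypothetical, and `<line>` stands for whatever source line cuda-memcheck pointed at. The build uses `-g -G` so that device code carries debug information.

```cuda
// cond_bp.cu (hypothetical file name)
// Build with device debug info:  nvcc -g -G -o cond_bp cond_bp.cu
//
// Possible cuda-gdb session (replace <line> with the line of the read below):
//   $ cuda-gdb ./cond_bp
//   (cuda-gdb) break cond_bp.cu:<line> if j >= n
//   (cuda-gdb) run
//   (cuda-gdb) info cuda threads
//   (cuda-gdb) print i
//   (cuda-gdb) print j
#include <cstdio>
#include <cuda_runtime.h>

// Intended to write the input in reverse order.
__global__ void reverse(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int j = n - i;     // bug: should be n - 1 - i, so j == n when i == 0
    out[i] = in[j];    // breakpoint target: only the thread with j >= n stops here
}

int main()
{
    const int n = 1000;
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    reverse<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Because the condition has to be evaluated for every thread that reaches the line, this is where the slowdown mentioned above comes from.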
