I have been trying to debug some illegal access errors in my application lately, and it seems I am spending a lot of energy on something that should be simpler.
Is there any easy way to get a backtrace when the error happens? I have been trying cuda-gdb, but it simply does not break when the error occurs. Regular gdb tricks don’t seem to work either. Am I missing something?
Is it possible to get a backtrace during device sync instead?
cuda-gdb usually breaks on an error for me, but I don’t generally try to debug on a display GPU. Make sure you are compiling the code with the debug switches -g -G.
Yes, I have been using cuda-gdb. I have also tried it with memcheck turned on, but it still does not break on illegal accesses most of the time. It is not a display GPU; it is a V100 used only for processing. I will try the coredumps and see if that helps.
What happens is that the program ends normally, with an error being returned by the sync function, but cuda-gdb does not break on it. Is that a known bug?
cuda-memcheck (run by itself on the executable) can isolate the exact line of code where the error is occurring. Have you been able to accomplish that? That usually helps most people quite a bit.
What is the actual error being returned by the sync function?
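If it is not obvious which error the sync call returns, a minimal sketch like the following may help to print it. The macro name CUDA_CHECK and the kernel bad_write are hypothetical, made up for illustration only; the kernel deliberately writes through a null pointer to provoke a device-side fault, so substitute your own kernel and launch configuration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from this thread): print the file, line, failing
// call, and the CUDA error name/description, then exit.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s (%s)\n",                \
                    __FILE__, __LINE__, #call,                            \
                    cudaGetErrorName(err_), cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Deliberately broken kernel for illustration: every thread writes through
// the pointer it is given, and main() passes a null pointer.
__global__ void bad_write(int *p) {
    p[threadIdx.x] = 42;
}

int main() {
    bad_write<<<1, 32>>>(nullptr);

    // Catches errors detected at launch time (e.g. a bad configuration).
    CUDA_CHECK(cudaGetLastError());

    // Kernels run asynchronously, so faults raised during execution are
    // normally reported by the next synchronizing call.
    CUDA_CHECK(cudaDeviceSynchronize());

    puts("kernel finished without a reported error");
    return 0;
}
```

Checking right after each launch and each sync at least narrows the failure down to one kernel. Building with the -g -G switches mentioned above and running the resulting executable under cuda-memcheck should then be able to point at the offending source line.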