Exception while debugging. How to upgrade cuda-gdb?

Greetings,

I have started working on a new project based on the NVIDIA Jetson AGX Xavier. I followed the instructions to set everything up, and I now have a running device with the software installed from sdkmanager.

I can compile and run CUDA programs, but debugging is a problem. The debugger provided with cuda-10.2 crashes when I step over a CUDA function ((cuda-gdb) next), with a message mentioning _dl_catch_exception(). I found a couple of relevant topics, for example this. However, I see the same problem even when debugging locally on the device itself. I followed the instructions there and also tried running the debugger with sudo, but the problem persists. It is reproducible with the CUDA samples.

The linked topic is from 2019, and the NVIDIA employee there mentions that this is a known bug in gdb-7.12 that should be fixed in gdb-8.2. But it is now 2021, and sdkmanager still installs the same broken version.

What is the fix? Should I upgrade the debugger manually? How can I do it?

Hi,

Could you first share the detailed steps to reproduce this with a CUDA sample, along with the complete error output?

Thanks.

Hi @AastaLLL,

  1. Compile cuda-10.2/samples/0_Simple/vectorAdd with make dbg=1.
  2. Start the CUDA debugger with cuda-gdb vectorAdd.
  3. Set a breakpoint on the line with the CUDA call: break vectorAdd.cu:82 (this adds a breakpoint on the line err = cudaMalloc((void **)&d_A, size)).
  4. Run the program with run.
  5. Step over the function with next.
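
For convenience, steps 3 to 5 can be put into a gdb command file so the session is easy to repeat. This is only a sketch: the file name repro.gdb is an arbitrary choice, and it assumes the script is run from the vectorAdd sample directory after the make dbg=1 build from step 1. It relies on cuda-gdb accepting the standard gdb -x option for command files.

```shell
#!/bin/sh
# Write the debugger commands from steps 3-5 into a command file.
# The name repro.gdb is an arbitrary choice for this sketch.
cat > repro.gdb <<'EOF'
break vectorAdd.cu:82
run
next
EOF

# Start the debugger only if it and the debug build are both present.
if command -v cuda-gdb >/dev/null 2>&1 && [ -x ./vectorAdd ]; then
    cuda-gdb -x repro.gdb ./vectorAdd
else
    echo "cuda-gdb or ./vectorAdd not found; build the sample with 'make dbg=1' first"
fi
```

After the command file has been executed, the session stays at the (cuda-gdb) prompt, so further next commands can be typed by hand.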

After stepping over I have the following output:

Breakpoint 1, main () at vectorAdd.cu:82
82	    err = cudaMalloc((void **)&d_A, size);
(cuda-gdb) next
0x0000007fb7d4e684 in _dl_catch_exception ()
   from /lib/aarch64-linux-gnu/libc.so.6
(cuda-gdb) next
Single stepping until exit from function _dl_catch_exception,
which has no line number information.
0x0000007fb7fe2418 in _dl_find_dso_for_object ()
   from /lib/ld-linux-aarch64.so.1
(cuda-gdb) next
Single stepping until exit from function _dl_find_dso_for_object,
which has no line number information.
cuda-gdb/7.12/gdb/infrun.c:2795: internal-error: resume: Assertion `pc_in_thread_step_range (pc, tp)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

If I run the program without the debugger, it works fine. If I use continue instead of next, it also works fine.
I tried the same steps on my laptop, where I installed the same CUDA 10.2, and everything works fine.