CUDA API calls throw exceptions in remote debugger

Any calls to the CUDA API result in the software jumping to _dl_catch_exception() at 0x7fb7d51b5c during debugging. I am running the remote debugger as root on the TX2 development kit.

However, if I launch the software by pressing “run” in Nsight it works fine. Furthermore, if I scp the executable onto the TX2 and launch it outside of the debugger via the terminal, it works as expected.

This occurs in both the deviceQuery sample code and the code below, as well as in various other simple examples using cudaMalloc and cudaFree that I have tested.

I am using Jetpack 4.2 and I think I am having the same problem as described here: https://devtalk.nvidia.com/default/topic/1050017/nsight-eclipse-edition/nsight-opengl-debugging/

Any help is much appreciated.

#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char** argv) {

    int numDevices = 0;
    // Stepping over this call in the remote debugger jumps to _dl_catch_exception()
    cudaError_t err = cudaGetDeviceCount(&numDevices);

    printf("Device count %d (%s)\n", numDevices, cudaGetErrorString(err));

    return 0;
}

Below are the GDB traces from the deviceQuery sample, starting from line 61 (the first CUDA API call) up to the exception occurring:

012,296 131^done,threads=[{id="1",target-id="Thread 15637.15637",name="dave",frame={level="0",addr="0x000000555555e90c",func="main",args=[{name="argc",value="1"},{name="argv",value="0x7ffffff508"}],file="../src/deviceQuery.cpp",fullname="/home/robert/cuda-workspace/dave/src/deviceQuery.cpp",line="61"},state="stopped",core="5"}]
012,296 (gdb) 
020,573 132-exec-next 1
020,574 132^running
020,574 *running,thread-id="all"
020,574 (gdb) 
020,612 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 from remote target...\n"
020,621 =library-loaded,id="/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1",target-name="/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1",host-name="target:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1",symbols-loaded="0",thread-group="i1"
020,621 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so from remote target...\n"
020,629 =library-loaded,id="/usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so",target-name="/usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so",host-name="target:/usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so",symbols-loaded="0",thread-group="i1"
020,629 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so from remote target...\n"
020,636 =library-loaded,id="/usr/lib/aarch64-linux-gnu/tegra/libnvrm.so",target-name="/usr/lib/aarch64-linux-gnu/tegra/libnvrm.so",host-name="target:/usr/lib/aarch64-linux-gnu/tegra/libnvrm.so",symbols-loaded="0",thread-group="i1"
020,636 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so from remote target...\n"
020,644 =library-loaded,id="/usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so",target-name="/usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so",host-name="target:/usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so",symbols-loaded="0",thread-group="i1"
020,644 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.1.0 from remote target...\n"
020,653 =library-loaded,id="/usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.1.0",target-name="/usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.1.0",host-name="target:/usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.1.0",symbols-loaded="0",thread-group="i1"
020,653 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvos.so from remote target...\n"
020,663 =library-loaded,id="/usr/lib/aarch64-linux-gnu/tegra/libnvos.so",target-name="/usr/lib/aarch64-linux-gnu/tegra/libnvos.so",host-name="target:/usr/lib/aarch64-linux-gnu/tegra/libnvos.so",symbols-loaded="0",thread-group="i1"
020,681 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1.debug from remote target...\n"
020,681 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libcuda.so.1.1.debug from remote target...\n"
020,693 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so.debug from remote target...\n"
020,694 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm_gpu.so.debug from remote target...\n"
020,704 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so.debug from remote target...\n"
020,704 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm.so.debug from remote target...\n"
020,712 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so.debug from remote target...\n"
020,712 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm_graphics.so.debug from remote target...\n"
020,722 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.debug from remote target...\n"
020,722 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvidia-fatbinaryloader.so.debug from remote target...\n"
020,731 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/libnvos.so.debug from remote target...\n"
020,731 ~"Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvos.so.debug from remote target...\n"
020,942 *stopped,reason="end-stepping-range",frame={addr="0x0000007fb7d51b5c",func="_dl_catch_exception",args=[],from="target:/lib/aarch64-linux-gnu/libc.so.6"},thread-id="1",stopped-threads="all",core="4"

Hi,

You will need root authority to run the profiler on the TX2.
Have you logged in as root?
https://devtalk.nvidia.com/default/topic/1052253/jetson-agx-xavier/no-timeline-for-profiler-xavier-nvvp-and-nsight-compute-is-not-working-in-jetson-xavier/post/5347671/#5347671

Thanks.

I have configured the /etc/ssh/sshd_config file as described in the thread you linked, set the root password using sudo passwd root, and I am able to log in to the TX2 as root via SSH when launching the debugger.

The only difference between what you described in that thread and what I did to enable root access via SSH is that I restarted the SSH service instead of rebooting the device.

I powered down the device before stopping work yesterday, so I assume this is equivalent to the reboot described in that thread. Once I have powered up the TX2 today I will update this post with any news.

I did not have an opportunity to test this yesterday, but the same error is still occurring after rebooting the device today.

The thread I linked in my original post mentions that this problem does not occur if the debugger is run from the command line. Are there any instructions available that explain how to do this? I am not familiar with using a debugger from the command line.
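For anyone else wondering the same thing: a typical remote command-line session with cuda-gdbserver on the target and cuda-gdb on the host might look like the sketch below. This is only a sketch; the port (2345), the IP address (192.168.55.1, the usual Jetson USB address), and the binary path are placeholder assumptions, not values from this setup.

```
# On the TX2 (target): start the debug server on a free port,
# launching the program stopped and waiting for a host connection
cuda-gdbserver :2345 ./deviceQuery

# On the host: load local symbols, then attach to the target
cuda-gdb ./deviceQuery
(cuda-gdb) target remote 192.168.55.1:2345
(cuda-gdb) break main
(cuda-gdb) continue
```

The host-side binary must be the same build that sits on the target, otherwise breakpoints and source lines will not line up.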

Hi,

Do you have any more error logs for this issue, or just the _dl_catch_exception?

There is one more thing to check: do you use the same CUDA toolkit on the TX2 and the host?
We assume you installed the host CUDA toolkit with the same JetPack installer. Is that correct?

Thanks.

Both the host and the TX2 have CUDA 10 installed via JetPack 4.2 using the SDK Manager. I have configured the cross-compilation process as described here: https://devblogs.nvidia.com/cuda-jetson-nvidia-nsight-eclipse-edition/. As I mentioned in the original post, if I copy the executable onto the TX2 and execute it via the terminal, or run it by clicking the run button in Nsight, it works as expected. It seems unlikely that this is a toolchain error, as the problem appears to be restricted to the debugger.

The cuda-gdbserver traces are listed in the original post. The full console output and a screenshot of the debug tab, from stepping from the beginning of the main function to the first cudaGetDeviceCount call in the deviceQuery sample, are below:

Coalescing of the CUDA commands output is off.
warning: "remote:" is deprecated, use "target:" instead.
warning: sysroot set to "target://".
Reading /lib/ld-linux-aarch64.so.1 from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /lib/ld-linux-aarch64.so.1 from remote target...
Reading /lib/ld-2.27.so from remote target...
Reading /lib/.debug/ld-2.27.so from remote target...
0x0000007fb7fd31c0 in ?? () from target:/lib/ld-linux-aarch64.so.1
$1 = 0xff
The target endianness is set automatically (currently little endian)
Reading /lib/aarch64-linux-gnu/librt.so.1 from remote target...
Reading /lib/aarch64-linux-gnu/libpthread.so.0 from remote target...
Reading /lib/aarch64-linux-gnu/libdl.so.2 from remote target...
Reading /usr/lib/aarch64-linux-gnu/libstdc++.so.6 from remote target...
Reading /lib/aarch64-linux-gnu/libgcc_s.so.1 from remote target...
Reading /lib/aarch64-linux-gnu/libc.so.6 from remote target...
Reading /lib/aarch64-linux-gnu/libm.so.6 from remote target...
Reading /lib/aarch64-linux-gnu/librt-2.27.so from remote target...
Reading /lib/aarch64-linux-gnu/.debug/librt-2.27.so from remote target...
Reading /lib/aarch64-linux-gnu/47f37309461cc15fb1915bc198d718017a1f87.debug from remote target...
Reading /lib/aarch64-linux-gnu/.debug/47f37309461cc15fb1915bc198d718017a1f87.debug from remote target...
Reading /lib/aarch64-linux-gnu/libdl-2.27.so from remote target...
Reading /lib/aarch64-linux-gnu/.debug/libdl-2.27.so from remote target...
Reading /usr/lib/aarch64-linux-gnu/a6cec032b9969b1d556b4ebee1400fabda2fdc.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/.debug/a6cec032b9969b1d556b4ebee1400fabda2fdc.debug from remote target...
Reading /lib/aarch64-linux-gnu/0a79672e8a81d551322b2912578e0ea9cef6e9.debug from remote target...
Reading /lib/aarch64-linux-gnu/.debug/0a79672e8a81d551322b2912578e0ea9cef6e9.debug from remote target...
Reading /lib/aarch64-linux-gnu/libc-2.27.so from remote target...
Reading /lib/aarch64-linux-gnu/.debug/libc-2.27.so from remote target...
Reading /lib/aarch64-linux-gnu/libm-2.27.so from remote target...
Reading /lib/aarch64-linux-gnu/.debug/libm-2.27.so from remote target...

Temporary breakpoint 1, main (argc=1, argv=0x7ffffff508) at ../src/deviceQuery.cpp:52
52	int main(int argc, char **argv) {
Reading /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.1.0 from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvos.so from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libcuda.so.1.1.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm_gpu.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm_graphics.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvidia-fatbinaryloader.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvos.so.debug from remote target...
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvos.so.debug from remote target...

Hi,

Thanks for sharing more data with us.

We have passed this issue to our internal team.
We will share more information with you once we get their feedback.

I have the same problem. Any solution?

Hi,

We are checking this issue, but it still needs some time.
We will update this thread once we get feedback from the internal team.

Thanks.

Hi +1

Any CUDA-related calls (as well as OpenCV calls in my project), e.g.

cudaMalloc(&fCUDAValues, THREAD_COUNT*sizeof(double));

cause

_dl_catch_exception() at 0x7fb7d4fb5c in the debug session. Executing the application (the same debug build) directly on the device works just fine.

SSH root access is used for remote debugging.

Hi,

This is a known issue.

The exception is raised by the CPU GDB rather than CUDA-GDB.
Since we are going to upgrade GDB from 7.12 to 8.2, this issue will be checked directly against GDB 8.2.

Thanks.

Hi +1

I have done all the required work, including the cross-compile setup; before that, I used SDK Manager to flash the TX2 and install CUDA and Nsight.
I also followed the instructions from: https://devtalk.nvidia.com/default/topic/1052253/jetson-agx-xavier/no-timeline-for-profiler-xavier-nvvp-and-nsight-compute-is-not-working-in-jetson-xavier/post/5347671/#5347671
to edit the sshd_config file for root authority.
It turns out that debugging hangs on any CUDA code (CPU code is OK), and the profiler in Nsight Eclipse shows nothing.

Is there any solution or update to follow?

Thanks.

Hi,

As mentioned in comment #11:
This issue is caused by cpu-gdb rather than cuda-gdb.
It will be checked in GDB 8.2, which is targeted for the next CUDA version.

Do you run cuda-gdb from within the Nsight environment?
If so, you can just press the “Resume” button to continue debugging.

Thanks.

Hi +1,

I installed fresh software on a Jetson AGX Xavier with SDK Manager, and the version of cuda-gdb is:

NVIDIA (R) CUDA Debugger
10.2 release
Portions Copyright (C) 2007-2020 NVIDIA Corporation
GNU gdb (GDB) 7.12

I am trying to debug over an SSH connection and get the same _dl_catch_exception().

I do not understand what you mean by saying it will be checked in GDB 8.2, targeting the next CUDA version.