Cannot single-step CUDA (GPU) code using Nsight Eclipse Edition (Jetson AGX Xavier target)

Hello,

I need help debugging GPU code using Nsight Eclipse Edition (NEE).

Here are the details of my setup:

Host:

  • Ubuntu 16.04
  • JetPack 4.3; CUDA 10.0
  • NEE v10.0

Target:

  • Jetson AGX Xavier

I am cross-building for the ARM platform from my host PC.

I am able to do the following:

  • Cross-build applications (a simple hello world using CPU + GPU, as well as samples provided in the samples directory) for the target from the host, using NEE
  • Run the cross-built application on the target from the host, using NEE; works as expected
  • Single-step CPU code in NEE, including setting breakpoints
  • Debug CPU and GPU code from the command line (cuda-gdb, without NEE)
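
A command-line session of this kind can be sketched as follows. This is only a sketch under assumptions: the binary name `hello1` is a guess based on the `hello1.cu` source file in the log below, the file name `hello1.gdb` is made up for illustration, and the build is assumed to use `-g -G` so that device code carries debug information. The script only writes a cuda-gdb command file and prints how to use it.

```shell
#!/bin/sh
# Sketch: drive cuda-gdb from a command file instead of NEE.
# Assumptions: the binary is named hello1 and was built with -g -G.
cmds=hello1.gdb   # hypothetical file name

cat > "$cmds" <<'EOF'
# Stop when the kernel launched from main() starts executing on the GPU.
break print_from_gpu
run
# Show the kernels currently resident on the device.
info cuda kernels
# Single-step one source line inside the kernel.
next
EOF

echo "On the Jetson, run: cuda-gdb -x $cmds ./hello1"
```

If this works natively on the Jetson while the same steps hang under NEE, that would point at the remote debugging path rather than at the application itself.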

I am NOT able to do the following:

  • Single-step GPU code using NEE.
    • I can put breakpoints in the CUDA code, but when I resume debugging from the CPU code, the screen freezes and control is lost. I cannot even terminate the debugging session cleanly.

I understand that “Debugging a CUDA GPU involves pausing that GPU. When the graphics desktop manager is running on the same GPU, then debugging that GPU freezes the GUI and makes the desktop unusable.”

So I have disabled the GUI completely on the target and am logging into the Jetson remotely using the console.

I have tried two ways to put the Jetson in console-only mode:

$ sudo systemctl set-default multi-user.target

and by changing the run level.

Neither made any difference to the outcome.
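
For reference, the systemd route can be sketched as below. This is a minimal sketch, assuming a systemd-based image on the Jetson. The `run` helper and the `APPLY` variable are hypothetical names introduced here so that the script only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: put a systemd-based target into console-only mode.
# `run` and APPLY are illustrative names, not part of any NVIDIA tool.
# By default the commands are only printed; set APPLY=1 to execute them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Make the text console the default boot target (persists across reboots).
run sudo systemctl set-default multi-user.target
# Switch to it immediately, stopping the graphical session without a reboot.
run sudo systemctl isolate multi-user.target
# Report the default target, to confirm the change took effect.
run systemctl get-default
```

`set-default` alone only affects the next boot, so the `isolate` step (or a reboot) is what actually stops the desktop in the running session.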

I also made the following changes, with no difference to the outcome:

  • Set the DISPLAY environment variable to “:0” in NEE
  • Disabled timeouts in /sys/kernel/debug/gpu.0/timeouts_enabled
  • Enabled the “CUDA software preemption debugging” option in NEE
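
A quick way to confirm the first two settings on the Jetson itself might look like this. It is a sketch only: the function name `report_settings` is made up for illustration, the debugfs path is the one quoted above, and reading it is assumed to require root.

```shell
#!/bin/sh
# Sketch: report the DISPLAY and GPU timeout settings on the target.
# report_settings is an illustrative name; the path comes from the post.
TIMEOUTS=/sys/kernel/debug/gpu.0/timeouts_enabled

report_settings() {
    # debugfs is normally readable by root only, so this may need sudo.
    if [ -r "$TIMEOUTS" ]; then
        echo "timeouts_enabled: $(cat "$TIMEOUTS")"
    else
        echo "timeouts_enabled: not readable (try as root)"
    fi
    echo "DISPLAY: ${DISPLAY:-<unset>}"
}

# To disable the timeouts (assumption: the entry accepts N), as root:
#   echo N > /sys/kernel/debug/gpu.0/timeouts_enabled
report_settings
```

Note that a debugfs setting like this is not expected to survive a reboot, so it is worth re-checking after restarting the target.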

The console log from NEE is below, and a screenshot of the debugging perspective is attached.

Let me know what I am missing.

Thanks,
Mithun

==========================================================================================

<< Console output in NEE when the debugger first starts up >>
#############################################################################
Coalescing of the CUDA commands output is off.
warning: "remote:" is deprecated, use "target:" instead.
warning: sysroot set to "target://".
Reading /lib/ld-linux-aarch64.so.1 from remote target…
warning: File transfers from remote targets can be slow. Use “set sysroot” to access files locally instead.
Reading /lib/ld-linux-aarch64.so.1 from remote target…
Reading /lib/ld-2.27.so from remote target…
Reading /lib/.debug/ld-2.27.so from remote target…
0x0000007fb7fd31c0 in ?? () from target:/lib/ld-linux-aarch64.so.1
$1 = 0xff
The target endianness is set automatically (currently little endian)
Reading /lib/aarch64-linux-gnu/librt.so.1 from remote target…
Reading /lib/aarch64-linux-gnu/libpthread.so.0 from remote target…
Reading /lib/aarch64-linux-gnu/libdl.so.2 from remote target…
Reading /usr/lib/aarch64-linux-gnu/libstdc++.so.6 from remote target…
Reading /lib/aarch64-linux-gnu/libgcc_s.so.1 from remote target…
Reading /lib/aarch64-linux-gnu/libc.so.6 from remote target…
Reading /lib/aarch64-linux-gnu/libm.so.6 from remote target…
Reading /lib/aarch64-linux-gnu/librt-2.27.so from remote target…
Reading /lib/aarch64-linux-gnu/.debug/librt-2.27.so from remote target…
Reading /lib/aarch64-linux-gnu/47f37309461cc15fb1915bc198d718017a1f87.debug from remote target…
Reading /lib/aarch64-linux-gnu/.debug/47f37309461cc15fb1915bc198d718017a1f87.debug from remote target…
Reading /lib/aarch64-linux-gnu/libdl-2.27.so from remote target…
Reading /lib/aarch64-linux-gnu/.debug/libdl-2.27.so from remote target…
Reading /usr/lib/aarch64-linux-gnu/d7646e96801c7eed3642d3c10e301e0f3ea553.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/.debug/d7646e96801c7eed3642d3c10e301e0f3ea553.debug from remote target…
Reading /lib/aarch64-linux-gnu/4bfa7077953acb0e38a1039923ecb5fe9f6a62.debug from remote target…
Reading /lib/aarch64-linux-gnu/.debug/4bfa7077953acb0e38a1039923ecb5fe9f6a62.debug from remote target…
Reading /lib/aarch64-linux-gnu/libc-2.27.so from remote target…
Reading /lib/aarch64-linux-gnu/.debug/libc-2.27.so from remote target…
Reading /lib/aarch64-linux-gnu/libm-2.27.so from remote target…
Reading /lib/aarch64-linux-gnu/.debug/libm-2.27.so from remote target…

Temporary breakpoint 1, main () at …/src/hello1.cu:8
8 int main(void) {

Breakpoint 2, main () at …/src/hello1.cu:10
10 print_from_gpu<<<1,5>>>();

#############################################################################
<< When I press F8 to resume to the breakpoint in the CUDA code, the display freezes >>

Reading /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.3.1 from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvos.so from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libcuda.so.1.1.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm_gpu.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvrm_graphics.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvidia-fatbinaryloader.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/libnvos.so.debug from remote target…
Reading /usr/lib/aarch64-linux-gnu/tegra/.debug/libnvos.so.debug from remote target…
#############################################################################

==========================================================================================

Hi. I have run into the same problem. I can use NEE to build and run remotely, but when I try to debug on the remote target (TX2), single-stepping gets stuck inside a CUDA API call, such as the CUDA malloc function, and never advances past it. Did you solve the problem?