Nsys segfault during profile: pthread_create when connected over ssh with X11 forwarding

This is on Ubuntu 20.04 LTS with NVIDIA Nsight Systems version 2021.3.1.54-ee9c30a (the same thing happens whether it is installed via the installer or the .deb package). The GPU is an RTX 3090. The profile step segfaults:

$ nsys profile ./binary
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Segmentation fault (core dumped)

A backtrace from gdb shows:

Thread 11 "nsys" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffee7fc700 (LWP 2472241)]
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00000000007ee9b1 in ?? ()
#2 0x00000000007eed86 in ?? ()
#3 0x00000000007ef39b in ?? ()
#4 0x00000000007e6a6a in ?? ()
#5 0x000000000072539b in ?? ()
#6 0x0000000000727563 in ?? ()
#7 0x00000000006ae7b6 in ?? ()
#8 0x00000000006aec7f in ?? ()
#9 0x00000000006aed55 in ?? ()
#10 0x00000000006aee0b in ?? ()
#11 0x00000000006af92a in ?? ()
#12 0x00000000006bbfa9 in ?? ()
#13 0x00000000006bf13b in ?? ()
#14 0x00000000006b18c5 in ?? ()
#15 0x00000000006c9809 in ?? ()
#16 0x00000000006ca045 in ?? ()
#17 0x00000000006ca094 in ?? ()
#18 0x00000000015602c0 in ?? ()
#19 0x00007ffff7f96609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007ffff7d3b293 in clone () at …/sysdeps/unix/sysv/linux/x86_64/clone.S:95

Unexpectedly, the segfault goes away if X11 forwarding is turned off for the ssh session (I was connected over ssh). I stumbled upon this workaround in another forum post here from about two years ago: "Nsight system profile segmentation fault on startup".

It seems odd that a command-line utility would be sensitive to X11 forwarding.
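For anyone else hitting this, here is a rough sketch of a check to run before profiling. It is only a heuristic and rests on two assumptions: that sshd sets SSH_CONNECTION for ssh logins, and that DISPLAY is set when X11 forwarding is active. The function name and the user@host placeholder are made up for illustration.

```shell
# Heuristic only: sshd normally sets SSH_CONNECTION for ssh logins, and
# DISPLAY is normally set when X11 forwarding is active.
# x11_forwarding_likely is a hypothetical helper name.
x11_forwarding_likely() {
    [ -n "${SSH_CONNECTION:-}" ] && [ -n "${DISPLAY:-}" ]
}

if x11_forwarding_likely; then
    # "ssh -x" disables X11 forwarding in OpenSSH; user@host is a placeholder.
    echo "X11 forwarding looks active; consider reconnecting with: ssh -x user@host"
fi
```

If the message prints, reconnecting without X11 forwarding and rerunning nsys profile is the workaround described above.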

@rknight could you take a look at this one?

Hi uday1,

What specific data are you trying to capture? For example, are you trying to trace CUDA, NVTX, the OS Runtime, or OpenGL? Are you interested in CPU IP/backtrace sampling? All of these data collectors are enabled by default when you use the nsys command line interface (CLI).

I suspect this might be an OpenGL issue. I suggest limiting the collection to one or two data collectors and trying again. For example, try the following command line:

nsys profile --trace=cuda --sample=cpu ./binary

Then, add each data collector you need one at a time until the issue happens again. Please let us know what you find out.
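The add-one-collector-at-a-time procedure could be scripted roughly as follows. This is a sketch, not an official tool: the function name bisect_collectors and the RUN variable are made up, and the collector names (cuda, nvtx, osrt, opengl) are assumed to match the --trace values for the collectors listed above. By default it only prints the commands; with RUN=1 it runs them and stops at the first failure.

```shell
# Collectors to enable one at a time, starting from cuda alone.
COLLECTORS="cuda nvtx osrt opengl"

# Hypothetical helper: builds a growing --trace list for the given binary.
bisect_collectors() {
    binary=$1
    trace=""
    for c in $COLLECTORS; do
        # Append the next collector, comma-separated.
        trace="${trace:+$trace,}$c"
        echo "nsys profile --trace=$trace --sample=cpu $binary"
        if [ "${RUN:-0}" = "1" ]; then
            # A crash in nsys gives a non-zero exit status.
            nsys profile --trace="$trace" --sample=cpu "$binary" || {
                echo "first failure after adding: $c"
                return 1
            }
        fi
    done
}
```

Calling bisect_collectors ./binary prints the four commands in order; setting RUN=1 first executes them, and the last collector reported is the likely culprit.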

Sorry about the delay in getting back here. I’m profiling a CUDA binary and am only interested in GPU performance statistics, for example, the kernel execution times. This is how I normally run:

$ nsys profile …
$ nsys stats -q --report gpukernsum report13.qdrep

My binary makes no OpenGL calls; it only calls the CUDA runtime to launch and run a CUDA kernel. When profiling works, this is what the output looks like:

$ nsys profile …
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Collecting data...
Processing events...
Saving temporary "/tmp/nsys-report-6554-ca1c-479d-cb38.qdstrm" file to disk...

Creating final output files…
Processing [===============================================================100%]
Saved report file to "/tmp/nsys-report-6554-ca1c-479d-cb38.qdrep"
Report file moved to "/home/uday/project-dir/report2.qdrep"

$ nsys stats -q --report gpukernsum report2.qdrep

Time(%)  Total Time (ns)  Instances  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)  Name
-------  ---------------  ---------  ------------  ------------  ------------  -----------  -----------
  100.0           50,559          1      50,559.0        50,559        50,559          0.0  main_kernel

I tried it with --trace=cuda --sample=cpu and this avoids the crash. Thanks!