Nsys exits immediately when profiling OpenMP offload application

I have an application that uses OpenMP offload and is compiled with LLVM 15.
When I try to profile it with nsys, the application exits immediately and nsys prints the “Generating *.qdstrm” message. The return code is 139, indicating the app exited with a segfault.
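For readers unfamiliar with the 139 convention: shells report a process killed by a signal as 128 plus the signal number, and on Linux signal 11 is SIGSEGV. A minimal sketch of that arithmetic (the variable names are only illustrative):

```shell
# Decode a shell exit status above 128 into the signal that ended the process.
status=139
signal_number=$((status - 128))          # 139 - 128 = 11
signal_name=$(kill -l "$signal_number")  # look up the signal's name
echo "exit status $status = 128 + $signal_number (SIG$signal_name)"
```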

This happens on multiple systems and with multiple applications.

On one system, the nsys version is 2022.4.18 (from CUDA 12.0). On that system, profiling the application works if I use the LLVM 15 provided by the system. If I use an LLVM mainline build that I compiled myself, nsys doesn’t work.

On another system, I’m using nsys version 2022.4.2.1 (from CUDA 11.8). On that system, nsys fails regardless of which LLVM I use (LLVM 15 from the system, LLVM 15 I compiled, or LLVM mainline I compiled).

If LIBOMPTARGET_DEBUG is enabled, the following is output when running under nsys:

Libomptarget --> Init target library!
Libomptarget --> Call to omp_get_num_devices returning 0

For a run outside of nsys, the starting debug output looks like:

Libomptarget --> Init target library!
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so': libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA

There is no line about omp_get_num_devices in the normal output. I suspect something in the nsys profile launch path is calling omp_get_num_devices before the OpenMP target library is fully initialized.
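To pin down where the two runs part ways, the debug logs can be captured to files and compared. This is only a sketch: first_divergence is a hypothetical helper, ./my_app is a placeholder, and it assumes the Libomptarget debug lines go to stderr.

```shell
# Print the first line that differs between two libomptarget debug logs.
# "<" marks a line only in the first log, ">" a line only in the second.
first_divergence() {
  diff "$1" "$2" | grep -m1 '^[<>]'
}

# Possible usage (placeholder app name; assumes debug output is on stderr):
#   LIBOMPTARGET_DEBUG=1 ./my_app 2> native.log
#   LIBOMPTARGET_DEBUG=1 nsys profile ./my_app 2> nsys.log
#   first_divergence native.log nsys.log
```

For the two logs quoted above, the first differing line would be the "Loading RTLs..." line, which is missing from the run under nsys.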

Hi Mark, fancy meeting you here!

@skottapalli or @rknight can you help Mark?

Hey Mark, good to see you. What command line are you using to profile the app? By default, nsys traces cuda, nvtx, opengl, and osrt. Could you try launching your app with the following command line?

nsys profile -t none -s none --cpuctxsw=none <your app>

This turns off all tracing and sampling. If the app launches successfully with this command line, then the environment under nsys is OK. You can then add the tracing and sampling features back to the command line one at a time to isolate which feature is causing the segfault in your app.
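One way to do that isolation is to enable a single trace feature per run while keeping sampling and context-switch tracing off. A rough sketch, where isolation_commands is a hypothetical helper and ./my_app stands in for the real application:

```shell
# Print one nsys command per trace feature so each can be tried in isolation.
# The function only prints the commands; it does not run nsys itself.
isolation_commands() {
  app=$1
  for feature in cuda nvtx opengl; do
    echo "nsys profile -t $feature -s none --cpuctxsw=none $app"
  done
}

isolation_commands ./my_app
```

Run each printed command in turn; the one that reproduces the exit code 139 points to the feature responsible.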

You never know who will turn up in the forums :)

It’s opengl tracing that’s causing the problem. If I trace just cuda and nvtx it runs without the app crashing. Thanks!
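For anyone landing here with the same crash, the workaround amounts to listing the trace features explicitly so that opengl is left out. A sketch that prints the command, with ./my_app as a placeholder for the real application:

```shell
# Trace only CUDA and NVTX, leaving out the opengl tracing that triggered the crash.
trace_features="cuda,nvtx"
profile_cmd="nsys profile -t $trace_features ./my_app"
echo "$profile_cmd"
```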