I have an application that uses OpenMP offload and is compiled with LLVM 15.
When I try to profile it with nsys, the application exits immediately and nsys prints the “Generating *.qdstrm” message. The return code is 139 (128 + 11, where 11 is SIGSEGV), indicating the application exited with a segfault.
This happens on multiple systems and with multiple applications.
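For reference, here is a minimal sketch of how the 139 return code maps to a segfault. It assumes the usual shell convention that a status above 128 means "terminated by signal (status - 128)"; the function name `describe_exit_status` is only illustrative, and a program could in principle also return 139 on its own.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (0-255).

    Assumption: statuses above 128 follow the shell convention of
    128 + signal number.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```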
On one system, the nsys version is 2022.4.18 (from CUDA 12.0). On that system, profiling the application works if I use the LLVM 15 provided by the system, but it fails if I use an LLVM mainline build that I compiled myself.
On another system, I’m using nsys version 2022.4.2.1 (from CUDA 11.8). On that system, nsys fails regardless of which LLVM I use (the system LLVM 15, an LLVM 15 I compiled, or an LLVM mainline I compiled).
With LIBOMPTARGET_DEBUG enabled, a run under nsys prints the following:
Libomptarget --> Init target library!
Libomptarget --> Call to omp_get_num_devices returning 0
For a run outside of nsys, the debug output starts with:
Libomptarget --> Init target library!
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so': libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA
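There is no line about omp_get_num_devices in the normal output. Under nsys, the call shows up right after “Init target library!”, before any “Loading RTLs...” line, and it returns 0. I suspect something in the nsys profile launch path is calling omp_get_num_devices before the OpenMP target library is fully initialized.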
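To make the comparison between the two logs concrete, here is a rough sketch that checks whether an omp_get_num_devices call appears before the RTLs start loading. It assumes the LIBOMPTARGET_DEBUG output has already been captured into a string; the helper names (`first_index`, `device_query_before_rtl_load`) are hypothetical, and the sample strings are just the opening lines of the two logs quoted above.

```python
from typing import List, Optional

# Opening lines of the two debug logs quoted in this post.
UNDER_NSYS = """\
Libomptarget --> Init target library!
Libomptarget --> Call to omp_get_num_devices returning 0
"""

OUTSIDE_NSYS = """\
Libomptarget --> Init target library!
Libomptarget --> Loading RTLs...
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
"""


def first_index(lines: List[str], needle: str) -> Optional[int]:
    """Return the index of the first line containing needle, or None."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None


def device_query_before_rtl_load(debug_output: str) -> bool:
    """True if omp_get_num_devices is logged before 'Loading RTLs' (or without it)."""
    lines = debug_output.splitlines()
    query = first_index(lines, "omp_get_num_devices")
    load = first_index(lines, "Loading RTLs")
    if query is None:
        return False  # No device query logged at all.
    return load is None or query < load


if __name__ == "__main__":
    print("under nsys:", device_query_before_rtl_load(UNDER_NSYS))
    print("outside nsys:", device_query_before_rtl_load(OUTSIDE_NSYS))
```

This only inspects the order of the log lines; it does not show which component issued the early call.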