Nsys does not show CUDA kernels

Hi,
I am running a code that works fine and gives correct numerical results, but when I run it under nsys I have several issues.

nsys profile --trace=cuda ./testing_dgetrf_gpu -n 10240 --matrix rand --version 3 --dev 0

  1. It ends with error 127.

Collecting data...
symbol lookup error: /Soft/cuda/11.1/nsight-systems-2020.3.4/target-linux-x64/libToolsInjectionProxy64.so: undefined symbol: __libc_dlsym, version GLIBC_PRIVATE

The target application returned non-zero exit code 127
  1. In nsys-ui I do not see any kernels (I selected CUDA kernel tracing).

  1. Is there a CUDA 12? I thought the latest version was 11.8, or is this something else?
CUDA driver version on the target (12.0) is not supported by this build of Nsight Systems.
CUDA trace will be collected using libraries for older driver version but some features might be missing or work incorrectly.
Check for updates to see if there is a newer version available
libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
libGL error: failed to load driver: swrast
OpenGL version is too low (0). Falling back to Mesa software rendering.
OpenGL version: "3.1 Mesa 18.1.9 (git-f57f37f3ba)"

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:3D:00.0 Off |                  N/A |
| 30%   31C    P8     6W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |

I think you have multiple versions of Nsys here, can you check and see?

In particular, that lookup error is checking version 2020.3.4, which shipped in a very old version of the CUDA toolkit, but another error is talking about CTK 12.

Let me know what versions you have.
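
One way to check, as a rough sketch: walk every directory on `PATH` and print the version each copy of `nsys` and `ncu` reports. The function name `list_tool_versions` is made up for this example; `--version` is the only tool option it relies on.

```shell
#!/bin/sh
# Print each executable named $1 found on PATH, with the first line
# of its --version output. list_tool_versions is a hypothetical helper.
list_tool_versions() {
    tool=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        IFS=$old_ifs
        if [ -x "$dir/$tool" ]; then
            ver=$("$dir/$tool" --version 2>&1 | head -n 1)
            printf '%s: %s\n' "$dir/$tool" "$ver"
        fi
    done
    IFS=$old_ifs
}

list_tool_versions nsys
list_tool_versions ncu
```

If more than one line is printed per tool, the first one listed is the copy the shell runs when you type the bare command name.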

Thanks.

Version: 2020.3.4.32-52657a0 Linux.
Qt version: 5.14.1.
Google Protocol Buffers version: 3.10.0.
Boost version: 1.70.0

I also have the same problem with ncu: No kernels were profiled.

/Soft/cuda/11.1/nsight-compute-2020.2.0/target/linux-desktop-glibc_2_11_3-x64/ncu --export /users/Aran/Documents --force-overwrite --target-processes all --replay-mode kernel --kernel-regex-base function --launch-skip-before-match 0 --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section Occupancy --section SpeedOfLight --sampling-interval auto --sampling-max-passes 5 --sampling-buffer-size 33554432 --profile-from-start 0 --cache-control all --clock-control base --apply-rules yes --check-exit-code yes testing_dgetrf_gpu -n 1024 --matrix rand  --version 3 
==PROF== Connected to process 5097 (/users/scratch/snima/magma-2.5.4_cuda1140/testing/testing_dgetrf_gpu)
% MAGMA 2.5.4  compiled for CUDA capability >= 7.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 11010, driver 12000. OpenMP threads 2. MKL 2021.0.2, MKL threads 2. 
% device 0: NVIDIA GeForce RTX 2080 Ti, 1545.0 MHz clock, 11011.5 MiB memory, capability 7.5
% Mon Dec 12 19:40:58 2022
% Usage: /users/Aran/magma-2.5.4_cuda1140/testing/testing_dgetrf_gpu [options] [-h|--help]

% ngpu 1, version 3
%   M     N   CPU Gflop/s (sec)   GPU Gflop/s (sec)   |PA-LU|/(N*|A|)
%========================================================================
 1024  1024     ---   (  ---  )      4.38 (   0.16)     ---   
==PROF== Disconnected from process 5097
==WARNING== No kernels were profiled.

That is a super old version of Nsys. Please update to version 2022.5.1 from the webpage (developer.nvidia.com/nsight-systems).
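
After installing, make sure the new `nsys` is the one that runs, not the copy under `/Soft/cuda/11.1`. A minimal sketch, assuming the new version ends up in `/opt/nvidia/nsight-systems/2022.5.1/bin` (that path is an assumption, so adjust it to your install location; `use_nsys_from` and `NSYS_BIN` are names invented for this example):

```shell
#!/bin/sh
# Prepend DIR to PATH if it holds an executable nsys; otherwise report and fail.
# use_nsys_from is a hypothetical helper, not part of Nsight Systems.
use_nsys_from() {
    if [ -x "$1/nsys" ]; then
        PATH="$1:$PATH"
        export PATH
    else
        echo "no executable nsys in $1" >&2
        return 1
    fi
}

# Assumed install prefix; replace with wherever the new version was installed.
NSYS_BIN=${NSYS_BIN:-/opt/nvidia/nsight-systems/2022.5.1/bin}

if use_nsys_from "$NSYS_BIN"; then
    nsys --version
    # Same command as in the first post, plus --stats=true to print summary tables.
    nsys profile --trace=cuda --stats=true ./testing_dgetrf_gpu -n 10240 --matrix rand --version 3 --dev 0
fi
```

With `--stats=true`, nsys prints summary tables after the run, so you can see whether kernels were captured without opening nsys-ui.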

Thanks. So then it will work? What about the ncu?

It is likely to be the same problem. It looks like you are using a modern version of CUDA with old versions of the tools, so you’ll want to update your NCU as well.
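
The mismatch is visible in the MAGMA banner above, which reports `CUDA runtime 11010, driver 12000`. CUDA encodes these versions as 1000*major + 10*minor, so the runtime is 11.1 and the driver is 12.0. A small sketch for decoding them (the function name `decode_cuda_version` is made up for this example):

```shell
#!/bin/sh
# Convert a CUDA integer version (1000*major + 10*minor) to "major.minor".
# decode_cuda_version is a hypothetical helper for illustration only.
decode_cuda_version() {
    v=$1
    major=$(( v / 1000 ))
    minor=$(( (v % 1000) / 10 ))
    printf '%s.%s\n' "$major" "$minor"
}

echo "runtime: $(decode_cuda_version 11010)"
echo "driver:  $(decode_cuda_version 12000)"
```

The tools under `/Soft/cuda/11.1` date from the 11.1 toolkit, while the driver is already at 12.0, which matches the warning Nsight Systems printed.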

But before that, it was working fine with the same CUDA. I will update both.