Nsight Systems does not collect CUDA events

Hi everyone,

I am puzzled as to why I cannot get Nsight Systems to work properly. It’s my first time using the profiler and posting here, so excuse me if the question turns out to be banal. I would be very glad if I could get some help.

I am trying to profile a Julia application I wrote using CUDA. I get the following error:

julia> CUDA.@profile #'some expression here using CUDA.jl' 
[ Info: Running under Nsight Systems, CUDA.@profile will automatically start the profiler

WARNING: CUDA tracing is required for cudaProfilerStart/Stop API support. Turning it on by default.
There are no active sessions.
ERROR: failed process: Process(/usr/local/bin/nsys stop, ProcessExited(1)) [1]


caused by: Failed to compile PTX code (ptxas received signal 11)
If you think this is a bug, please file an issue and attach /tmp/jl_DLp64D.ptx
Stacktrace: ...

I’ve left out the stack traces as these are specific to Julia. I can post them if needed.

When I launch Julia through the nsys profile command instead, I get:

~$ nsys profile julia
End of file

I can get the profile session to start using the UI, but no CUDA events are recorded: “No CUDA events collected. Does the process use CUDA?”


I have a GeForce GTX 1050 Ti GPU.

This is the output of uname -a

~$ uname -a
Linux copenhagen 5.13.0-7620-generic #20~1634827117~21.04~874b071-Ubuntu SMP Fri Oct 29 15:06:55 UTC  x86_64 x86_64 x86_64 GNU/Linux

Output of cat /proc/sys/kernel/perf_event_paranoid

~$ cat /proc/sys/kernel/perf_event_paranoid

This is the output of nvidia-smi

~$ nvidia-smi
Mon Nov 22 08:51:19 2021       
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 30%   39C    P0    N/A /  75W |    965MiB /  4034MiB |      8%      Default |
|                               |                      |                  N/A |

Output of /usr/local/bin/nsys --version

~$ /usr/local/bin/nsys --version
NVIDIA Nsight Systems version 2021.5.1.77-4a17e7d

By the way, Nsight Systems doesn’t work for CUDA C either. I compiled the example under /usr/lib/cuda/samples/0_Simple/vectorAdd and still get the same error:

~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ sudo make
~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ ls
Makefile  NsightEclipse.xml  readme.txt  vectorAdd  vectorAdd.cu  vectorAdd.o
~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ ./vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ nsys profile vectorAdd
End of file

I did this just to rule out the error coming from the Julia side of things.

@liuyis can you take a look at this?

Hi @cozmaden, could you check whether the following command works:

~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ nsys profile -t none -s none --cpuctxsw=none vectorAdd
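
This command turns off tracing, CPU sampling, and context-switch tracing, so it only checks whether the application can be launched under nsys at all. If that works, the collection features can be switched back on one at a time to see which one triggers the failure. A minimal sketch of that progression follows; the helper name nsys_steps is made up for illustration, and it only prints the command lines rather than running them:

```shell
#!/bin/sh
# Hypothetical helper (not part of nsys): prints nsys command lines that
# re-enable collection features step by step. Run each printed line by hand
# and note the first one that fails.
nsys_steps() {
  app="$1"
  # Step 1: no tracing, no CPU sampling, no context-switch tracing.
  echo "nsys profile -t none -s none --cpuctxsw=none $app"
  # Step 2: CUDA tracing only.
  echo "nsys profile -t cuda -s none --cpuctxsw=none $app"
  # Step 3: CUDA tracing with the default sampling and context-switch settings.
  echo "nsys profile -t cuda $app"
}

nsys_steps ./vectorAdd
```

If step 1 already fails, the problem is likely in the launch or injection itself rather than in any single collection feature.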


Unfortunately I had to resolve the problem quickly to keep on working on a project. I have reinstalled my operating system, since I was just testing Pop!_OS for a limited time.

I am now back on an Arch-based distro (EndeavourOS) with the latest driver and toolkit versions from pacman, and I have not encountered this problem.

So I can only speculate now. It might have been a problem with the older drivers available via apt on Pop!_OS in combination with older toolkit versions.

Thanks for getting back to me anyway, @hwilper and @liuyis.