Nsight Systems does not collect CUDA events

Hi everyone,

I am puzzled as to why I cannot get Nsight Systems to work properly. This is my first time using the profiler and posting here, so excuse me if the question turns out to be basic. I would be very glad if I could get some help.

I am trying to profile a Julia application I wrote using CUDA. I get the following error:

julia> CUDA.@profile #'some expression here using CUDA.jl' 
[ Info: Running under Nsight Systems, CUDA.@profile will automatically start the profiler

WARNING: CUDA tracing is required for cudaProfilerStart/Stop API support. Turning it on by default.
There are no active sessions.
ERROR: failed process: Process(/usr/local/bin/nsys stop, ProcessExited(1)) [1]

Stacktrace:...

caused by: Failed to compile PTX code (ptxas received signal 11)
If you think this is a bug, please file an issue and attach /tmp/jl_DLp64D.ptx
Stacktrace: ...

I’ve left out the stack traces as these are specific to Julia. Can post them if needed.

When launching with the profile command:

~$ nsys profile julia
End of file

I can get the profile session to start using the UI, but no CUDA events are recorded: “No CUDA events collected. Does the process use CUDA?”


I have a GeForce GTX 1050 Ti GPU.

This is the output of uname -a

~$ uname -a
Linux copenhagen 5.13.0-7620-generic #20~1634827117~21.04~874b071-Ubuntu SMP Fri Oct 29 15:06:55 UTC  x86_64 x86_64 x86_64 GNU/Linux

Output of cat /proc/sys/kernel/perf_event_paranoid

~$ cat /proc/sys/kernel/perf_event_paranoid
1
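For reference, my understanding is that Nsight Systems' CPU sampling requires perf_event_paranoid to be at most 2 (please verify the exact threshold against the docs for your version; the value 2 below is my assumption). A quick self-check:

```shell
# Check whether unprivileged CPU sampling is likely to be permitted.
# The threshold of 2 is an assumption based on my reading of the
# Nsight Systems docs; confirm it for your version.
paranoid=$(cat /proc/sys/kernel/perf_event_paranoid)
if [ "$paranoid" -le 2 ]; then
  echo "perf_event_paranoid=$paranoid: sampling should be allowed"
else
  echo "perf_event_paranoid=$paranoid: lower it temporarily, e.g."
  echo "  sudo sysctl kernel.perf_event_paranoid=2"
fi
```

With the value 1 shown above, sampling permissions should not be the problem here.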

This is the output of nvidia-smi

~$ nvidia-smi
Mon Nov 22 08:51:19 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 30%   39C    P0    N/A /  75W |    965MiB /  4034MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Output of /usr/local/bin/nsys --version

~$ /usr/local/bin/nsys --version
NVIDIA Nsight Systems version 2021.5.1.77-4a17e7d

By the way, Nsight Systems doesn’t work for CUDA C either. I compiled an example under /usr/lib/cuda/samples/0_Simple/vectorAdd and still get the same error:

~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ sudo make
~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ ls
Makefile  NsightEclipse.xml  readme.txt  vectorAdd  vectorAdd.cu  vectorAdd.o
~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ ./vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ nsys profile vectorAdd
End of file

Just to rule out this being an error coming from the Julia side of things.

@liuyis can you take a look at this?

Hi @cozmaden, could you check whether the following command works?

~:/usr/lib/cuda/samples/0_Simple/vectorAdd$ nsys profile -t none -s none --cpuctxsw=none vectorAdd

Unfortunately, I had to resolve the problem quickly to keep working on a project. I reinstalled my operating system, since I was only testing Pop!_OS for a limited time.

I am currently back on an Arch-based distro (EndeavourOS) with the latest drivers and toolkit versions from pacman, and I did not encounter this problem there.

So I can only speculate now. It might have been a problem with the older drivers available via apt on Pop!_OS in combination with older toolkit versions.

Thanks for getting back anyway @hwilper @liuyis

I had similar issues with both Julia and C, and this command works for me. Why does it work?

But in the generated “report1.nsys-rep” file, the timeline is empty when I open it in the Nsight Systems UI.

This is caused by the -t none option.

From the manual:

If the none option is selected, no APIs are traced and no other API can be selected.
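In other words, at least one trace has to be selected for the timeline to contain anything. A minimal sketch of the difference (the command is echoed rather than executed, so this runs even where nsys is not installed; vectorAdd is the sample app from earlier in the thread):

```shell
# '-t none' disables every API trace, so the report's timeline is empty:
#   nsys profile -t none ./vectorAdd
# Passing an actual trace list (here just CUDA) records events again:
cmd="nsys profile -t cuda ./vectorAdd"
echo "$cmd"
```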

Hi @Alexandre_Chen, the command line I provided in Nsight Systems does not collect CUDA events - #4 by liuyis was just an attempt to narrow down the issue, not a solution. It disables all trace and sampling options, so the report will be empty.

Which Nsys version were you using? Do you hit the same “End of file” error even for the simple vectorAdd app?

I updated my WSL version and installed the latest NVIDIA driver for WSL on windows 11 and it seems this problem has gone. I am using Driver version 510.06, CUDA version 11.6, and NVIDIA Nsight Systems version 2021.2.4.12-a25c8fd now.


I was having the same issue.

➜  Poisson_Julia git:(master) nvidia-smi
Thu Dec 16 02:49:46 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 23%   42C    P3    26W / 120W |   2671MiB /  5910MiB |     29%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2432      G   /usr/lib/xorg/Xorg               1719MiB |
|    0   N/A  N/A      2579      G   /usr/bin/gnome-shell              199MiB |
|    0   N/A  N/A      2683      G   ...mviewer/tv_bin/TeamViewer        1MiB |
|    0   N/A  N/A      3267      G   ...AAAAAAAAA= --shared-files      109MiB |
|    0   N/A  N/A      4372      G   ...AAAAAAAAA= --shared-files      309MiB |
|    0   N/A  N/A      6392      G   ...AAAAAAAAA= --shared-files       92MiB |
|    0   N/A  N/A    172635      G   ...AAAAAAAAA= --shared-files      167MiB |
|    0   N/A  N/A    207779      G   ...ost-linux-x64/nsys-ui.bin       65MiB |
+-----------------------------------------------------------------------------+

I can’t use nsys to profile either Julia or a normal binary executable compiled with nvcc via nsys launch without specifying --trace=cuda; I see the “End of file” message as well. A lot of examples in online video tutorials just use nsys launch without specifying --trace=cuda, since that is supposed to be the default. Is this a bug in nsys?

This is an example

➜  CUDA_code nsys profile ./kernel_abc
End of file
➜  CUDA_code nsys profile --trace=cuda,cublas ./kernel_abc
Generating '/tmp/nsys-report-1d37.qdstrm'
[1/1] [========================100%] report6.nsys-rep
Generated:
    /home/alexandre/Code/CUDA_code/report6.nsys-rep
➜  CUDA_code nsys stats report6.nsys-rep 
Generating SQLite file report6.sqlite from report6.nsys-rep
Exporting 9521 events: [===================================================100%]
Using report6.sqlite for SQL queries.
Running [/opt/nvidia/nsight-systems/2021.5.1/target-linux-x64/reports/nvtxsum.py report6.sqlite]... SKIPPED: report6.sqlite does not contain NV Tools Extension (NVTX) data.

Running [/opt/nvidia/nsight-systems/2021.5.1/target-linux-x64/reports/osrtsum.py report6.sqlite]... SKIPPED: report6.sqlite does not contain OS Runtime trace data.

Running [/opt/nvidia/nsight-systems/2021.5.1/target-linux-x64/reports/cudaapisum.py report6.sqlite]... 

 Time (%)  Total Time (ns)  Num Calls     Avg (ns)         Med (ns)        Min (ns)       Max (ns)     StdDev (ns)           Name         
 --------  ---------------  ---------  ---------------  ---------------  -------------  -------------  ------------  ---------------------
     94.1    2,138,206,904          1  2,138,206,904.0  2,138,206,904.0  2,138,206,904  2,138,206,904           0.0  cudaDeviceSynchronize
      5.9      135,082,224          2     67,541,112.0     67,541,112.0        159,657    134,922,567  95,291,767.5  cudaMalloc           
      0.0           21,788          2         10,894.0         10,894.0          5,067         16,721       8,240.6  cudaMemset           
      0.0           14,896          3          4,965.3          3,223.0            666         11,007       5,386.2  cudaLaunchKernel     

Running [/opt/nvidia/nsight-systems/2021.5.1/target-linux-x64/reports/gpukernsum.py report6.sqlite]... 

 Time (%)  Total Time (ns)  Instances     Avg (ns)         Med (ns)        Min (ns)       Max (ns)     StdDev (ns)                   Name                  
 --------  ---------------  ---------  ---------------  ---------------  -------------  -------------  -----------  ---------------------------------------
     99.9    2,136,014,745          1  2,136,014,745.0  2,136,014,745.0  2,136,014,745  2,136,014,745          0.0  kernel_A(double *, int, int)           
      0.1        1,086,041          1      1,086,041.0      1,086,041.0      1,086,041      1,086,041          0.0  kernel_C(double *, const double *, int)

Running [/opt/nvidia/nsight-systems/2021.5.1/target-linux-x64/reports/gpumemtimesum.py report6.sqlite]... 

 Time (%)  Total Time (ns)  Count  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)    Operation  
 --------  ---------------  -----  ---------  ---------  --------  --------  -----------  -------------
    100.0        1,121,392      2  560,696.0  560,696.0   543,084   578,308     24,907.1  [CUDA memset]

Running [/opt/nvidia/nsight-systems/2021.5.1/target-linux-x64/reports/gpumemsizesum.py report6.sqlite]... 

The binary executable was compiled from this code using nvcc.

@Alexandre_Chen Thanks for the information. nsys profile without any switches will turn on CUDA, NVTX, OSRT and OpenGL traces. There may be an issue with the OSRT (most likely), NVTX or OpenGL trace that caused the “End of file” error, which is why you don’t hit it when explicitly specifying --trace=cuda,cublas.

Are you still able to reproduce it? If so could you try nsys profile --trace=osrt -s none --cpuctxsw=none ./kernel_abc to confirm if it’s an OSRT issue?

Thanks

➜  CUDA_code nsys profile --trace=osrt -s none --cpuctxsw=none ./kernel_abc
Generating '/tmp/nsys-report-9994.qdstrm'
[1/1] [========================100%] report7.nsys-rep
Generated:
    /home/alexandre/Code/CUDA_code/report7.nsys-rep

Does the issue still happen with just nsys profile ./kernel_abc?

Could you also try the following:

nsys profile -t none ./kernel_abc
nsys profile -t nvtx -s none --cpuctxsw=none ./kernel_abc
nsys profile -t opengl -s none --cpuctxsw=none ./kernel_abc

Thanks


I just tried this, and --trace=opengl is the one causing the problem (nothing happens, and “End of file” is printed). I have:

  • NVIDIA Nsight Systems version 2022.1.1.61-1d07dc0
  • NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4

I’ve attached the strace log of this process and its forked children. You can find the “End of file” write there too. I hope this trace will be useful in figuring out what goes wrong. I would debug further myself, but without access to the code it’s hard. You can see that the “End of file” print is done by a forked process.
stracelog.txt (1.6 MB)

You can see at the beginning the dynamic loader loading libraries from my CUDA 11.4 install folder; hopefully those are not a problem. It should be noted that for this use case, I’m only interested in tracing OpenGL.

@courteauxmartijn Thanks for the update. We were able to reproduce this on our side. An internal ticket has been opened to track and fix it, but we don’t have a specific estimate yet.

For now there are some workarounds:

  1. If you don’t need to trace OpenGL, remove opengl from --trace (or -t) (note that the default value for --trace is cuda,nvtx,osrt,opengl when you do not explicitly specify it).

  2. If you do need to trace OpenGL, try:

    • Set VK_ICD_FILENAMES to an empty value, or to specific ICD files only, e.g. export VK_ICD_FILENAMES=

    • Rename /usr/share/vulkan/icd.d/lvp_icd.x86_64.json
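A sketch of the first variant (the profiling command is left as a comment with a placeholder app name, so the snippet itself runs anywhere):

```shell
# Clear the Vulkan ICD list so the loader does not enumerate driver
# JSON files while Nsight's OpenGL tracing is injected.
export VK_ICD_FILENAMES=
# Then profile with OpenGL tracing, e.g.:
#   nsys profile -t opengl ./my_gl_app    # 'my_gl_app' is a placeholder
echo "VK_ICD_FILENAMES is now '${VK_ICD_FILENAMES}'"
```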

@liuyis Dang! That VK_ICD_FILENAMES variable did it. Am I missing out on some features now, or does it work fully correctly? I would think it works fully correctly, as the VK_ prefix suggests Vulkan, and I’m profiling OpenGL. Thanks for coming back to me so quickly!

Hi @courteauxmartijn, glad to hear it works for you. Yes it works correctly with this workaround.

This also helped fix an issue on Ubuntu 20.04 with Nsight Systems and the -t vulkan flag. The same command line would report “End of file”, and nsys-ui would not even start the application or Vulkan tracing.

After renaming that file, both nsys -t vulkan and nsys-ui started working.

I’m curious, what is lvp_icd.x86_64.json for exactly? There are other files for Intel, AMD and NVIDIA.
https://packages.debian.org/sid/amd64/mesa-vulkan-drivers/filelist