Nsight nsys not collecting any CUDA kernel data (2023.1.2.43-32377213v0)

I’m using NVIDIA Nsight Systems version 2023.1.2.43-32377213v0 to profile a GPU run on a GeForce RTX 3080 Ti Laptop GPU as follows.

$ nsys profile -s none --cpuctxsw none --trace=cuda -o gpu_  <app>
...
Generating '/tmp/nsys-report-9419.qdstrm'
[1/1] [========================100%] gpu_.nsys-rep

$ nsys stats --report gpukernsum gpu_.nsys-rep 
Processing [gpu_.sqlite] with [/opt/nvidia/nsight-systems/2023.1.2/host-linux-x64/reports/gpukernsum.py]... 
SKIPPED: gpu_.sqlite does not contain CUDA kernel data.

However, as shown above, nsys doesn’t collect any GPU kernel data in the profile. I can confirm that the app is definitely running on the GPU, as evidenced by nvidia-smi’s output: watching it with watch -n 0.2 nvidia-smi, I can see the app running, the GPU memory and compute utilization changing constantly, and the temperature/power reflecting the run.

Other relevant information:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 T…     On  | 00000000:01:00.0 Off |                  N/A |
| N/A   60C    P8              21W / 115W |    134MiB / 16384MiB |     49%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2289      G   /usr/lib/xorg/Xorg                          133MiB |
+---------------------------------------------------------------------------------------+

Hi @uday1,

Could you open the report with the Nsight Systems GUI and check if there are Diagnostic messages related to CUDA? You can check this in the Diagnostics Summary view.

Is it possible to share the report?

Could you provide some more information about the application? Is it a Python application?
Could you run one of the CUDA samples and see if CUDA traces are collected for that?

I just ran the standard CUDA bandwidth test from /usr/local/cuda-11.8/extras/demo_suite/bandwidthTest. Nothing was collected.

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: NVIDIA GeForce RTX 3080 Ti Laptop GPU
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			11491.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			12822.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			445886.2

Result = PASS

gpu_.sqlite (680 KB)
gpu_.nsys-rep (256.1 KB)

nsys report data is attached.

Thanks for providing the additional information.

The reason for the gpukernsum output is that the specific CUDA sample, bandwidthTest, does not launch kernels. It mainly allocates memory on the GPU and transfers data between the host and the device, or between devices.

If you use other available scripts, you will be able to see CUDA activity.
E.g., nsys stats --report cuda_api_sum gpu_.nsys-rep

You can study the code for this CUDA sample by following this GitHub link.
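
For reference, if you want a workload that does launch kernels, even a trivial program like the sketch below (purely illustrative, not one of the shipped CUDA samples) should produce rows in gpukernsum, assuming CUDA tracing itself is working:

// minimal_kernel.cu -- illustrative sketch only; build with: nvcc minimal_kernel.cu -o minimal_kernel
#include <cstdio>
#include <cuda_runtime.h>

// A trivial kernel so that nsys has at least one kernel launch to record.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc((void **)&d, n * sizeof(float));   // API call only; no kernel activity
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);  // kernel launch; this is what gpukernsum summarizes
    cudaDeviceSynchronize();
    cudaFree(d);
    printf("done\n");
    return 0;
}

Profiling this with the same nsys profile --trace=cuda command and then running nsys stats --report gpukernsum on the resulting report should list the scale kernel.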

But cuda_api_sum doesn’t generate any output for me either!

$ nsys stats -q --report cuda_api_sum --format table gpu_.nsys-rep
$

Does the above yield any output for you on the file I attached?

You are right, let me try to reproduce this.


I couldn’t reproduce this on my end; CUDA is being traced.
I used the same CUDA sample and the following setup:

  • NVIDIA GeForce RTX 3080 Ti
  • Driver 530.30.02
  • CUDA 11.8
  • Nsight Systems 2023.1.2.43-32377213v0

Are you able to see CUDA traces when profiling other samples?
Is there anything special on your setup? Are you using a VM?

Does updating to the latest Nsight Systems 2023.2.1 collect CUDA traces?

To continue debugging, could you please collect and share logs for this profiling? To collect logs please follow these steps:

  1. Save the following content to /tmp/nvlog.config:

     + 100iwef global
     $ /tmp/nsight-sys.log
     ForceFlush
     Format $sevc$time|${name:0}|${tid:5}|${file:0}:${line:0}[${sfunc:0}]:$text

  2. Add --env-var=NVLOG_CONFIG_FILE=/tmp/nvlog.config to your nsys command line, e.g. nsys profile --env-var=NVLOG_CONFIG_FILE=/tmp/nvlog.config -s none --cpuctxsw=none --trace=cuda -o gpu_ /usr/local/cuda-11.8/extras/demo_suite/bandwidthTest
  3. Run a collection. There should be logs at /tmp/nsight-sys.log. Share this file.

Do you have a use case where you need to use a BETA driver? Maybe using a recommended version would be more stable.

I’m using CUDA 12.1 (as opposed to 11.8 in your setup). Also, it’s a 3080 Ti Laptop GPU that I have here.

NVIDIA GeForce RTX 3080 Ti Laptop GPU
Driver 530.30.02
CUDA 12.1
Nsight Systems 2023.1.2.43-32377213v0
Ubuntu 22.04 LTS

There is nothing special in my setup. I’m running stock Ubuntu 22.04. No VMs.

No CUDA traces are seen on profiling anything IIUC; the output of nsys stats on the generated report is always empty.

2023.2.1 is not yet available via apt from NVIDIA’s repos, so I can’t readily install it. I’d prefer to get it working on this setup if at all possible.

Followed the remaining steps you suggested to collect logs; attached.

Regarding the BETA driver: I didn’t understand the question. Am I using a non-stable version of some driver? All of the packages are installed from official NVIDIA Ubuntu repos.
nsight-sys.log (41.2 KB)

Thanks for providing the log file! I can’t see anything that leads to the cause of the issue.

The next step would be to try Nsight Compute, ncu, and see if that tool is able to trace CUDA. If you have installed the CUDA toolkit you should have ncu already on your device. Otherwise you can install it with sudo apt install cuda-nsight-compute-12-1.

You can try to profile vectorAdd with ncu, e.g., ncu -o vectorAdd_profile /usr/local/cuda-11.8/extras/demo_suite/vectorAdd. You could open the collected report with the Nsight Compute GUI to verify that CUDA traces were collected. Or share the report here.

All of the packages are installed from official NVIDIA Ubuntu repos.

I agree that 530.30.02 is the version distributed by the repo; you could get the latest recommended driver version from the following link: Official Drivers | NVIDIA

The currently installed driver though should not be an issue. Nsight Systems is expected to work on your setup.

The latest version of Nsight Systems can be found at this link. You may need to create a free account to access the content.

You can download either Nsight Systems 2023.2.1 (Linux Host .run Installer), a self-contained installation script that adds no dependencies and can later be removed simply by deleting the installation directory, or Nsight Systems 2023.2.1 (Linux Host .deb Installer), which installs as a regular deb package.

ncu isn’t profiling anything either.

$ ncu --target-processes all /usr/local/cuda-11.8/extras/demo_suite/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

==PROF== Connected to process 77986 (/usr/local/cuda-11.8/extras/demo_suite/bandwidthTest)
 Device 0: NVIDIA GeForce RTX 3080 Ti Laptop GPU
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			11526.5

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			12849.3

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			420266.4

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 77986
==WARNING== No kernels were profiled.

Hi @uday1 ,

This is expected behavior, since bandwidthTest does not launch any kernels.

Please run ncu -o vectorAdd_profile /usr/local/cuda-11.8/extras/demo_suite/vectorAdd and provide the report file vectorAdd_profile.ncu-rep here. Or the CLI output, if a report is not created.

By the way, Nsight Systems 2023.2.3, nsight-systems-2023.2.3, is available through apt, if you want to give the latest nsys version a try.

Thanks. I just realized that a bit later. Here’s the result of:
ncu -f -o vectorAdd_profile /usr/local/cuda-11.8/extras/demo_suite/vectorAdd

[Vector addition of 50000 elements]
==PROF== Connected to process 181163 (/usr/local/cuda-11.8/extras/demo_suite/vectorAdd)
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==PROF== Profiling "vectorAdd" - 0: 0%....50%....100% - 9 passes
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==PROF== Disconnected from process 181163
==PROF== Report: /home/uday/vectorAdd_profile.ncu-rep

Note: I had to run ncu with elevated privileges to avoid having to reboot immediately for the kernel module setting on perf counter permissions to take effect.

vectorAdd_profile.ncu-rep (43.0 KB)

Thanks for providing the Nsight Compute report file. This narrows down the potential sources of this issue.

The next step would be to use a simple injection library while running the CUDA sample to see if CUPTI is the source of the issue.

To do that please:

  1. Download cuda-injection-library.tar.gz (970 KB) on your laptop.
  2. Extract the files from the archive, tar -xvf cuda-injection-library.tar.gz
  3. cd cuda-injection-library
  4. make
  5. LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64 CUDA_INJECTION64_PATH=./libToolsInjectionCuda.so /usr/local/cuda-11.8/extras/demo_suite/bandwidthTest >injection_log.txt 2>&1
  6. Share the injection_log.txt file, and the CLI output if there are errors.
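
For context (this is not the code in the attached archive, which is prebuilt for this purpose), a CUDA_INJECTION64_PATH injection library of this kind typically just enables CUPTI activity collection when the CUDA driver loads it and flushes the activity buffers at exit. A minimal sketch, assuming the standard CUPTI headers and library from the CUDA toolkit, might look like this:

// injection_sketch.cpp -- minimal illustrative sketch, NOT the library in the attached archive.
// Assumed build line: g++ -shared -fPIC injection_sketch.cpp \
//     -I/usr/local/cuda/extras/CUPTI/include -L/usr/local/cuda/extras/CUPTI/lib64 -lcupti \
//     -o libInjectionSketch.so
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cupti.h>

// CUPTI asks for a buffer to fill with activity records.
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size, size_t *maxNumRecords) {
    *size = 8 * 1024 * 1024;            // 8 MiB activity buffer
    *buffer = (uint8_t *)malloc(*size);
    *maxNumRecords = 0;                 // 0 = let CUPTI fill the whole buffer
}

// CUPTI hands back a completed buffer; here we only count the records.
static void CUPTIAPI bufferCompleted(CUcontext, uint32_t, uint8_t *buffer,
                                     size_t, size_t validSize) {
    CUpti_Activity *record = nullptr;
    size_t count = 0;
    while (cuptiActivityGetNextRecord(buffer, validSize, &record) == CUPTI_SUCCESS)
        ++count;
    fprintf(stderr, "CUPTI delivered %zu activity records\n", count);
    free(buffer);
}

// Force-flush any outstanding activity records when the target process exits.
static void atExitFlush() {
    CUptiResult res = cuptiActivityFlushAll(CUPTI_ACTIVITY_FLAG_FLUSH_FORCED);
    if (res != CUPTI_SUCCESS)
        fprintf(stderr, "cuptiActivityFlushAll failed: %d\n", (int)res);
}

// Entry point the CUDA driver looks up when CUDA_INJECTION64_PATH is set.
extern "C" int InitializeInjection(void) {
    cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_MEMCPY);             // bandwidthTest only does memcpys
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL);  // kernels, for other samples
    atexit(atExitFlush);
    return 1;                                                    // non-zero signals success
}

The attached library differs in its logging details; the sketch is only meant to show where a CUPTI failure would surface in this flow.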

Here is the resulting injection_log.txt – pasted inline below as well.
injection_log.txt (1.1 KB)

13:38:13.130.174|360360|Lib.cpp:566[InitializeInjection]: Initializing CUDA tracing
13:38:13.136.022|360360|Lib.cpp:348[EnableCollection]: Starting collection
13:38:13.136.181|360360|Lib.cpp:580[InitializeInjection]: CUDA tracing initialized
13:38:13.354.183|360360|Lib.cpp:514[AtExitHandler]: Flushing CUPTI buffers on exit
13:38:13.354.439|360360|Lib.cpp:515[AtExitHandler]: FATAL: cuptiActivityFlushAll(CUPTI_ACTIVITY_FLAG_FLUSH_FORCED) failed: CUPTI_ERROR_INVALID_DEVICE
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: NVIDIA GeForce RTX 3080 Ti Laptop GPU
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12386.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12844.9

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     257483.8

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Thanks a lot for providing the injection library logs! This shows that there is an issue with CUPTI.

Unfortunately I cannot reproduce this issue on my end, so we would need to collect more detailed error messages from CUPTI.

Could you please use this updated injection library and collect logs again?

  1. Download cuda-injection-library.tar.gz (950 KB) on your laptop.
  2. Extract the files from the archive, tar -xvf cuda-injection-library.tar.gz
  3. cd cuda-injection-library
  4. make
  5. LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64 CUDA_INJECTION64_PATH=./libToolsInjectionCuda.so /usr/local/cuda-11.8/extras/demo_suite/bandwidthTest >injection_log.txt 2>&1
  6. Share the injection_log.txt file, and the CLI output if there are errors.

Please make sure to use the updated tar archive, the one attached to this message.
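
For context, the extra detail in an updated library like this usually comes from resolving CUPTI status codes into readable strings. A minimal sketch of such an error-reporting helper, assuming the standard cupti.h header (again, not the code in the attached archive), could look like:

// cupti_check.h -- illustrative sketch of CUPTI error reporting; not the attached library's code.
#include <cstdio>
#include <cupti.h>

// Wrap a CUPTI call and print a human-readable error string on failure.
#define CUPTI_CHECK(call)                                                  \
    do {                                                                   \
        CUptiResult _res = (call);                                         \
        if (_res != CUPTI_SUCCESS) {                                       \
            const char *errstr = nullptr;                                  \
            cuptiGetResultString(_res, &errstr);                           \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__,  \
                    #call, errstr ? errstr : "unknown CUPTI error");       \
        }                                                                  \
    } while (0)

// Example use:
//   CUPTI_CHECK(cuptiActivityFlushAll(CUPTI_ACTIVITY_FLAG_FLUSH_FORCED));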

Hi @uday1, we followed up with the CUPTI team internally and they have confirmed that it is a bug in CUPTI. From CUPTI 11.8 onwards, support for the GeForce RTX 3080 Ti Laptop GPU has been broken. They are working on a fix, which will be made available in a future release of nsys. In the meantime, you could go back to the CUDA 11.7 driver (version 515) and use the latest nsys version, if needed.

Sorry for the inconvenience so far. I understand it has been frustrating. Your help with debugging is greatly appreciated.

We have a new version of nsys that contains the fix from CUPTI: Nsight Systems | NVIDIA Developer
Please try it out and let us know if you are still running into problems.

Thanks, nsys 2023.3 resolves this (I used the .deb installer). But it isn’t yet available in NVIDIA’s CUDA repo for Ubuntu 22.04 for easy installation.

You could use the .deb from https://developer.download.nvidia.com/devtools/repos
This has the latest version of nsys. The CUDA repo usually lags behind a bit and is slower to get updated.
