I’m using NVIDIA Nsight Systems version 2023.1.2.43-32377213v0 to profile a GPU run on a GeForce RTX 3080 Ti Laptop GPU in the following way:
$ nsys profile -s none --cpuctxsw none --trace=cuda -o gpu_ <app>
...
Generating '/tmp/nsys-report-9419.qdstrm'
[1/1] [========================100%] gpu_.nsys-rep
$ nsys stats --report gpukernsum gpu_.nsys-rep
Processing [gpu_.sqlite] with [/opt/nvidia/nsight-systems/2023.1.2/host-linux-x64/reports/gpukernsum.py]...
SKIPPED: gpu_.sqlite does not contain CUDA kernel data.
However, as shown above, nsys doesn’t collect any GPU kernel data in the profile. I can confirm that the app is definitely running on the GPU, as evidenced by nvidia-smi’s output: I’m watching it with watch -n 0.2 nvidia-smi, and I can see the app running, the GPU memory and compute utilization changing constantly, and the temperature/power readings reflecting the run.
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory  |
|        ID   ID                                                             Usage       |
|=========================================================================================|
|    0   N/A  N/A      2289      G   /usr/lib/xorg/Xorg                          133MiB  |
+---------------------------------------------------------------------------------------+
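(As an aside, the same monitoring can be done in a more script-friendly way with nvidia-smi’s query options; the field names below are standard nvidia-smi query fields, shown here purely as an illustration:

# Poll compute processes and overall GPU state once per second
$ nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv -l 1
$ nvidia-smi --query-gpu=utilization.gpu,memory.used,temperature.gpu,power.draw --format=csv -l 1
)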
Could you open the report with the Nsight Systems GUI and check if there are Diagnostic messages related to CUDA? You can check this in the Diagnostics Summary view.
Is it possible to share the report?
Could you provide some more information about the application? Is it a Python application?
Could you run one of the CUDA samples and see if CUDA traces are collected for that?
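For a quick sanity check, something along these lines should work, assuming the demo binaries shipped with the CUDA toolkit are present (the path below assumes a default CUDA 11.8 install; adjust it for your toolkit version):

# Profile a sample that definitely launches a kernel, using the same nsys options
$ nsys profile -s none --cpuctxsw none --trace=cuda -o vecadd_ /usr/local/cuda-11.8/extras/demo_suite/vectorAdd
$ nsys stats --report gpukernsum vecadd_.nsys-rep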
The reason for the gpukernsum output is that this particular CUDA sample, bandwidthTest, does not launch any kernels. It mainly allocates memory on the GPU and transfers data between the host and the device, or between devices.
If you use one of the other available report scripts, you will be able to see the CUDA activity that was collected, e.g.:
$ nsys stats --report cuda_api_sum gpu_.nsys-rep
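In particular, since bandwidthTest only performs memory transfers, the memory-oriented reports should show activity. A sketch, assuming the legacy report names used by this nsys version (check nsys stats --help-reports for the exact list):

# GPU-side memory-transfer summaries for the same report file
$ nsys stats --report gpumemtimesum gpu_.nsys-rep
$ nsys stats --report gpumemsizesum gpu_.nsys-rep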
You can study the code for this specific CUDA sample by following this GitHub link.
I’m using CUDA 12.1 (as opposed to 11.8 in your setup). Also, it’s a 3080 Ti Laptop GPU that I have here.
NVIDIA GeForce RTX 3080 Ti Laptop GPU
Driver 530.30.02
CUDA 12.1
Nsight Systems 2023.1.2.43-32377213v0
Ubuntu 22.04 LTS
There is nothing special in my setup. I’m running stock Ubuntu 22.04. No VMs.
No CUDA traces are seen when profiling anything, IIUC; the output of nsys stats on the generated report is always empty.
2023.2.1 is not yet available via apt from NVIDIA’s repos, so I can’t readily install it. I’d like to get it working on this setup as much as possible.
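(For reference, availability can be checked with standard apt commands, e.g.:

# List the Nsight Systems packages known to the configured repos
$ apt-cache search nsight-systems
$ apt list --all-versions 'nsight-systems*' 2>/dev/null
)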
I followed the remaining steps you suggested to collect logs; the log is attached.
Regarding the beta driver: I didn’t understand the question. Am I using a non-stable version of some driver? All of the packages are installed from the official NVIDIA Ubuntu repos.
nsight-sys.log (41.2 KB)
Thanks for providing the log file! I can’t see anything in it that points to the cause of the issue.
The next step would be to try Nsight Compute, ncu, and see if that tool is able to trace CUDA. If you have installed the CUDA toolkit, you should already have ncu on your system. Otherwise, you can install it with sudo apt install cuda-nsight-compute-12-1.
You can try to profile vectorAdd with ncu, e.g., ncu -o vectorAdd_profile /usr/local/cuda-11.8/extras/demo_suite/vectorAdd. You could open the collected report with the Nsight Compute GUI to verify that CUDA traces were collected. Or share the report here.
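The GUI can typically be launched on the report straight from the command line; a one-liner sketch, assuming the usual Nsight Compute GUI binary name ncu-ui:

# Open the collected report in the Nsight Compute GUI (binary name assumed)
$ ncu-ui vectorAdd_profile.ncu-rep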
All of the packages are installed from official NVIDIA Ubuntu repos.
I agree that 530.30.02 is the version distributed by the repo; you could get the latest recommended driver version from the following link: Official Drivers | NVIDIA
The currently installed driver, though, should not be an issue: Nsight Systems is expected to work on your setup.
The latest version of Nsight Systems can be found at this link. You may need to create a free account to access the content.
$ ncu --target-processes all /usr/local/cuda-11.8/extras/demo_suite/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
==PROF== Connected to process 77986 (/usr/local/cuda-11.8/extras/demo_suite/bandwidthTest)
Device 0: NVIDIA GeForce RTX 3080 Ti Laptop GPU
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 11526.5
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12849.3
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 420266.4
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 77986
==WARNING== No kernels were profiled.
This is expected behavior, since bandwidthTest does not launch any kernels.
Please run ncu -o vectorAdd_profile /usr/local/cuda-11.8/extras/demo_suite/vectorAdd and provide the report file vectorAdd_profile.ncu-rep here, or the CLI output if a report is not created.
By the way, Nsight Systems 2023.2.3 (nsight-systems-2023.2.3) is available through apt, if you want to give the latest nsys version a try.
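If you try it, installation should be along these lines (package name as given above; run apt update first so the new package is visible):

$ sudo apt update
$ sudo apt install nsight-systems-2023.2.3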
Thanks. I just realized that a bit later myself. Here’s the result of ncu -f -o vectorAdd_profile /usr/local/cuda-11.8/extras/demo_suite/vectorAdd:
[Vector addition of 50000 elements]
==PROF== Connected to process 181163 (/usr/local/cuda-11.8/extras/demo_suite/vectorAdd)
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==PROF== Profiling "vectorAdd" - 0: 0%....50%....100% - 9 passes
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==PROF== Disconnected from process 181163
==PROF== Report: /home/uday/vectorAdd_profile.ncu-rep
Note: I had to run with elevated privileges, to avoid having to reboot immediately for the kernel-module setting on perf counter permissions to take effect.
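For anyone else hitting this: the standard non-root fix, per NVIDIA’s ERR_NVGPUCTRPERM documentation, is to relax the perf-counter restriction via a modprobe option and reboot (the .conf file name below is arbitrary):

# Allow non-admin users to access GPU performance counters (takes effect after reboot)
$ echo 'options nvidia NVreg_RestrictProfilingToAdminUsers=0' | sudo tee /etc/modprobe.d/nvidia-perf-counters.conf
$ sudo reboot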
Hi @uday1, we followed up with the CUPTI team internally, and they have confirmed that this is a bug in CUPTI: from CUPTI 11.8 onwards, support for the GeForce RTX 3080 Ti Laptop GPU was broken. They are working on a fix, which will be made available in a future release of nsys. In the meantime, you could go back to the CUDA 11.7 driver (version 515) and use the latest nsys version, if needed.
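If you go the downgrade route on stock Ubuntu, it should be along these lines (nvidia-driver-515 is the usual Ubuntu package naming for the 515/CUDA 11.7 branch; verify what your system offers with ubuntu-drivers list):

# Switch to the 515 driver branch, then reboot
$ sudo apt install nvidia-driver-515
$ sudo reboot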
Sorry for the inconvenience so far. I understand it has been frustrating. Your help with debugging is greatly appreciated.
We have a new version of nsys that contains the fix from CUPTI: Nsight Systems | NVIDIA Developer
Please try it out and let us know if you are still running into problems.