Error with CUPTI when profiling CUDA kernel written using Numba

Hi everyone,

I’m new to CUDA and I’m trying to profile a kernel written with Numba.
The Python script is attached:
to_profile.py.txt (2.7 KB)

I then launch NVIDIA Nsight Systems with the command:

/opt/nvidia/nsight-systems/2023.3.3/bin/nsys  profile -o /path/to/output/folder/profiling_report.nsys-rep --force-overwrite true --trace cuda --python-sampling=true python3 /path/to/script/to_profile.py

When I open the .nsys-rep file in the NVIDIA Nsight Systems GUI, I see the error:

Could not parse 2 CUPTI activity records. Please try updating the CUDA driver or use more recent profiler version.

I’m running Mint 23 on a laptop with an NVIDIA T550 GPU. The nvidia-smi output is the following:

If you need any information about my system or my CUDA installation, please ask.

Moving this to the Nsight Systems category.

Are you seeing any CUDA information in the GUI?

Thank you for the answer!
This is everything I see in the Nsight Systems GUI:

The CUDA version I have installed is 12.3, but I just noticed that my Python environment has these packages installed:

cuda-profiler-api         11.8.86                       0    nvidia
cuda-python               11.8.3          py310h1b7760a_1    conda-forge
cuda-version              11.8                 h70ddcb2_3    conda-forge
cudatoolkit               11.8.0               h6a678d5_0 
nvidia-cublas-cu12        12.3.4.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.3.101                 pypi_0    pypi
nvidia-cuda-nvcc-cu12     12.3.107                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.3.107                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.3.101                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.7.29                 pypi_0    pypi
nvidia-cufft-cu12         11.0.12.1                pypi_0    pypi
nvidia-cusolver-cu12      11.5.4.101               pypi_0    pypi
nvidia-cusparse-cu12      12.2.0.103               pypi_0    pypi
nvidia-nccl-cu12          2.19.3                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.3.101                 pypi_0    pypi

Could this be the problem?

If you change the dropdown from Diagnostic Summary to the Timeline, what do you see?

Nsight Systems uses CUPTI under the covers to get CUDA information. I suppose it is possible that your system tried to use the old CUPTI with a newer Nsys, and that produced this result.
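
If you want a quick way to spot this kind of mix, here is a minimal sketch that lists the CUDA toolkit components visible to Python and reports whether more than one major version is present. The name fragments in CUDA_HINTS and the helper names are my own choices for illustration, not part of any NVIDIA tool. Note that importlib.metadata only sees pip-style distributions, so conda-only packages such as cudatoolkit may not show up here and would still need `conda list`.

```python
from importlib import metadata

# Name fragments treated as CUDA toolkit components.
# This list is an assumption made for this sketch; adjust it to your environment.
CUDA_HINTS = ("cupti", "cudatoolkit", "cuda-version",
              "cuda-runtime", "cuda-profiler-api")


def cuda_components(packages):
    """Return {name: version} for (name, version) pairs that look like CUDA components."""
    return {
        name: version
        for name, version in packages
        if any(hint in name.lower() for hint in CUDA_HINTS)
    }


def major_versions(components):
    """Return the set of leading (major) version numbers among the components."""
    majors = set()
    for version in components.values():
        head = version.split(".")[0]
        if head.isdigit():
            majors.add(int(head))
    return majors


def installed_packages():
    """Yield (name, version) for every distribution Python's metadata can see."""
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            yield name, dist.version


if __name__ == "__main__":
    found = cuda_components(installed_packages())
    for name, version in sorted(found.items()):
        print(f"{name:30s} {version}")
    majors = major_versions(found)
    if len(majors) > 1:
        print(f"Mixed CUDA major versions found: {sorted(majors)}")
```

With the package list posted above, the 11.8 entries (cuda-profiler-api, cuda-version, cudatoolkit) and the 12.3 entries (nvidia-cuda-cupti-cu12, nvidia-cuda-runtime-cu12) would be reported as a mix of major versions 11 and 12.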

Here’s the Timeline view

Okay, it looks like you got most of the CUDA events, and only a couple were dropped from the data that we got from CUPTI. There are multiple reasons that could happen, but what you have collected is usable: we got 403 events and failed on two records, which should be good enough to give you some profiling information.

This is a very short program and it looks like most of the work is happening before the half second mark. I would suggest you zoom into that range from about 200-400 ms where you see the blips of activity and examine what is going on there. If you click on the arrow by the “CUDA HW …” line, you will expand the activity on that GPU and should be able to see more information about GPU activity.
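
If you prefer to inspect that window outside the GUI, here is a rough sketch that lists the kernels starting between 200 and 400 ms. It assumes you have first exported the report to SQLite (for example with `nsys export --type sqlite`), and that the export contains a CUPTI_ACTIVITY_KIND_KERNEL table with start, end and shortName columns plus a StringIds lookup table, with timestamps in nanoseconds. Please check those names against the schema of your Nsys version before relying on them. The function name and the file name in the usage comment are hypothetical.

```python
import sqlite3

# Table and column names below are assumptions about the nsys SQLite export
# schema; verify them against the export produced by your Nsys version.
QUERY = """
SELECT s.value, k.start, k."end"
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS s ON s.id = k.shortName
WHERE k.start >= ? AND k.start < ?
ORDER BY k.start
"""


def kernels_in_window(conn, start_ms, end_ms):
    """Return (kernel name, start in ms, duration in us) for kernels starting in the window."""
    lo_ns = int(start_ms * 1_000_000)  # milliseconds to nanoseconds
    hi_ns = int(end_ms * 1_000_000)
    rows = conn.execute(QUERY, (lo_ns, hi_ns)).fetchall()
    return [
        (name, start / 1_000_000, (end - start) / 1_000)
        for name, start, end in rows
    ]


# Usage (hypothetical file name):
#   conn = sqlite3.connect("profiling_report.sqlite")
#   for name, start_ms, dur_us in kernels_in_window(conn, 200, 400):
#       print(f"{start_ms:10.3f} ms  {dur_us:10.1f} us  {name}")
```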

Here is a blog post that I wrote about optimizing memory transfers. It also has a lot of general information about navigating Nsys that might be helpful to you: Optimizing CUDA Memory Transfers with NVIDIA Nsight Systems | NVIDIA Technical Blog