How to profile an application with the CUDA 12.1 driver?

Hello,

I have the following gpu and driver:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5500 Laptop GPU     Off| 00000000:01:00.0  On |                  Off |
| N/A   65C    P1               62W /  N/A|   8896MiB / 16384MiB |     63%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Cuda was installed with the latest version of sdkmanager:

~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:10:41_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0

When I try to profile an application with nsight systems 2022.3.1, the tool can’t collect Cuda data:

Warning	Injection	6016	00:00.210	
Installed CUDA driver version (12.1) is not supported by this build of Nsight Systems. CUDA trace will be collected using libraries for driver version 11.8
Information	Injection	6016	00:00.210	
Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call.
Information	Injection	6016	00:00.337	
CUDA injection initialized successfully.
Information	Analysis		00:02.002	
Profiling has stopped.
Information	Daemon		00:02.211	
Number of IP samples collected: 5.935.
Warning	Daemon	5692	00:02.401	
Failed to connect to the application. Has it been run with Injection library?
Warning	Analysis	6016	00:04.530	
Not all CUDA events might have been collected.
Warning	Analysis	6016	00:04.530	
No CUDA events collected. Does the process use CUDA?
Warning	Analysis		00:04.530	
CUDA driver version (12.1) is not supported, using libraries for older driver version. Check for updates to see if there is a newer version available.

and the profiling process stops.

I installed the latest nsight systems 2023.2.1 - it still can’t collect Cuda traces and produces the following log:

Information	Injection	7138	00:00.411	
CUDA injection initialized successfully.
Warning	Analysis	7138	00:00.621	
Not all CUDA events might have been collected.
Warning	Analysis	7138	00:00.621	
No CUDA events collected. Does the process use CUDA?
Information	Analysis		00:10.437	
Profiling has stopped.
Error	Injection	7138	00:10.457	
ActivityFlushAll returned 2: CUPTI_ERROR_INVALID_DEVICE
Information	Injection	7138	00:10.457	
Number of CUPTI events produced: 0, CUPTI buffers: 50.
Information	Daemon		00:10.582	
Number of IP samples collected: 24.553.

What is CUPTI_ERROR_INVALID_DEVICE, and how can I fix it?

Thanks for reporting the problem, nikita. Could you tell me which target platform and OS you are profiling on?

The 2022.3 nsys version does not have support for your CUDA driver version, so the diagnostic error is expected. However, the 2023.2 nsys version should work. Could you share the report file where you see CUPTI_ERROR_INVALID_DEVICE error?
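Since the diagnosis above hinges on which CUDA version the driver reports versus what the toolkit and nsys were built for, it can help to print all three side by side. The following is only a minimal sketch: the helper names driver_cuda_version and toolkit_cuda_version are hypothetical, and the parsing assumes the nvidia-smi and nvcc output formats shown in the first post.

```shell
#!/usr/bin/env bash
# Sketch: compare the CUDA version reported by the driver with the toolkit release.
# Both function names are hypothetical helpers, not part of any NVIDIA tool.

# Extract "12.1" from an nvidia-smi banner line such as "... CUDA Version: 12.1 ...".
driver_cuda_version() {
  sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Extract "11.4" from the nvcc line "Cuda compilation tools, release 11.4, V11.4.315".
toolkit_cuda_version() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Query the real tools only when they are installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "Driver supports CUDA: $(nvidia-smi | driver_cuda_version)"
fi
if command -v nvcc >/dev/null 2>&1; then
  echo "Toolkit release:      $(nvcc --version | toolkit_cuda_version)"
fi
if command -v nsys >/dev/null 2>&1; then
  nsys --version
fi
```

If the driver line shows a newer CUDA version than the installed nsys was released for, the "not supported by this build" warning from the first post is the expected result.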

@skottapalli I am on Ubuntu 20.04.6. Which report file do you mean? Above I posted the log from nsys.

I mean the .nsys-rep file. If you don’t want to share it on the forum, you can send it to me in a private message. It contains additional details that might help me debug.

Here’s the report of the matrixMul sample:
report1.nsys-rep (307.7 KB)

To continue debugging, could you please collect and share logs for this profiling? To collect logs please follow these steps:

  1. Save the following content to /tmp/nvlog.config:

     + 100iwef global
     $ /tmp/nsight-sys.log
     ForceFlush
     Format $sevc$time|${name:0}|${tid:5}|${file:0}:${line:0}[${sfunc:0}]:$text

  2. Add --env-var=NVLOG_CONFIG_FILE=/tmp/nvlog.config to your nsys command line. E.g. nsys profile --env-var=NVLOG_CONFIG_FILE=/tmp/nvlog.config -s none --cpuctxsw=none --trace=cuda <app>
  3. Run a collection. There should be logs at /tmp/nsight-sys.log. Share this file.
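The steps above can be sketched as a single script. This is a hedged example rather than an official procedure: the APP and NVLOG_CONFIG variables are placeholders introduced here, and the config is written without blank lines between directives on the assumption that they are not significant.

```shell
#!/usr/bin/env bash
# Sketch of the logging steps. APP is a placeholder for your own CUDA binary.
NVLOG_CONFIG="${NVLOG_CONFIG:-/tmp/nvlog.config}"

# The quoted heredoc delimiter keeps the shell from expanding $sevc, ${name:0}, etc.
cat > "$NVLOG_CONFIG" <<'EOF'
+ 100iwef global
$ /tmp/nsight-sys.log
ForceFlush
Format $sevc$time|${name:0}|${tid:5}|${file:0}:${line:0}[${sfunc:0}]:$text
EOF

# Run the collection only if nsys is installed and an application was given.
APP="${APP:-}"
if command -v nsys >/dev/null 2>&1 && [ -n "$APP" ]; then
  nsys profile --env-var=NVLOG_CONFIG_FILE="$NVLOG_CONFIG" \
       -s none --cpuctxsw=none --trace=cuda "$APP"
  ls -l /tmp/nsight-sys.log
fi
```

Invoke it as, for example, APP=/path/to/matrixMul bash collect-log.sh (the script name is arbitrary). The quoted heredoc matters: with an unquoted delimiter the shell would try to expand the format placeholders and the config would be corrupted.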

Here’s the recorded nsight-sys.log:

nsight-sys.log (45.0 KB)

Thanks for providing the log, Nikita. Could you please run the following steps to collect more logs for further debugging?

Download the tar.gz package, extract it, cd cuda-injection-library and run make. It will build a simple injection library to identify the root cause of the problem on your machine.

From the cuda-injection-library folder, please run
LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64 CUDA_INJECTION64_PATH=./libToolsInjectionCuda.so /path/to/matrixMul >log.txt 2>&1

The output on my machine is attached here. Please share the log.txt output file from your machine. It will help with debugging why you are seeing the CUPTI_ERROR_INVALID_DEVICE error.

cuda-injection-library.tar.gz (544.4 KB)
log.txt (95.1 KB)

I do not have the /usr/local/cuda/extras/CUPTI/lib64 folder in my installation (which I did with sdkmanager). The CUPTI directory has only doc and samples inside. I do have a bunch of libcupti.so.X.X files scattered over several different directories, though. Do I need a specific version to run the injection?

I see. No worries. Could you try setting the LD_LIBRARY_PATH to the nsys target folder instead? It has the required libcupti version. On my machine it is
LD_LIBRARY_PATH=/opt/nvidia/nsight-systems/2023.2.0/target-linux-x64/ CUDA_INJECTION64_PATH=./libToolsInjectionCuda.so /path/to/matrixMul >log.txt 2>&1

The command above just needs the path to the cupti version 12.1 because you have CUDA 12.1 on your target system.
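When libcupti copies are scattered across several directories, listing them with their full paths makes it easier to pick the directory to put on LD_LIBRARY_PATH. A minimal sketch follows; find_cupti is a hypothetical helper, and the search roots are guesses at typical install locations rather than paths confirmed for an sdkmanager install.

```shell
#!/usr/bin/env bash
# Sketch: list libcupti shared libraries under a set of directories.
# find_cupti is a hypothetical helper; pass whichever roots apply to your system.
find_cupti() {
  for root in "$@"; do
    if [ -d "$root" ]; then
      # Ignore permission errors so one unreadable directory does not abort the search.
      find "$root" -name 'libcupti.so*' 2>/dev/null || true
    fi
  done | sort
}

# Example roots (assumptions): CUDA toolkit, Nsight Systems, system libraries.
find_cupti /usr/local /opt/nvidia /usr/lib
```

The directory that contains libcupti.so.12.1 (or a symlink to it) is the one to use in LD_LIBRARY_PATH for a CUDA 12.1 driver.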

@nikita14 - update on our side. We are able to reproduce the bug with CUDA 12.1 driver, but not with the latest CUDA 12.2 driver. Are you able to upgrade the driver on your laptop?

@skottapalli Yes, I updated to the latest driver, 535.54.03 with CUDA 12.2. Nevertheless, I am still not able to profile an application. With the latest Nsight Systems 2023.2.1 and the official matrixMul sample I see the following warning:

CUDA driver version on the target (12.2) is not supported by this build of Nsight Systems.
CUDA trace will be collected using libraries for older driver version but some features might be missing or work incorrectly.
Check for updates to see if there is a newer version available.

And the Cuda profiling traces are not collected.

Thanks for trying and apologies it is taking longer to debug this problem since we cannot reproduce it in house. Could you please run the steps given below and share the log?

Download the tar.gz package (this is newer compared to the one I gave you before), extract it, cd cuda-injection-library and run make. It will build a simple injection library to identify the root cause of the problem on your machine.
cuda-injection-library.tar.gz (2.6 MB)

Copy libcupti.so.12.1 from the nsys installation directory (for example, I have mine at /opt/nvidia/nsight-systems/2023.2.1/target-linux-x64) to the cuda-injection-library folder and rename it to libcupti.so.12.

From the cuda-injection-library folder, please run
LD_LIBRARY_PATH=. CUDA_INJECTION64_PATH=./libToolsInjectionCuda.so /path/to/matrixMul >file.log 2>&1

Please share the file.log output file from your machine. It will help with debugging why you are seeing the CUPTI_ERROR_INVALID_DEVICE error.
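The copy, rename, and run steps above could be scripted roughly as follows. This is a sketch under assumptions: stage_cupti is a hypothetical helper, NSYS_TARGET defaults to the example path from the reply, and APP is a placeholder for the path to matrixMul.

```shell
#!/usr/bin/env bash
# Sketch of the copy-and-rename step followed by the injection run.
# stage_cupti is a hypothetical helper; NSYS_TARGET and APP are placeholders.
stage_cupti() {
  # $1: directory containing libcupti.so.12.1, $2: destination directory
  cp "$1/libcupti.so.12.1" "$2/libcupti.so.12"
}

NSYS_TARGET="${NSYS_TARGET:-/opt/nvidia/nsight-systems/2023.2.1/target-linux-x64}"
APP="${APP:-}"

# Intended to be run from the cuda-injection-library folder after `make`.
if [ -f "$NSYS_TARGET/libcupti.so.12.1" ] && [ -f ./libToolsInjectionCuda.so ] && [ -n "$APP" ]; then
  stage_cupti "$NSYS_TARGET" .
  LD_LIBRARY_PATH=. CUDA_INJECTION64_PATH=./libToolsInjectionCuda.so "$APP" > file.log 2>&1
fi
```

Copying rather than symlinking keeps the test library self-contained in the folder, so LD_LIBRARY_PATH=. is enough for the loader to find it.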

I no longer get CUPTI_ERROR_INVALID_DEVICE after updating the driver. However, nsys still can’t collect CUDA events - see the screenshot above. Anyway, here’s the log:

file.log (94.3 KB)

Okay, thanks for trying again. The file.log that you shared shows that CUPTI is working as expected. The nsys version 2023.2 you downloaded from the website does not have full support for the CUDA 12.2 driver (as shown in the diagnostic errors/warnings), but it should still get you CUDA traces. I don’t know why it doesn’t work on your system.

To debug further, could you get the CUDA toolkit 12.2? It includes the nsys that has full support for CUDA 12.2. See /usr/local/cuda-12.2/nsight-systems-2023.2.3
Could you run another collection with it by following the steps below to collect the injection log and the report file .nsys-rep?

  1. Save the following content to /tmp/nvlog.config:

     + 100iwef global
     $ /tmp/nsight-sys.log
     ForceFlush
     Format $sevc$time|${name:0}|${tid:5}|${file:0}:${line:0}[${sfunc:0}]:$text

  2. Add --env-var=NVLOG_CONFIG_FILE=/tmp/nvlog.config to your nsys command line. E.g. nsys profile --env-var=NVLOG_CONFIG_FILE=/tmp/nvlog.config -s none --cpuctxsw=none --trace=cuda <app>
  3. Run a collection. There should be logs at /tmp/nsight-sys.log. Share this file and the nsys-rep file.

Sorry, but it’s problematic for me to install CUDA manually. At work we agreed to install CUDA only with sdkmanager so that the whole team has the same software. At the time of writing, the toolkit version available through sdkmanager is 11.4. Also, there’s no nsight-systems 2023.2.3 on the official page - the latest available is 2023.2.1.

As I understand it, the problem is fixed with the newer toolkit and nsight-systems. As already mentioned, it’s problematic to install the latest version of the software, so I suppose I have to wait until it is available in sdkmanager. Meanwhile I can roll back the driver to 515, where profiling works.

I understand the constraints. You could get the nsys from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb

or, if you have the CUDA repos for apt,
apt install nsight-systems-2023.2.3 should also install 2023.2.3
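For the direct-download route, the package can be fetched and installed with standard tools. The sketch below uses the URL from the reply; the run wrapper and the DRY_RUN variable are hypothetical additions so the script only prints the commands unless DRY_RUN=0 is set, and /tmp as the download location is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: download and install the Nsight Systems .deb.
# Defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute them.
DEB_URL="https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb"
DEB_FILE="$(basename "$DEB_URL")"

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run wget -O "/tmp/$DEB_FILE" "$DEB_URL"
# apt treats an argument containing a slash as a local package file.
run sudo apt install "/tmp/$DEB_FILE"
```

Afterwards, nsys --version should report 2023.2.3 if the new binary comes first on PATH.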

Thank you, I installed 2023.2.3. The warning about the mismatch between Nsight Systems and the CUDA driver is now gone, but still no traces are collected. Here’s the log:

nsight-sys.log (44.7 KB)

The log really does not indicate why there are no traces collected. Could you share the .nsys-rep file corresponding to the nsight-sys.log file that you shared?

To debug this problem, would it be possible for me to access your system? I can get on a Microsoft Teams call so that you are present while I do the debugging on your system. If this is not an option, then I will need to think more about how to debug this.

Sure, we can debug it together on my system. And here’s the .nsys-rep:
report.nsys-rep (307.2 KB)