I would like to report an error we recently encountered with CUPTI profiling after updating our system:
OS RedHat Linux 7 → 9
CUDA Driver version 470.x.x → 530.30.02
GPU : NVIDIA A100-SXM4-80GB
This error can be reproduced by running the userrange_profiling sample from the cuda/extras/CUPTI/samples directory, which is provided by CUDA Toolkit versions 11.0, 11.1, and 11.2, when the sample is compiled with 11.0.x. Example run:
./userrange_profiling 0 dram__bytes_read.sum,sm__cycles_active.sum,smsp__warps_launched.sum
which produces the following output:
```
Usage: ./userrange_profiling [device_num] [metric_names comma separated]
CUDA Device Number: 0
Compute Capability of Device: 8.0
Launching kernel: blocks 196, thread/block 256
simplecuda.cu:262: error: function cuptiProfilerEndPass(&endPassParams) failed with error CUPTI_ERROR_UNKNOWN.
```
The problem goes away when compiling with CUDA Toolkit versions 11.1, 11.2, etc. The output in that case is:
```
Usage: ./userrange_profiling [device_num] [metric_names comma separated]
CUDA Device Number: 0
Compute Capability of Device: 8.0
Launching kernel: blocks 196, thread/block 256
rangeName: userrangeA metricName: dram__bytes_read.sum gpuValue: 14464
rangeName: userrangeA metricName: sm__cycles_active.sum gpuValue: 342122
rangeName: userrangeA metricName: smsp__warps_launched.sum gpuValue: 3136
```
Does the latest driver not support some feature from 11.0? Is this expected behavior?
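When comparing runs across toolkit versions, the failing and working outputs above can be told apart programmatically. The following is a minimal sketch, assuming the sample keeps printing lines in exactly the two formats shown in this post; the function name `parse_sample_output` and both regular expressions are illustrative and not part of CUPTI or the sample.

```python
import re

# Metric lines as printed by the sample in this post, e.g.
# "rangeName: userrangeA metricName: dram__bytes_read.sum gpuValue: 14464"
METRIC_RE = re.compile(
    r"rangeName:\s*(\S+)\s+metricName:\s*(\S+)\s+gpuValue:\s*(\S+)"
)
# Failure lines as printed in this post, e.g.
# "... function cuptiProfilerEndPass(&endPassParams) failed with error CUPTI_ERROR_UNKNOWN."
ERROR_RE = re.compile(r"function\s+(\w+)\(.*\)\s+failed with error\s+(\w+)")


def parse_sample_output(text):
    """Return (metrics, errors) extracted from userrange_profiling output.

    metrics maps (range_name, metric_name) to a float value;
    errors is a list of (function_name, error_code) tuples.
    """
    metrics = {}
    errors = []
    for line in text.splitlines():
        m = METRIC_RE.search(line)
        if m:
            range_name, metric_name, value = m.groups()
            metrics[(range_name, metric_name)] = float(value)
            continue
        e = ERROR_RE.search(line)
        if e:
            errors.append(e.groups())
    return metrics, errors
```

Feeding it the captured stdout of each build gives a quick pass/fail signal: a non-empty `errors` list marks a toolkit and driver combination worth recording as an edge case.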
Hi anustuv,
Could you please check the exact 11.0 CUPTI library version that is failing for you? We have three minor releases of CUDA 11.0.x.
ls -al /usr/local/cuda-11.0/extras/CUPTI/lib64/
We have CUDA Toolkit version 11.0.3, and that is where we get the reported error. The following is the output of the ls command above, run on CUPTI/lib64:
total 52883
drwxr-sr-x 2 jenkins jenkins 9 May 20 01:20 .
drwxr-sr-x 6 jenkins jenkins 6 May 20 01:20 ..
lrwxrwxrwx 1 jenkins jenkins 16 May 20 01:20 libcupti.so -> libcupti.so.11.0
lrwxrwxrwx 1 jenkins jenkins 20 May 20 01:20 libcupti.so.11.0 -> libcupti.so.2020.1.1
-rwxr-xr-x 1 jenkins jenkins 6494000 May 20 01:20 libcupti.so.2020.1.1
-rw-r--r-- 1 jenkins jenkins 15396942 May 20 01:20 libcupti_static.a
-rwxr-xr-x 1 jenkins jenkins 11354232 May 20 01:20 libnvperf_host.so
-rw-r--r-- 1 jenkins jenkins 17154402 May 20 01:20 libnvperf_host_static.a
-rwxr-xr-x 1 jenkins jenkins 3487832 May 20 01:20 libnvperf_target.so
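The symlink chain in this listing can also be resolved from a script, which helps when several toolkits are installed side by side. This is a sketch only: `resolve_library` is a hypothetical helper name, and it uses nothing beyond the Python standard library.

```python
import os


def resolve_library(path):
    """Follow a symlink chain and return every hop, ending at the real file."""
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative link targets are resolved against the link's own directory;
        # os.path.join returns the target unchanged if it is absolute.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:
            raise OSError(f"symlink loop at {path}")
        seen.add(path)
        chain.append(path)
    return chain
```

Calling `resolve_library("/usr/local/cuda-11.0/extras/CUPTI/lib64/libcupti.so")` on the installation listed above would walk libcupti.so, then libcupti.so.11.0, then libcupti.so.2020.1.1.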
Thanks for checking. In theory, this combination is supported, since the CUDA driver is backward compatible with the CUDA Toolkit, and so is CUPTI. Since versions 11.1 and above are confirmed to work as expected, may I ask whether this is a blocker for you on CUDA 11.0? Thanks.
Well, I am a developer with the PAPI team and am working on the CUDA profiling component there. We provide support for the Perfworks and Events APIs, including multithreading. We are concerned with the versions our users might run, and NVIDIA does claim backward compatibility. We would like to know about edge cases where profiling does not work. A definite answer on this would be very helpful.
Sorry for the delay. We will set you up with an account to access the internal bug ticket for further discussion.
The latest status is that I can reproduce the issue on the same configuration as yours, but it appears to be specific to RHEL 9; it does not reproduce for me on Ubuntu 22.04. Our engineering team will investigate further and get back to you in the bug ticket.