I have a Titan V (Volta) GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P620         Off  | 00000000:C1:00.0 Off |                  N/A |
| 34%   45C    P8    N/A /  N/A |      2MiB /  1999MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN V      Off  | 00000000:E1:00.0 Off |                  N/A |
| 32%   47C    P8    28W / 250W |      4MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
I am trying to profile some code. For the sake of brevity, let's consider the simple vector addition example from the OLCF tutorial: https://www.olcf.ornl.gov/tutorials/cuda-vector-addition/
nvprof works fine:
nvprof --devices 1 ./a.out
==208985== NVPROF is profiling process 208985, command: ./a.out
final result: 1.000000
==208985== Profiling application: ./a.out
==208985== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   68.54%  145.57us         2  72.784us  68.160us  77.408us  [CUDA memcpy HtoD]
                   29.17%  61.952us         1  61.952us  61.952us  61.952us  [CUDA memcpy DtoH]
                    2.29%  4.8640us         1  4.8640us  4.8640us  4.8640us  vecAdd(double*, double*, double*, int)
      API calls:   98.08%  211.47ms         3  70.491ms  11.650us  211.27ms  cudaMalloc
                    0.64%  1.3889ms         3  462.96us  14.429us  1.2802ms  cudaFree
                    0.42%  910.12us         3  303.37us  116.25us  629.10us  cudaMemcpy
                    0.41%  882.48us         2  441.24us  150.76us  731.72us  cuDeviceTotalMem
                    0.38%  817.99us       202  4.0490us     350ns  175.41us  cuDeviceGetAttribute
                    0.04%  96.391us         2  48.195us  36.210us  60.181us  cuDeviceGetName
                    0.01%  29.326us         1  29.326us  29.326us  29.326us  cudaLaunchKernel
                    0.01%  12.221us         2  6.1100us  2.8180us  9.4030us  cuDeviceGetPCIBusId
                    0.00%  3.3020us         4     825ns     328ns  1.8740us  cuDeviceGet
                    0.00%  3.2020us         3  1.0670us     558ns  2.0160us  cuDeviceGetCount
                    0.00%  1.1250us         2     562ns     468ns     657ns  cuDeviceGetUuid
ncu, however, does not detect any kernels:
ncu --devices 1 ./a.out
==PROF== Connected to process 209016 (/home/adityap/a.out)
final result: 1.000000
==PROF== Disconnected from process 209016
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.
I tried installing older CUDA versions. Since I am on Debian, the Ubuntu packages seem to be broken, and the runfile installer fails on the driver. If I install only the CUDA toolkit (without the driver), the installation succeeds, but ncu still doesn't work.
The same GPU worked fine in another machine; the issue only occurred after we moved it to this one. So my guess is that only some specific combination of CUDA toolkit and driver versions works for this device. Is anyone aware of one?
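One thing I have not verified is whether index 1 refers to the same GPU for nvidia-smi, the CUDA runtime, and the profilers. As far as I understand, nvidia-smi lists devices in PCI bus order, while the CUDA runtime by default orders them fastest first unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set, so the two numberings can differ. Below is a minimal sketch (the file name list_devices.cu is my own choice, not part of the tutorial) that prints each device as the CUDA runtime enumerates it, so the PCI bus IDs can be matched against the nvidia-smi output above:

```cuda
// list_devices.cu: print every GPU in the order the CUDA runtime enumerates it.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         i, cudaGetErrorString(err));
            continue;
        }

        // The PCI bus ID string can be compared with the Bus-Id column of nvidia-smi.
        char busId[32] = "unknown";
        cudaDeviceGetPCIBusId(busId, static_cast<int>(sizeof(busId)), i);

        std::printf("CUDA device %d: %s, compute capability %d.%d, PCI bus %s\n",
                    i, prop.name, prop.major, prop.minor, busId);
    }

    // The device a program uses when it never calls cudaSetDevice.
    int current = -1;
    if (cudaGetDevice(&current) == cudaSuccess) {
        std::printf("Default device for this process: %d\n", current);
    }
    return 0;
}
```

It can be built with `nvcc list_devices.cu -o list_devices` and run once as is and once with `CUDA_DEVICE_ORDER=PCI_BUS_ID` set, to see whether the index of the TITAN V changes between the two runs.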