Hello! I’m back to profiling kernels, but I’m having some issues with the Nsight Compute tool. I have a big project that makes extensive use of torch and some custom kernels, which I run through an Nsight Compute project.
After the program is launched, it stops at the first API call, cuGetProcAddress. I then select “Run to next kernel” as I usually do, and it runs through a lot of API calls: cuDeviceGetAttribute, cuDeviceGetUuid and cuDeviceGetLuid. Then, inside the API call cudaGetDevice -> cuCtxGetDevice, I get the function return value CUDA_ERROR_INVALID_CONTEXT(201) and the profiler pauses the program again.
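To make the return value above concrete, here is a minimal sketch, assuming the behaviour described in the CUDA driver API documentation: cuCtxGetDevice reports CUDA_ERROR_INVALID_CONTEXT (201) when no context is current on the calling thread. Under that assumption, a 201 raised during a cudaGetDevice call can simply mean that nothing has created a context yet, rather than that something is broken. The helper names (load_driver, probe_current_device, describe) are hypothetical, and the library names nvcuda.dll / libcuda.so.1 are the usual ones but are not taken from this post.

```python
import ctypes
import sys

# A few CUresult values from cuda.h (not the full enum).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    201: "CUDA_ERROR_INVALID_CONTEXT",
}


def describe(code):
    """Return the symbolic name for a CUresult value, if known."""
    return CU_RESULT_NAMES.get(code, "UNKNOWN_CURESULT_%d" % code)


def load_driver():
    """Try to load the CUDA driver library; return None if unavailable."""
    if sys.platform == "win32":
        loader, name = ctypes.WinDLL, "nvcuda.dll"
    else:
        loader, name = ctypes.CDLL, "libcuda.so.1"
    try:
        return loader(name)
    except OSError:
        return None


def probe_current_device(driver):
    """Call cuInit, then cuCtxGetDevice without creating any context.

    Returns (CUresult, device ordinal or None).
    """
    rc = driver.cuInit(0)
    if rc != 0:
        return rc, None
    device = ctypes.c_int(-1)
    rc = driver.cuCtxGetDevice(ctypes.byref(device))
    return rc, (device.value if rc == 0 else None)


if __name__ == "__main__":
    drv = load_driver()
    if drv is None:
        print("CUDA driver library not found; nothing to probe")
    else:
        rc, dev = probe_current_device(drv)
        print("cuCtxGetDevice ->", describe(rc), "device:", dev)
```

Run on its own (outside the profiler), this only shows what the driver returns before any context exists; it does not say anything about why the later profiling step fails.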
I press the “Run to next kernel” button and this happens again a few times until I reach my desired kernel. However, when I ask Nsight Compute to profile this kernel with full statistics, it fails with the following messages:
Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See … for more details.
Failed to prepare kernel for profiling
I suspect these two issues might be related, and that they could also be related to permissions. Some things I have checked:
- I always run Nsight Compute as administrator
- “Allow access to the GPU performance counters to all users” is ticked in the NVIDIA Control Panel
- DCGM is not recognized as an application and I don’t see it running anywhere
- I restarted the computer in case something else was blocking GPU profiling
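One more thing that could be tried is collecting the same kernel non-interactively with the ncu command line, which skips the manual “Run to next kernel” stepping. Below is a minimal sketch that only assembles such a command. The kernel name my_kernel and the script train.py are hypothetical placeholders, and whether this avoids the “driver resource was unavailable” error is untested.

```python
import shlex


def build_ncu_command(kernel_name, app_args, report="report",
                      launch_skip=0, launch_count=1):
    """Assemble an ncu command line that profiles one kernel by name.

    kernel_name  : name of the kernel to collect (placeholder here)
    app_args     : the application command, e.g. ["python", "train.py"]
    report       : output report name passed to --export
    launch_skip  : matching launches to skip before collecting
    launch_count : number of matching launches to collect
    """
    cmd = [
        "ncu",
        "--set", "full",                    # collect the full section set
        "--kernel-name", kernel_name,       # only kernels with this name
        "--launch-skip", str(launch_skip),
        "--launch-count", str(launch_count),
        "--export", report,
    ]
    return cmd + list(app_args)


if __name__ == "__main__":
    # Hypothetical kernel and script names, for illustration only.
    command = build_ncu_command("my_kernel", ["python", "train.py"])
    print(shlex.join(command))
```

The printed line can be pasted into an administrator terminal. If the same error shows up there too, that would at least rule out the interactive stepping as the cause.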
I’m using Windows 11 with CUDA 11.8. My nvcc and nvidia-smi outputs (I removed some sensitive program names) are:
$> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
$> nvidia-smi
nvidia-smi
Sat Nov 9 23:50:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 566.03                 Driver Version: 566.03         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060      WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   53C    P0             38W /  190W |    1745MiB /   6144MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      4180    C+G   ...__8wekyb3d8bbwe\WindowsTerminal.exe      N/A      |
|    0   N/A  N/A      4700    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A      4752    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A      |
|    0   N/A  N/A      7480    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A      8348    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A      8372    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     11956    C+G   ...werToys\PowerToys.ColorPickerUI.exe      N/A      |
|    0   N/A  N/A     11992    C+G   ...werToys\PowerToys.PowerLauncher.exe      N/A      |
|    0   N/A  N/A     12236    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     15176    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A      |
+-----------------------------------------------------------------------------------------+