Availability issue for GPU Metrics sampling hardware unit on WSL

dproksch · June 2, 2023, 5:04pm

Hi,

I’m trying to profile a CUDA-SYCL-based application in WSL2 (Ubuntu 20.04.6) on Win10 (with Insider Program) using Nsight Systems 2022.4.2.1 and I’m having some issues with GPU Metrics (on an RTX 3060). The profiling for the CPU works fine and doesn’t throw any errors, but then the reports contain this Daemon Error:

GPU Metrics [0]: GPU metrics sampling hardware unit is already in use by another instance of Nsight Systems or other tool. The conflict can occur within the OS as well as containers, VMs and hypervisor.
- API function: NVPW_Device_PeriodicSampler_GetCounterAvailability(&params)
- Error code: 20
- Source function: static std::vector QuadDDaemon::EventSource::GpuMetricsBackend::Impl::CounterConfig::GetCounterAvailabilityImage(uint32_t)
- Source location: /build/agent/work/323cb361ab84164c/QuadD/Target/quadd_d/quadd_d/jni/EventSource/GpuMetricsBackend.cpp:587

It’s important to say that I am using CUDA 11.8 and a CUDA driver version of 12.1, which is noted as a warning in the report:

Installed CUDA driver version (12.1) is not supported by this build of Nsight Systems. CUDA trace will be collected using libraries for driver version 11.8

Due to the nature of the frameworks I’m using, I’d prefer not to update the CUDA version for now, if possible. There are no other instances of Nsight Systems or Nsight Compute running while I’m generating the reports.

Do you know where this issue could possibly come from and how I could fix it? Thank you in advance.

Nvidia-smi output:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.51.01              Driver Version: 532.03       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060         On | 00000000:05:00.0  On |                  N/A |
| 40%   31C    P8               10W / 170W|   1478MiB / 12288MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        37      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+
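
For anyone comparing their own setup against the one above, here is a minimal sketch that pulls the three version numbers out of the nvidia-smi banner line and checks whether the driver's CUDA version is newer than the one a given Nsight Systems build supports. The function names (`parse_smi_header`, `driver_newer_than`) are hypothetical helpers, not part of any NVIDIA tool, and the "supported" version has to be taken from the warning in your own report.

```python
import re

# Matches the banner line printed at the top of `nvidia-smi` output.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return (smi, driver, cuda) version strings, or None if no banner is found."""
    m = HEADER_RE.search(text)
    if m is None:
        return None
    return m.group("smi"), m.group("driver"), m.group("cuda")


def version_tuple(version):
    """Turn '12.1' into (12, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def driver_newer_than(driver_cuda, supported_cuda):
    """True if the driver's CUDA version is newer than the supported one."""
    return version_tuple(driver_cuda) > version_tuple(supported_cuda)


if __name__ == "__main__":
    # Banner line copied from the nvidia-smi output in this post.
    banner = ("| NVIDIA-SMI 530.51.01              Driver Version: 532.03"
              "       CUDA Version: 12.1     |")
    smi, driver, cuda = parse_smi_header(banner)
    print("nvidia-smi:", smi, "driver:", driver, "CUDA:", cuda)
    # 11.8 is the version named in the warning quoted above.
    print("driver CUDA newer than 11.8:", driver_newer_than(cuda, "11.8"))
```

This only reproduces the mismatch that the report already warns about. It does not show that the mismatch is what causes error code 20.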

hwilper · June 20, 2023, 2:21pm

@pkovalenko & Jason - is this a WSL2 issue?

864832769 · July 25, 2023, 6:33am

I encountered the same problem. Have you solved it since?

dproksch · July 25, 2023, 10:33am

No, sadly I have not been able to solve it so far. I didn’t look into it much further, as I was able to profile directly on our GPU cluster, which was my goal in the first place. Profiling locally on WSL would have been nice for crosschecks and additional data, but it doesn’t matter much.

So, no, sorry, I can’t help you. I hope you find a solution.

864832769 · July 25, 2023, 2:37pm

Thank you bro, wish you a pleasant day.

hwilper · July 25, 2023, 3:27pm

I’ve recently responded to another user who hit a problem specifically with the version shipped in 11.8, so I am going to recommend loading the newest Nsys version (which worked for him).

864832769 · July 28, 2023, 12:23pm

I have already downloaded Nsight Systems 2023.2.1 (Windows Host), but it still doesn’t work.

hwilper · July 28, 2023, 2:11pm

I will ping @jasoncohen directly.

ruipeterpan · June 11, 2024, 5:17pm

Rebooting resolved this issue for me
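
If a reboot helps, one possible explanation is a leftover process still holding the sampling unit. As a rough sketch, the following lists processes visible inside the Linux side of WSL whose names look like profiling tools. The substrings in `SUSPECT_NAMES` are illustrative guesses, not an official list, and a tool running on the Windows host will not appear in `/proc` inside WSL.

```python
import os

# Assumption: illustrative substrings for tools that might hold the sampler.
SUSPECT_NAMES = ("nsys", "ncu", "nsight", "dcgm")


def match_suspects(procs, suspects=SUSPECT_NAMES):
    """Keep the (pid, name) pairs whose name contains any suspect substring."""
    return [(pid, name) for pid, name in procs
            if any(s in name.lower() for s in suspects)]


def list_processes(proc_root="/proc"):
    """Yield (pid, name) for each process visible under /proc (Linux only)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                yield int(entry), f.read().strip()
        except OSError:
            continue  # process exited or is not readable


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, name in match_suspects(list_processes()):
            print(pid, name)
```

An empty result does not rule out a conflict, since the error message itself says the conflict can also come from containers, VMs, or the hypervisor.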

andor233 · June 26, 2024, 2:41pm

I ran into the same problem in WSL2 (Ubuntu 22.04). The profiling for the CPU works fine and doesn’t throw any errors, but then the reports contain this Daemon Error:

GPU Metrics [0]: GPU Metrics sampling hardware unit is already in use by another instance of Nsight Systems or other tool. The conflict can occur within the OS as well as containers, VMs and hypervisor.
- API function: Nvpw.GPU_PeriodicSampler_GetCounterAvailability(&params)
- Error code: 20
- Source function: virtual std::vector<unsigned char> QuadDDaemon::EventSource::{anonymous}::GpuPeriodicSampler::GetCounterAvailabilityImage() const
- Source location: /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Target/quadd_d/quadd_d/jni/EventSource/GpuMetrics.cpp:135

The warning of CUDA trace in the report:

CUDA driver version on the target (12.4) is not supported by this build of Nsight Systems.
CUDA trace will be collected using libraries for older driver version but some features might be missing or work incorrectly.
Check for updates to see if there is a newer version available.

Have you solved this problem?
