RTX 3080 LHR Missing gpu__dram_throughput CUDA metric

As part of a machine learning project, we are optimizing some custom CUDA kernels.

We are trying to profile them with Nsight Compute, but we encounter the following error on the RTX 3080 (LHR) when running a simple wrapper around the CUDA kernel:


> ==ERROR== Failed to access the following 4 metrics: dram__cycles_active.avg.pct_of_peak_sustained_elapsed, dram__cycles_elapsed.avg.per_second, gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed, gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
>
> ==ERROR== Failed to profile kernel "kernel" in process 20204

Diffing the metrics available on an RTX 3080 Ti (non-LHR) against those on an RTX 3080 (LHR) via nv-nsight-cu-cli --devices 0 --query-metrics, we notice that the following metrics are missing on the RTX 3080 LHR:

gpu__compute_memory_request_throughput
gpu__compute_memory_throughput
gpu__dram_throughput

All of these are required for even basic memory profiling with Nsight Compute. Every other metric is present. Is this a limitation of LHR cards? Why would these metrics be missing?
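For anyone who wants to repeat the comparison, here is a minimal Python sketch of the diff described above. It assumes the output of nv-nsight-cu-cli --devices 0 --query-metrics was saved to one text file per card, and that each metric line starts with a name containing a double underscore (such as gpu__dram_throughput). The function names and the parsing rule are my own assumptions for illustration, not part of Nsight Compute.

```python
import re
import sys

# Assumption: a metric line begins with a name like "gpu__dram_throughput",
# i.e. a lowercase unit prefix, a double underscore, then the counter name.
# Header and separator lines do not match this pattern and are skipped.
METRIC_NAME = re.compile(r"^[a-z0-9_]+__[A-Za-z0-9_]+$")


def parse_metric_names(query_output):
    """Collect the metric names found in saved --query-metrics output."""
    names = set()
    for line in query_output.splitlines():
        tokens = line.split()
        if tokens and METRIC_NAME.match(tokens[0]):
            names.add(tokens[0])
    return names


def missing_metrics(reference_output, candidate_output):
    """Return metrics listed for the reference GPU but not for the candidate."""
    reference = parse_metric_names(reference_output)
    candidate = parse_metric_names(candidate_output)
    return sorted(reference - candidate)


# Hypothetical usage: python diff_metrics.py reference.txt candidate.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as ref_file, open(sys.argv[2]) as cand_file:
        for name in missing_metrics(ref_file.read(), cand_file.read()):
            print(name)
```

The script name and the two file names are placeholders. The first file would hold the listing from the card that exposes all metrics, the second the listing from the card being checked.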

Details:

  • CUDA version: 11.5
  • Driver version: 497.29
  • Windows 10

+Bumping up!

Thanks for submitting this issue. We are actively investigating what’s going on. I will update you as soon as I have more information.

We recently released CUDA 11.6. Are you able to install the newest version to see whether the issue still reproduces? We are still trying to reproduce the issue in our lab. Thanks.

Upgrading to CUDA 11.6 resolved the issue. Profiling now works correctly.

Successful conditions:
Gigabyte RTX 3080 10G Turbo (LHR)
CUDA 11.6
Driver version 511.23
Windows 10
