RTX 3080 LHR Missing gpu__dram_throughput CUDA metric

As part of a machine learning project, we are optimizing some custom CUDA kernels.

We are trying to profile them with Nsight Compute, but we get the following error on an RTX 3080 (LHR) when running a simple wrapper around the CUDA kernel:


> ==ERROR== Failed to access the following 4 metrics: dram__cycles_active.avg.pct_of_peak_sustained_elapsed, dram__cycles_elapsed.avg.per_second, gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed, gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
>
> ==ERROR== Failed to profile kernel "kernel" in process 20204

Diffing the metrics available on an RTX 3080 Ti (non-LHR) against those on an RTX 3080 (LHR), as listed by nv-nsight-cu-cli --devices 0 --query-metrics, we notice that the following metrics are missing on the RTX 3080 LHR:

gpu__compute_memory_request_throughput
gpu__compute_memory_throughput
gpu__dram_throughput
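
For anyone who wants to repeat the comparison, here is a minimal sketch of how the diff could be scripted. It assumes the --query-metrics output of each card was saved to a text file, and that each metric line starts with the metric name (a token containing a double underscore) followed by its description. That layout is an assumption about the output format, and the function names and file names are hypothetical.

```python
import sys
from pathlib import Path


def metric_names(query_output: str) -> set[str]:
    """Collect metric names from saved --query-metrics text."""
    names = set()
    for line in query_output.splitlines():
        tokens = line.split()
        # Assumption: the metric name is the first token on the line and
        # contains "__" (e.g. gpu__dram_throughput); other lines are skipped.
        if tokens and "__" in tokens[0]:
            names.add(tokens[0])
    return names


def missing_metrics(reference_output: str, candidate_output: str) -> list[str]:
    """Return metrics listed for the reference card but not for the candidate."""
    return sorted(metric_names(reference_output) - metric_names(candidate_output))


# Hypothetical usage: python diff_metrics.py 3080ti.txt 3080lhr.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    reference = Path(sys.argv[1]).read_text()
    candidate = Path(sys.argv[2]).read_text()
    for name in missing_metrics(reference, candidate):
        print(name)
```

A set difference is used rather than a line-by-line text diff so that differences in ordering or description wording between the two listings do not show up as false positives.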

All three are required for even basic memory profiling in Nsight Compute. Every other metric is present on both cards. Is this a limitation of LHR cards? Why would these metrics be absent?

Details:

  • CUDA version: 11.5
  • Driver version: 497.29
  • OS: Windows 10