As part of a machine learning project, we are optimizing some custom CUDA kernels.
We are trying to profile them using Nsight Compute, but encounter the following error running on the LHR RTX 3080 when running a simple wrapper around the CUDA Kernel:
> ==ERROR== Failed to access the following 4 metrics: dram__cycles_active.avg.pct_of_peak_sustained_elapsed, dram__cycles_elapsed.avg.per_second, gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed, gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
==ERROR== Failed to profile kernel "kernel" in process 20204
Running a diff against the metrics available on an RTX 3080 TI (non-LHR) vs an RTX-3080 (LHR) via nv-nsight-cu-cli --devices 0 --query-metrics
, We notice the following metrics are missing in the RTX 3080 LHR version:
gpu__compute_memory_request_throughput
gpu__compute_memory_throughput
gpu__dram_throughput
All of these are required for even basic memory profiling using Nsight Compute. All other metrics are correctly present, except for these. Is this a limitation of LHR cards? Why would they not be present?
Details:
- Cuda Version: 11.5
- Driver version: 497.29.
- Windows 10