CUDA profiler documentation out of date

I’m not sure where to point this out, so I’m doing so here.

The documentation for the CUDA profiler is out of date, and as a result it is wrong in a number of places.

For example, the documentation states:
gld_throughput = ((128 * global_load_hit) + (l2_subp0_read_requests + l2_subp1_read_requests) * 32 - (l1_local_ld_miss * 128)) / gputime

But the event names l2_subp0_read_requests, l2_subp1_read_requests and l1_local_ld_miss are incorrect. Only after much frustration and poking around in nvprof did I work out that they should in fact be l2_subp0_read_sector_queries, l2_subp1_read_sector_queries and l1_local_load_miss.
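With the corrected event names, the formula above can be sketched in Python roughly as follows. This is only an illustration, not anything taken from the documentation: the function names and the sample counter values are made up, and the conversion to GB/s assumes that gputime is reported in microseconds.

```python
def gld_throughput(global_load_hit,
                   l2_subp0_read_sector_queries,
                   l2_subp1_read_sector_queries,
                   l1_local_load_miss,
                   gputime):
    """Global load throughput in bytes per unit of gputime.

    Mirrors the documented formula, using the corrected event names.
    """
    if gputime <= 0:
        raise ValueError("gputime must be positive")
    # The formula weights each global load hit by 128 bytes.
    l1_hit_bytes = 128 * global_load_hit
    # The formula weights each L2 read sector query by 32 bytes.
    l2_read_bytes = 32 * (l2_subp0_read_sector_queries
                          + l2_subp1_read_sector_queries)
    # The formula subtracts 128 bytes per local load miss.
    local_miss_bytes = 128 * l1_local_load_miss
    return (l1_hit_bytes + l2_read_bytes - local_miss_bytes) / gputime


def bytes_per_us_to_gb_per_s(bytes_per_us):
    """Convert bytes per microsecond to GB/s (1 GB = 1e9 bytes).

    ASSUMPTION: gputime is in microseconds; check this for your profiler.
    """
    return bytes_per_us * 1e6 / 1e9


# Hypothetical counter values, chosen only to exercise the formula.
raw = gld_throughput(global_load_hit=10,
                     l2_subp0_read_sector_queries=100,
                     l2_subp1_read_sector_queries=100,
                     l1_local_load_miss=5,
                     gputime=2.0)
print(raw, bytes_per_us_to_gb_per_s(raw))
```

The result is only as meaningful as the unit assumption, which is exactly why it would help if the documentation listed units next to each metric.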

It would also be useful if standard units were supplied along with a lot of the metrics (e.g. GB/s or GiB/s or “instructions per second”). This would make it much easier for newcomers to understand what various metrics mean.

I’m sure I’m not the only one who has experienced this annoyance. Is there a “proper” way of letting NVIDIA know about this issue?

Thanks for pointing out this error. In the upcoming 5.5 release, nvprof can collect and report metric values just as it does for events, so you no longer need to decipher the formulas to collect the metric values. For example:

$ nvprof --metrics gld_throughput ./diverge
==25666== NVPROF is profiling process 25666, command: ./diverge
Device Name: Tesla C2050
==25666== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
==25666== Profiling application: ./diverge
==25666== Profiling result:
==25666== Metric result:
Invocations  Metric Name     Metric Description      Min         Max         Avg
Device "Tesla C2050 (0)"
    Kernel: VecEmpty(void)
          4  gld_throughput  Global Load Throughput  0.00000B/s  0.00000B/s  0.00000B/s
    Kernel: VecThen(int*, int*, int*, int)
          4  gld_throughput  Global Load Throughput  9.5271GB/s  290.72GB/s  82.207GB/s

Because of this new support and due to the complexity of some of the metric calculations, the documentation no longer contains the metric formulas. If you are a registered developer you will be able to access the 5.5 release soon. As a registered developer you will also be able to file bug reports for issues like this. Let us know if you have issues using the nvprof metric support in 5.5.