I’m not sure where to point this out, so I’m doing so here.
The documentation for the CUDA profiler is out of date and, as a result, wrong in a number of places.
Here it states:
gld_throughput = ((128 * global_load_hit) + (l2_subp0_read_requests + l2_subp1_read_requests) * 32 - (l1_local_ld_miss * 128)) / gputime
But l2_subp0_read_requests, l2_subp1_read_requests, and l1_local_ld_miss are incorrect counter names. Only after much frustration and poking around nvprof did I work out that these should in fact be l2_subp0_read_sector_queries, l2_subp1_read_sector_queries, and l1_local_load_miss.
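For anyone else trying to reproduce the metric by hand, the documented formula with the corrected counter names can be sketched as below. This is just my reading of the formula; the function name, units (bytes per unit of gputime), and the sample counter values are my own assumptions, not real profiler output.

```python
def gld_throughput(global_load_hit,
                   l2_subp0_read_sector_queries,
                   l2_subp1_read_sector_queries,
                   l1_local_load_miss,
                   gputime):
    """Global load throughput per the documented formula, using the
    corrected counter names: 128 bytes per L1 global-load hit,
    32 bytes per L2 read sector query, minus 128 bytes per
    local-load miss that also passed through L1, over gputime."""
    bytes_loaded = (128 * global_load_hit
                    + 32 * (l2_subp0_read_sector_queries
                            + l2_subp1_read_sector_queries)
                    - 128 * l1_local_load_miss)
    return bytes_loaded / gputime

# Illustrative (made-up) counter values:
print(gld_throughput(10, 4, 4, 0, 2.0))
```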
It would also be useful if standard units were supplied alongside many of the metrics (e.g. GB/s, GiB/s, or "instructions per second"). This would make it much easier for newcomers to understand what the various metrics mean.
I’m sure I’m not the only one who has run into this, so I wanted to ask: is there a “proper” channel for reporting documentation issues like this to NVIDIA?