Missing gld_32/64/128b counter in GTX 480?

fji · August 27, 2010, 9:32pm

Hi,
I just got my GTX 480 card and tried profiling with the latest computeprof (v3.1), which used to be cudaprof. But I cannot find the global memory throughput (read/write/overall) on the summary table tab. This happens not only with my own app but also with the matrixMul sample that comes with the SDK. When I clicked ‘Session’->‘Global memory throughput’, I got this message:

Profiler counters ‘gld_32/64/128b’ required for global memory throughput calculations are not available.

Is this because the gld counters were dropped on the GTX 480? The global memory throughput figure is actually quite useful to me.

laughingrice · August 28, 2010, 10:55am

It’s because reads and writes now go through the L1/L2 caches, so what you see instead are cache hit/miss counters.

fji · August 29, 2010, 3:08pm

Hi Laughingrice, thanks!

Since global memory accesses are now cached, should global memory throughput be calculated as follows?

(L2 cache line size) x (number of L2 cache misses) / time
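
As a rough sketch of the formula above: the function and parameter names here are hypothetical, the miss count would have to come from whatever L2 miss counter the profiler exposes, and the number of bytes moved per miss is an assumption you need to check for your hardware. The numbers in the example call are made up.

```python
def dram_side_throughput_gbps(l2_misses, bytes_per_miss, elapsed_seconds):
    """Estimate bytes fetched from device memory per second, in GB/s.

    l2_misses       -- total L2 misses counted over the kernel (hypothetical input)
    bytes_per_miss  -- bytes transferred per miss (ASSUMPTION: hardware dependent)
    elapsed_seconds -- kernel run time in seconds
    Uses 1 GB = 1e9 bytes.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return l2_misses * bytes_per_miss / elapsed_seconds / 1e9


# Illustrative, made-up values only.
estimate = dram_side_throughput_gbps(
    l2_misses=1_000_000, bytes_per_miss=32, elapsed_seconds=0.001
)
print(f"{estimate:.1f} GB/s")
```

This only estimates traffic that actually reaches device memory; anything served from L1 or L2 is not counted.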

paulius · August 30, 2010, 9:01pm

It depends on what you want. From the program’s point of view, it doesn’t really matter whether the data is coming from L1, L2, or memory. So, in that case I’d simply count the L1 requests for the memory throughput. Of course, if you’re hitting a lot in one of the caches, the resulting throughput may exceed the theoretical memory bandwidth (but you’ll know why).

Paulius
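
A minimal sketch of the program-side view described above, counting every L1 request whether it hits or misses. The names are hypothetical, the bytes per request is an assumption (128 here, i.e. 32 threads each reading 4 bytes in one fully coalesced request), and the peak figure is whatever your card's specification states; 177.4 GB/s is the figure commonly quoted for the GTX 480. The request count and run time are made up.

```python
def requested_throughput_gbps(l1_requests, bytes_per_request, elapsed_seconds):
    """Throughput as seen by the program: all L1 requests, hits and misses alike.

    bytes_per_request is an ASSUMPTION; it depends on the access pattern.
    Uses 1 GB = 1e9 bytes.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return l1_requests * bytes_per_request / elapsed_seconds / 1e9


def fraction_of_peak(throughput_gbps, peak_gbps):
    """Ratio of measured throughput to the theoretical memory bandwidth."""
    return throughput_gbps / peak_gbps


# Illustrative, made-up counter value and run time.
seen = requested_throughput_gbps(
    l1_requests=2_000_000, bytes_per_request=128, elapsed_seconds=0.001
)
ratio = fraction_of_peak(seen, peak_gbps=177.4)
print(f"{seen:.1f} GB/s, {ratio:.2f}x theoretical peak")
```

A ratio above 1 is not a measurement error in this view; it means a large share of the requests were served from cache rather than from device memory.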
