Missing gld_32/64/128b counter in GTX 480?

Hi,
I just got my GTX 480 card and tried profiling with the latest computeprof (v3.1), formerly cudaprof. However, I cannot find the global memory throughput (read/write/overall) in the summary table tab. This happens not only with my own app but also with the matrixMul sample that comes with the SDK. When I click ‘Session’->‘Global memory throughput’, I get this message:

Profiler counters ‘gld_32/64/128b’ required for global memory throughput calculations are not available.

Is this because the gld counters were dropped on the GTX 480? The global memory throughput is actually quite useful to me.

It’s because reads and writes now go through the L1/L2 caches, so what you see now is cache hit/miss counts instead.

Hi Laughingrice, thanks!

Since global memory accesses are now cached, should global memory throughput be calculated as follows?

(L2 cache line size) x (number of L2 cache misses) / time
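
For reference, the formula above can be written out as a small calculation. This is only a sketch: the function name, the 128-byte size and the miss count are made-up illustration values, not numbers read from computeprof. Substitute whatever transfer size per miss and counter total apply to your card and profiler output.

```python
def miss_based_throughput_gb_s(bytes_per_miss, l2_misses, elapsed_s):
    """(bytes fetched per L2 miss) x (number of L2 misses) / time, in GB/s."""
    bytes_from_dram = bytes_per_miss * l2_misses
    return bytes_from_dram / elapsed_s / 1e9

# Hypothetical inputs, for illustration only:
#   128 bytes fetched per miss (an assumption, not a confirmed hardware value)
#   1,000,000 L2 misses during the kernel
#   1 ms kernel time
dram_gb_s = miss_based_throughput_gb_s(128, 1_000_000, 1e-3)
print(f"miss-based throughput: {dram_gb_s:.1f} GB/s")
```

Note that this counts only traffic caused by misses, so it estimates what actually came from memory rather than what the kernel requested.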

It depends on what you want. From the program’s point of view, it doesn’t really matter whether the data comes from L1, L2, or memory. So, in that case I’d simply count the L1 requests to compute the memory throughput. Of course, if you’re hitting a lot in one of the caches, the resulting throughput may exceed the theoretical memory bandwidth (but you’ll know why).

Paulius
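
A rough sketch of the request-based approach described above, for anyone who wants to try it. Everything here is an assumption made for illustration: the function name, the request count, the 128 bytes per request (for example, 32 threads each reading 4 bytes) and the 177.4 GB/s peak, which is the commonly quoted GTX 480 figure. Check your own card's specification and profiler output before relying on any of these.

```python
def request_based_throughput_gb_s(l1_requests, bytes_per_request, elapsed_s):
    """Bytes the program asked for, divided by kernel time, in GB/s."""
    return l1_requests * bytes_per_request / elapsed_s / 1e9

# Hypothetical inputs, for illustration only.
l1_requests = 2_000_000      # assumed request count from the profiler
bytes_per_request = 128      # assumed: 32 threads x 4 bytes each
elapsed_s = 1e-3             # assumed kernel time of 1 ms
peak_gb_s = 177.4            # assumed theoretical peak; check your card's spec

effective_gb_s = request_based_throughput_gb_s(
    l1_requests, bytes_per_request, elapsed_s)
ratio = effective_gb_s / peak_gb_s

print(f"effective throughput: {effective_gb_s:.1f} GB/s")
if ratio > 1.0:
    # More data was requested than DRAM could have delivered in this time,
    # so a good share of the requests must have been served from cache.
    print(f"{ratio:.2f}x theoretical peak: cache hits are contributing")
```

With these made-up numbers the result is above the peak, which is the situation where the cache hits explain the difference.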
