Hi,
I just got a GTX 480 and tried running under the latest computeprof (v3.1, formerly cudaprof). However, I cannot find the global memory throughput (read/write/overall) on the summary table tab. This happens not only with my own app but also with the matrixMul sample that ships with the SDK. When I click ‘Session’->‘Global memory throughput’, I get this message:
Profiler counters ‘gld_32/64/128b’ required for global memory throughput calculations are not available.
Is this because the gld counters were dropped on the GTX 480? The global memory throughput is actually quite useful to me.
It depends on what you want. From the program's point of view, it doesn’t really matter whether the data comes from L1, L2, or memory. So in that case I’d simply count the L1 requests to get the memory throughput. Of course, if many accesses hit in one of the caches, the resulting throughput may exceed the theoretical memory bandwidth (but you’ll know why).