Problem with "gld 64b" reported by the profiler: the number differs from what I expect

Dear all,

I’m using CUDA on a C1060 GPU.
I wrote a kernel that I expected to read (3^14 * 2) floats from global memory.

The accesses are almost perfectly coalesced,
so I expected the kernel to generate (3^14 * 2 * 4)/64 = 597871 64-byte transactions.
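
The arithmetic behind that estimate can be checked with a short sketch. It assumes 4-byte floats and that every load falls into a fully used 64-byte transaction; the variable names are only illustrative.

```python
# Expected number of 64-byte global-load transactions,
# assuming perfectly coalesced reads of 4-byte floats.
FLOAT_BYTES = 4          # sizeof(float)
TRANSACTION_BYTES = 64   # size of one gld_64 transaction

num_floats = 3**14 * 2                      # floats read by the kernel
total_bytes = num_floats * FLOAT_BYTES      # bytes read from global memory
expected_transactions = total_bytes // TRANSACTION_BYTES  # integer division

print(num_floats, total_bytes, expected_transactions)
```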

But what I found in the Visual Profiler is a gld_64 count of 59552,
which is only around 1/10 of my expected value.

Does anyone have a suggestion?
Does the gld_64 counter really reflect the absolute number of memory transactions, or is it just a relative value?

Thank you very much.

Correct me if I’m wrong, but I believe the profiler only tracks a single SM or TPC. I’m not sure…

N.

Hi, Nico,

I believe that you are right.

The C1060 does contain 10 TPCs, which would explain the factor of about 10.

Thank you for replying :)
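
As a rough sanity check of that explanation, the sketch below scales the profiler's count by the number of TPCs. It assumes the counter is collected on a single TPC and that the work is spread evenly across all 10 TPCs; neither assumption is confirmed in this thread.

```python
# Scale the per-TPC profiler count to a whole-GPU estimate.
# Assumption: gld_64 is sampled on one TPC and the load is evenly distributed.
observed_gld_64 = 59552                  # value reported by the profiler
num_tpcs = 10                            # TPCs on a C1060
expected = (3**14 * 2 * 4) // 64         # expected 64-byte transactions

estimated_total = observed_gld_64 * num_tpcs
ratio = estimated_total / expected       # close to 1.0 if the explanation holds

print(estimated_total, expected, round(ratio, 3))
```

Under these assumptions the scaled count comes out within about half a percent of the expected value.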