Unexpected Profiler output, zeros for all global read/write

I am performing some simple calculations using global memory. I call cudaMalloc, copy the data over with cudaMemcpy, and then pass the device pointers to my kernel calls. The kernels manipulate the data correctly, as I can confirm when I copy the results back to the CPU. However, when I look at the profiler, I see gld coalesced, gld uncoalesced, gst coalesced, and gst uncoalesced all equal to zero. I am using profiler 1.1 to start the session. Any ideas why the counters all show zero when I am pretty sure each kernel is doing many global reads?
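
For reference, here is a minimal sketch of the pattern described above. It is not the actual code: the kernel name `scale`, the array size, and the launch configuration are made up for illustration. Each thread does one global load and one global store, so the gld/gst counters would be expected to be non-zero for a kernel like this.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread reads one float from global memory,
// scales it, and writes it back to the same location.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;  // one global load and one global store per thread
}

int main()
{
    const int n = 1 << 20;                 // illustrative size only
    const size_t bytes = n * sizeof(float);

    // Host buffer filled with a known value.
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        h[i] = 1.0f;

    // Allocate device memory and copy the input over.
    float *d = NULL;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // Launch with 256 threads per block, enough blocks to cover n elements.
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);

    // Copy the result back; this also waits for the kernel to finish.
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);
    free(h);
    return 0;
}
```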

Thanks,

I’m getting the same problem. Some calls report 0 for all four of these counters, some others don’t. I can’t trust the ones that aren’t zero because of this.

I’m going to try 2.1 beta, to see if it fixes this.

EDIT: 2.1b exhibits the same issue.

Are either/both of you using a GTX 260 or 280? The newer hardware handles memory coalescing in a very different way (see the guide) which causes these counters to return zero all the time. It’s kind of annoying, but a known issue. All the other profiler counters are still correct.
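
If you are not sure which generation your card belongs to, you can query its compute capability. A rough sketch, assuming the card of interest is device 0: the GTX 260 and 280 report 1.3, while the 8800 GTX reports 1.0. The 1.2 cutoff in the check below is my assumption about where the newer coalescing behaviour starts, so treat the printed hint as a guess rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the properties of device 0 (assumed to be the card being profiled).
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("could not query device 0\n");
        return 1;
    }

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // Assumption: devices at 1.2 or above use the newer coalescing scheme,
    // which is where the four gld/gst counters are said to read zero.
    bool newer = prop.major > 1 || (prop.major == 1 && prop.minor >= 2);
    if (newer)
        printf("newer coalescing hardware: zero gld/gst counters are likely the known issue\n");
    else
        printf("older coalescing hardware: zero gld/gst counters are not explained by this\n");

    return 0;
}
```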

I’m using the 8800 GTX, so… nope.