Profiler coalescing counters on a GTX 260

I have yet to see the Visual Profiler report a non-zero value for gld uncoalesced or gst uncoalesced, even though I wrote all my kernels without any thought for memory coalescing.

From my reading of the Programming Guide, the section "Coalescing on Devices with Compute Capability 1.2 and Higher" suggests I should be seeing plenty of them.

Does this counter not work with the new coalescing rules, or is the compiler doing something clever?

In the pre-release CUDA for G200, the counters always read 0 because they weren't updated to understand the new coalescing rules (under which the number of memory transactions needs to be counted). I'm not sure whether this was fixed in CUDA 2.0b2.
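To see why a single "uncoalesced" flag stops making sense on G200, here is a rough host-side C++ model of the per-half-warp logic the Programming Guide describes for compute capability 1.2 and higher: group threads by 128-byte segment, issue one transaction per segment, then shrink each transaction to 64 or 32 bytes when only half of it is used. It is only a sketch under stated assumptions (32-bit words only, addresses listed for active threads only), not how the hardware or the profiler actually works, and `transactionSizesCC12` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model (assumption: 4-byte words only) of how a compute
// capability 1.2+ device splits one half-warp's loads into transactions.
// addrs[k] is the byte address read by the k-th active thread.
// Returns the size in bytes of each transaction issued.
std::vector<unsigned> transactionSizesCC12(const std::vector<std::uint64_t>& addrs) {
    std::vector<unsigned> sizes;
    std::vector<bool> served(addrs.size(), false);
    for (std::size_t i = 0; i < addrs.size(); ++i) {
        if (served[i]) continue;
        // 128-byte segment holding the lowest-numbered thread not yet served.
        std::uint64_t base = addrs[i] / 128 * 128;
        std::uint64_t lo = addrs[i], hi = addrs[i] + 4;  // bytes used: [lo, hi)
        for (std::size_t j = i; j < addrs.size(); ++j) {
            if (!served[j] && addrs[j] / 128 * 128 == base) {
                served[j] = true;  // this thread shares the same transaction
                if (addrs[j] < lo) lo = addrs[j];
                if (addrs[j] + 4 > hi) hi = addrs[j] + 4;
            }
        }
        // Halve the transaction (128 -> 64 -> 32 bytes) while only the lower
        // or only the upper half of the current segment is used.
        unsigned size = 128;
        while (size > 32) {
            unsigned half = size / 2;
            if (hi <= base + half) {
                size = half;          // lower half only
            } else if (lo >= base + half) {
                base += half;         // upper half only
                size = half;
            } else {
                break;                // both halves used: keep this size
            }
        }
        sizes.push_back(size);
    }
    return sizes;
}
```

Under this model a fully contiguous, aligned read gives one 64-byte transaction, shifting it by one word gives one 128-byte transaction, and a stride of 32 words gives sixteen 32-byte transactions. None of these is simply "coalesced" or "uncoalesced", which is presumably why the old counters don't map onto G200.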

Found this in the doc directory of the install:

"Known Issues

Due to improved memory coalescing hardware, the gld_incoherent and gst_incoherent signals will always be zero on GTX 280 and GTX 260."

I think I'll go back through my code with coalescing in mind!

Thanks!

Just out of interest, does this mean that there are no uncoalesced accesses thanks to the improved hardware (i.e., the hardware somehow coalesces everything for you), or just that the profiler can't report them because the hardware is different?

I’m fairly sure it’s the latter.

My understanding of the programming guide is that memory coalescing isn't as vital as it is on older hardware (you can get partial coalescing), but it still helps by reducing the number of memory transactions.

Section 5.1.2.1:

"If a half-warp addresses words in n different segments, n memory transactions are issued (one for each segment), whereas devices with lower compute capabilities would issue 16 transactions as soon as n is greater than 1."
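To make the quoted rule concrete, here is a small host-side C++ sketch that counts transactions for one half-warp under each rule. Assumptions: all 16 threads are active and each reads one 32-bit word; for compute capability 1.0/1.1 it applies the stricter condition from the same guide section (thread k must read word k of a single 64-byte-aligned segment, otherwise 16 transactions); for 1.2+ it counts the distinct 128-byte segments (the "n" above) and ignores the transaction-size reduction step. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumption: addrs holds exactly 16 byte addresses, one per thread of the
// half-warp, each thread reading one 32-bit word.

// Compute capability 1.0/1.1: one transaction only if thread k reads word k
// of a single 64-byte-aligned segment; otherwise one per thread (16).
unsigned transactionsCC10(const std::vector<std::uint64_t>& addrs) {
    bool coalesced = addrs[0] % 64 == 0;
    for (std::size_t k = 0; k < addrs.size() && coalesced; ++k)
        coalesced = addrs[k] == addrs[0] + 4 * k;
    return coalesced ? 1u : static_cast<unsigned>(addrs.size());
}

// Compute capability 1.2+: one transaction per distinct 128-byte segment
// touched (the "n" in the guide's wording).
unsigned transactionsCC12(const std::vector<std::uint64_t>& addrs) {
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : addrs) segments.insert(a / 128);
    return static_cast<unsigned>(segments.size());
}

// Hypothetical helper: thread k reads the word at base + 4 * stride * k,
// where stride is measured in 32-bit words.
std::vector<std::uint64_t> strided(std::uint64_t base, std::uint64_t stride) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t k = 0; k < 16; ++k) addrs.push_back(base + 4 * stride * k);
    return addrs;
}
```

For example, `strided(4, 1)` (a contiguous read misaligned by one word) costs 16 transactions on the older rule but only 1 on the newer one, while `strided(0, 32)` costs 16 under both, so partial coalescing helps a lot for small misalignments and not at all for large strides.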