Visual profiler settings

Hi, in my CUDA Visual Profiler I can't see uncoalesced memory accesses, because there is no way to enable that counter in the session settings. Does anyone have an idea? Where can I find a good tutorial or user's guide? Thanks a lot.

If you are using a compute capability 1.2 or 1.3 device, that is normal. On those cards you get individual 32-, 64- and 128-byte load and store counters, which provide more information than the old coalesced/uncoalesced counters ever did. The GT200 family has a number of additional global memory read and write modes that blur the line between purely coalesced and purely serialized memory access. I think that is why the coalesced/uncoalesced counters have been deprecated on the most recent generation of hardware.
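To make the 32/64/128-byte counters more concrete, here is a rough Python model of the compute 1.2/1.3 coalescing rules as the CUDA Programming Guide describes them. For each half-warp, it takes the segment holding the address of the lowest-numbered pending thread (32 bytes for 1-byte words, 64 bytes for 2-byte words, 128 bytes otherwise). It then services every pending thread in that segment and halves the transaction while only its lower or upper half is used. The function name coalesce_half_warp is hypothetical. The model assumes all 16 threads are active and every address is aligned to the word size, and it is only a sketch of the rules, not of how the profiler itself counts.

```python
from collections import Counter


def coalesce_half_warp(addrs, word_size):
    """Count 32-, 64- and 128-byte transactions for one half-warp.

    Sketch of the compute 1.2/1.3 protocol. Assumptions: all 16 threads
    are active and each address in addrs is aligned to word_size
    (1, 2, 4, 8 or 16 bytes).
    """
    # Segment size depends on the word size accessed by the threads.
    seg_size = {1: 32, 2: 64}.get(word_size, 128)
    pending = list(range(len(addrs)))  # thread indices not yet serviced
    counts = Counter()
    while pending:
        # Segment containing the address of the lowest-numbered pending thread.
        base = addrs[pending[0]] // seg_size * seg_size
        served = [t for t in pending if addrs[t] // seg_size * seg_size == base]
        lo = min(addrs[t] for t in served)
        hi = max(addrs[t] for t in served) + word_size  # exclusive end of used bytes
        size = seg_size
        # Halve the transaction while only its lower or upper half is used.
        while size > 32:
            half = size // 2
            if hi <= base + half:      # only the lower half is touched
                size = half
            elif lo >= base + half:    # only the upper half is touched
                base, size = base + half, half
            else:
                break
        counts[size] += 1
        pending = [t for t in pending if t not in served]
    return counts


if __name__ == "__main__":
    cases = {
        "16 aligned floats": ([4 * t for t in range(16)], 4),
        "16 aligned float2s": ([8 * t for t in range(16)], 8),
        "floats, 128-byte stride": ([128 * t for t in range(16)], 4),
        "floats starting at byte 96": ([96 + 4 * t for t in range(16)], 4),
    }
    for name, (addrs, word_size) in cases.items():
        print(name, dict(coalesce_half_warp(addrs, word_size)))
```

In this model, 16 aligned consecutive floats give one 64-byte transaction and 16 aligned float2s give one 128-byte transaction. A 128-byte stride between threads gives sixteen 32-byte transactions. Sixteen floats starting 96 bytes into a segment give two 32-byte transactions. On compute 1.0/1.1 hardware that misaligned case would simply have been serialized, which is why a single coalesced/uncoalesced flag says less than the per-size counts.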

So I can't see uncoalesced accesses? Sorry, but I don't understand what information I can get from the individual access counters…

P.S. My card is a GTX 280.

Thank you in advance.

For a GTX 280, no, you can't have the old uncoalesced counter. For what the memory access counters mean, have a look at the section entitled "Coalescing on Devices with Compute Capability 1.2 and Higher" in Chapter 5 of the CUDA Programming Guide.


Thank you very much. I have just one more question: in the profiler, does gst_coalesced refer to 16-byte data?

I think it applies to any half-warp memory request serviced in a single 16-word memory transaction. So it could be 32, 64 or 128 bytes, depending on what the kernel was doing. If the kernel loaded or stored more than one word size (for example, each thread wrote a float and a float2), then it would be the sum of the coalesced float and float2 stores.
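Here is a minimal Python sketch of that reading. It assumes every store is fully coalesced, meaning 16 consecutive, aligned words per half-warp. A half-warp request is then 16 × word size bytes, so 2-, 4- and 8-byte words give 32-, 64- and 128-byte requests. A kernel in which each thread stores one float and one float2 would be tallied once per half-warp for each of the two stores. The helper names and the 256-thread launch are hypothetical. This models the interpretation above, not documented profiler behaviour.

```python
from collections import Counter

HALF_WARP = 16  # threads per half-warp on compute 1.x devices


def coalesced_request_bytes(word_size):
    """Bytes requested when 16 threads access 16 consecutive, aligned words."""
    return HALF_WARP * word_size


def coalesced_store_tally(num_threads, word_sizes):
    """Tally coalesced stores per request size, assuming every store coalesces.

    word_sizes lists the word size of each store a thread performs,
    e.g. [4, 8] for one float store and one float2 store.
    Returns (per-size Counter, total); the total plays the role of a
    gst_coalesced-style sum over all store sizes.
    """
    half_warps = num_threads // HALF_WARP
    per_size = Counter()
    for w in word_sizes:
        per_size[coalesced_request_bytes(w)] += half_warps
    return per_size, sum(per_size.values())


if __name__ == "__main__":
    for w in (2, 4, 8):
        print(w, "byte words ->", coalesced_request_bytes(w), "byte request")
    print(coalesced_store_tally(256, [4, 8]))  # one float + one float2 per thread
```

With 256 threads each writing a float and a float2, the sketch reports 16 stores of 64 bytes plus 16 stores of 128 bytes, for a total of 32.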

If I have the equivalence gst_coalesced = gst_32*2 + gst_64*4 + gst_128*8, can I conclude that I have no uncoalesced accesses?

Sorry, I don't understand what you are asking, but I suspect that only someone who has seen and analysed the code in question can answer it.