Coalescing of cached constant memory

Hi,

Does coalescing apply to constant memory? To my understanding it does. My main question is whether it also applies to cached memory.

Thanks,
Evangelia

No. If different threads in a warp read different locations in constant memory, the accesses get serialized.
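If you want to measure this on your own card, here is a minimal timing sketch. The kernel names, array size, and launch configuration are all made up for illustration, and the numbers will depend on the GPU and toolkit, so treat it as a starting point rather than a definitive benchmark.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 1024        // constant array length (4 KB, well under the 64 KB limit)
#define THREADS 256   // arbitrary launch configuration for this sketch
#define BLOCKS 64

#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t e = (call);                                        \
        if (e != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error: %s (line %d)\n",              \
                    cudaGetErrorString(e), __LINE__);                  \
            exit(1);                                                   \
        }                                                              \
    } while (0)

__constant__ float c_data[N];

// All threads of a warp read the same constant address in each iteration.
__global__ void uniformRead(float *out, int iters) {
    float sum = 0.0f;
    for (int i = 0; i < iters; ++i)
        sum += c_data[i % N];
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Neighbouring threads read neighbouring constant addresses
// (the pattern that would coalesce for global memory).
__global__ void perThreadRead(float *out, int iters) {
    float sum = 0.0f;
    for (int i = 0; i < iters; ++i)
        sum += c_data[(threadIdx.x + i) % N];
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

typedef void (*Kernel)(float *, int);

// Runs the kernel once as a warm-up, then times a second launch with events.
static float timeKernel(Kernel k, float *d_out, int iters) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    k<<<BLOCKS, THREADS>>>(d_out, iters);
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaEventRecord(start));
    k<<<BLOCKS, THREADS>>>(d_out, iters);
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main() {
    float h_data[N];
    for (int i = 0; i < N; ++i) h_data[i] = 1.0f;
    CHECK(cudaMemcpyToSymbol(c_data, h_data, sizeof(h_data)));

    float *d_out = NULL;
    CHECK(cudaMalloc(&d_out, BLOCKS * THREADS * sizeof(float)));

    const int iters = 10000;
    printf("same address per warp:      %.3f ms\n",
           timeKernel(uniformRead, d_out, iters));
    printf("different address per lane: %.3f ms\n",
           timeKernel(perThreadRead, d_out, iters));

    CHECK(cudaFree(d_out));
    return 0;
}
```

If the second kernel is clearly slower than the first, that is consistent with serialized constant accesses. If the two come out about the same, the loads are probably not going through the classic constant path.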

I see. I observed much better performance when accesses to constant memory were coalesced, and I thought this was justified since constant memory is part of global memory (just cached).

Thanks,

Eva

You might actually be right on compute capability 2.x. While the Programming Guide claims that the constant cache is still there and separate from the L1 cache, I’ve recently disassembled a few kernels and found that the compiler just places variables declared as constant in global memory and uses normal instructions for loading.

Yes, that is what confuses me. I would like to know whether constant memory uses broadcast for same-address accesses or behaves like coalesced global memory. Any references in addition to the Nvidia Programming Guide would be appreciated. I am on compute capability 2.0.

Thanks,

Evangelia

At the PTX level, the compiler still generates fetch-through-cache instructions for constant memory loads on compute 2.x targets, so clearly the compiler still thinks constant memory works like it always did. The big question is whether the assembler for 2.x targets actually treats fetch-through-cache instructions any differently from regular loads. I suspect that disassembling the assembler output will be the only way to tell for sure. The documentation is silent on the issue AFAIK.
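For anyone who wants to check this themselves, something like the following small probe should be enough. The file name `const_probe.cu`, the kernel, and the array are hypothetical, and the `-arch` value has to match whatever your toolkit and card support; the commands in the comments are the standard `nvcc` and `cuobjdump` options for dumping PTX and machine code.

```cuda
// Hypothetical file name: const_probe.cu
//
// Dump the PTX and look for the constant loads (ld.const):
//   nvcc -arch=sm_20 -ptx const_probe.cu -o const_probe.ptx
//
// Dump the machine code the assembler actually produced:
//   nvcc -arch=sm_20 -cubin const_probe.cu -o const_probe.cubin
//   cuobjdump -sass const_probe.cubin
//
// In the disassembly, compare how c_table is read against how idx is read.
// The exact mnemonics differ between architectures, so this sketch does not
// assume any particular one.
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float c_table[256];

__global__ void probe(float *out, const int *idx) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float a = c_table[3];                 // same address for every thread
    float b = c_table[idx[tid] & 255];    // data-dependent, per-thread address
    out[tid] = a + b;
}

int main() {
    const int n = 256;
    float *d_out = NULL;
    int *d_idx = NULL;
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemset(d_idx, 0, n * sizeof(int));   // all indices zero, just to be valid
    probe<<<1, n>>>(d_out, d_idx);
    cudaError_t err = cudaDeviceSynchronize();
    printf("probe finished: %s\n", cudaGetErrorString(err));
    cudaFree(d_out);
    cudaFree(d_idx);
    return 0;
}
```

Having both a fixed index and a data-dependent index in the same kernel makes it easier to see whether the two cases are compiled to different kinds of loads.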