Coalescing of cached constant memory

EvangeliaS · April 25, 2011, 1:01am

Hi,

Does coalescing apply to constant memory? To my understanding it does. My main question is whether it also applies to cached memory.

Thanks,
Evangelia

tera · April 25, 2011, 1:37am

No. If different threads read different locations in constant memory, accesses get serialized.

EvangeliaS · April 25, 2011, 1:47am

I see. I observed much better performance when accesses to constant memory were coalesced, and I thought this made sense because constant memory is part of global memory (just cached).

Thanks,

Eva

tera · April 25, 2011, 1:57am

You might actually be right on compute capability 2.x. While the Programming Guide claims that the constant cache is still there and separate from the L1 cache, I’ve recently disassembled a few kernels and found that the compiler just places variables declared as constant in global memory and uses normal instructions for loading.

EvangeliaS · April 25, 2011, 3:50am

Yes, that is what confuses me. I would like to know whether constant memory uses broadcast for same-address accesses or behaves like global memory with coalescing. Any references beyond the NVIDIA Programming Guide would be appreciated. I am on compute capability 2.0.

Thanks,

Evangelia
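One way to probe this question empirically is to time the same loop under three access patterns: a uniform read from constant memory (every thread reads the same address), a per-thread read from constant memory, and a per-thread read from global memory. The following is a minimal sketch rather than a definitive benchmark. The names `c_table`, `time_pattern`, `TABLE_SIZE` and the launch dimensions are hypothetical choices made for illustration, and the timings will depend on the GPU, driver and compiler version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TABLE_SIZE 1024  // hypothetical: 4 KiB of floats, well under the 64 KiB constant-memory limit

__constant__ float c_table[TABLE_SIZE];

enum Pattern { CONST_UNIFORM, CONST_PER_THREAD, GLOBAL_PER_THREAD };

// All threads of a warp read the same constant address in each iteration.
__global__ void const_uniform(float *out, int iters)
{
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i)
        acc += c_table[i % TABLE_SIZE];
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Neighbouring threads read neighbouring constant addresses.
__global__ void const_per_thread(float *out, int iters)
{
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i)
        acc += c_table[(threadIdx.x + i) % TABLE_SIZE];
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Same index pattern as const_per_thread, but the table lives in global memory.
__global__ void global_per_thread(const float *g_table, float *out, int iters)
{
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i)
        acc += g_table[(threadIdx.x + i) % TABLE_SIZE];
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Returns the elapsed time of one launch in milliseconds, or -1.0f on error.
float time_pattern(Pattern p)
{
    const int threads = 256, blocks = 64, iters = 10000;  // arbitrary sizes
    static float h_table[TABLE_SIZE];
    for (int i = 0; i < TABLE_SIZE; ++i) h_table[i] = 1.0f;

    float *d_table = NULL, *d_out = NULL;
    if (cudaMemcpyToSymbol(c_table, h_table, sizeof(h_table)) != cudaSuccess ||
        cudaMalloc((void **)&d_table, sizeof(h_table)) != cudaSuccess ||
        cudaMalloc((void **)&d_out, threads * blocks * sizeof(float)) != cudaSuccess ||
        cudaMemcpy(d_table, h_table, sizeof(h_table), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_table);
        cudaFree(d_out);
        return -1.0f;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float ms = -1.0f;
    for (int run = 0; run < 2; ++run) {  // run 0 is a warm-up, run 1 is the measurement
        cudaEventRecord(start);
        switch (p) {
        case CONST_UNIFORM:     const_uniform<<<blocks, threads>>>(d_out, iters); break;
        case CONST_PER_THREAD:  const_per_thread<<<blocks, threads>>>(d_out, iters); break;
        case GLOBAL_PER_THREAD: global_per_thread<<<blocks, threads>>>(d_table, d_out, iters); break;
        }
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&ms, start, stop);
    }
    if (cudaGetLastError() != cudaSuccess) ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_table);
    cudaFree(d_out);
    return ms;
}

int main()
{
    printf("constant, uniform:    %.3f ms\n", time_pattern(CONST_UNIFORM));
    printf("constant, per-thread: %.3f ms\n", time_pattern(CONST_PER_THREAD));
    printf("global,   per-thread: %.3f ms\n", time_pattern(GLOBAL_PER_THREAD));
    return 0;
}
```

If the per-thread constant case runs much slower than the uniform one, that is consistent with serialization of distinct addresses. If it tracks the global-memory case instead, that would point toward the loads being handled like ordinary global loads. Either reading is only suggestive, since the compiler is free to transform the loops.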

avidday · April 25, 2011, 4:15am

At the PTX level, the compiler is still generating fetch-through-cache instructions for constant memory loads on compute 2.x targets, so clearly the compiler still thinks constant memory works like it always did. The big question is whether the assembler for 2.x targets actually treats fetch-through-cache instructions any differently to regular loads or not. I suspect that disassembly of the assembler output will be the only way to tell for sure. The documentation is silent on the issue AFAIK.
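The inspection suggested above can be tried on a very small test file. The sketch below is an assumption about how one might set this up, not something taken from the thread: the file name `inspect_const.cu`, the array `c_coeff` and both kernel names are hypothetical. The commands in the header comment use `nvcc` to emit PTX and a cubin, and `cuobjdump -sass` to dump the machine code the assembler produced.

```cuda
// inspect_const.cu (hypothetical file name)
//
// Emit PTX and look for loads from the constant state space (ld.const):
//   nvcc -arch=sm_20 -ptx inspect_const.cu -o inspect_const.ptx
//
// Build a cubin and dump the machine code (SASS) generated by the assembler:
//   nvcc -arch=sm_20 -cubin inspect_const.cu -o inspect_const.cubin
//   cuobjdump -sass inspect_const.cubin
//
// sm_20 matches the compute capability 2.0 mentioned in the thread. Recent
// toolkits no longer accept it, so substitute an architecture yours supports.

#include <cuda_runtime.h>

__constant__ float c_coeff[64];

// idx is a kernel argument, so every thread reads the same constant address.
__global__ void load_uniform(float *out, int idx)
{
    out[threadIdx.x] = c_coeff[idx & 63];
}

// The index depends on threadIdx.x, so threads in a warp read different addresses.
__global__ void load_per_thread(float *out)
{
    out[threadIdx.x] = c_coeff[threadIdx.x & 63];
}
```

Comparing the instructions emitted for the two kernels, and comparing both against a kernel that reads an ordinary global array, would show whether the target treats constant loads differently from regular loads.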
