Newbie question: shared memory


I have a simple question:

If all the threads within a block need to read the same position (k) of an array (a), should I use a shared variable (w) like this:

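__shared__ float w;
if (threadIdx.x == 0) w = a[k];  // one thread per block does the load
__syncthreads();                 // wait until w is visible to the whole block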

or, since I’m not writing into the array, can I do this without causing a bottleneck:

__shared__ float w;
w = a[k];  // every thread in the block does the same load and store

I would go with the first one, but I need to know for sure :)


I would use constant memory if the value does not change while the kernel runs. You copy it to the device with cudaMemcpyToSymbol, and after that reads are served from the on-chip constant cache. That cache works best when many threads read the same element at the same time, which is exactly your case, and it is easy to use.
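
Here is a minimal sketch of what I mean, not a drop-in solution. The names w_const, scale and scale_by_constant are made up for illustration, and I am assuming the value a[k] is available on the host before the launch. If a only exists in device memory, you would pass cudaMemcpyDeviceToDevice as the last argument of cudaMemcpyToSymbol instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical name: the single read-only value every thread needs.
__constant__ float w_const;

// Each thread multiplies one element by the value held in constant memory.
__global__ void scale(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * w_const;
}

// Hypothetical host helper: uploads w, runs the kernel, copies results back.
void scale_by_constant(const float* h_in, float* h_out, int n, float w)
{
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // Host-to-device copy into the __constant__ symbol (default copy kind).
    cudaMemcpyToSymbol(w_const, &w, sizeof(float));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_in, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}

int main()
{
    const int n = 256;
    const int k = 3;  // position every thread wants to read
    float a[n], out[n];
    for (int i = 0; i < n; ++i) a[i] = float(i);

    scale_by_constant(a, out, n, a[k]);
    printf("out[10] = %f\n", out[10]);
    return 0;
}
```

For real code you would also check the return values of the CUDA calls. I left that out to keep the example short.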