Out-Of-Bounds Reads From Shared Memory


Is there any guarantee that kernel execution will not be terminated if some of the threads in a block read past the end of that block's shared memory?
I wouldn't like to add instructions to prevent that, since the result from those "bad" threads is going to be discarded anyway.


But isn’t that against the Ten Commandments of C programming!

“5 Thou shalt check the array bounds of all strings (indeed, all arrays)”

Also, what if in the future you add or rearrange the order of the shared declarations? If you haven’t bounds-checked, you may end up reading/writing data belonging to a different array.
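To illustrate the point: consecutive shared declarations typically sit back to back in one allocation, so an out-of-bounds read from one array silently returns another array's data. A minimal host-side C sketch (the `smem`/`a`/`b` layout is illustrative, not CUDA code):

    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        /* One backing allocation holding two logical arrays, the way
           consecutive __shared__ declarations occupy adjacent regions
           of a block's shared memory (illustrative layout). */
        int smem[8] = {1, 2, 3, 4, 10, 20, 30, 40};
        int *a = smem;       /* "first" shared array, 4 elements  */
        int *b = smem + 4;   /* "second" shared array, 4 elements */

        /* An out-of-bounds read from a lands in b: */
        printf("a[4] = %d, b[0] = %d\n", a[4], b[0]);
        assert(a[4] == b[0]);   /* both read the same slot, value 10 */
        return 0;
    }

Rearranging the declarations changes which array the stray read hits, which is exactly why silent out-of-bounds access is fragile.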

Yeah, and check all input pointers against NULL even if I’m sure they will never be NULL…

I never write out of bounds, just read and discard the result.

A typical use case for me now is a reduce operation of the form:

for (int i = 1; i < dim; i *= 2)
{
  smem[threadIdx.x] = val;
  __syncthreads();
  val += smem[threadIdx.x + i]; // <-- can read out of bounds.
  __syncthreads();
}
if (threadIdx.x == 0) { *pOut = val; }

Here I could add a modulo by the block dimension, or allocate twice as much shared memory so that the reads stay in bounds. But why should I do that if the hardware guarantees the kernel will continue executing?
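The "allocate twice as much shared memory" fix can be sketched as follows. This is a host-side C simulation of a single block's tree reduction (`DIM`, `reduce_padded`, and the serial store/add loops are illustrative stand-ins for the CUDA kernel, not real device code): the array is padded to `2 * DIM` and the upper half zeroed, so every read of `smem[tid + i]` stays in bounds and the zeros contribute nothing to the sum.

    #include <stdio.h>
    #include <string.h>

    #define DIM 8  /* hypothetical block size, power of two */

    /* Simulate one block's reduction with the shared array padded to
       2*DIM and the padding zeroed, so smem[t + i] never reads past
       the allocation even when t + i >= DIM. */
    int reduce_padded(const int *in)
    {
        int smem[2 * DIM];
        int val[DIM];
        memset(smem, 0, sizeof smem);              /* zero the padding */
        for (int t = 0; t < DIM; ++t) val[t] = in[t];

        for (int i = 1; i < DIM; i *= 2) {
            /* "store" phase: every thread writes its partial sum */
            for (int t = 0; t < DIM; ++t) smem[t] = val[t];
            /* "add" phase: smem[t + i] is in bounds thanks to padding */
            for (int t = 0; t < DIM; ++t) val[t] += smem[t + i];
        }
        return val[0];  /* thread 0's result */
    }

    int main(void)
    {
        int in[DIM] = {1, 2, 3, 4, 5, 6, 7, 8};
        printf("%d\n", reduce_padded(in));  /* sum = 36 */
        return 0;
    }

The cost is doubling the shared-memory footprint per block, which can reduce occupancy; the modulo alternative keeps the footprint but adds an instruction per iteration.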

On Fermi your kernel goes bust.

Oh dear…

Do the same rules apply to other memory types, for example constant memory? The DX11 spec says out-of-bounds reads from constant data always return 0; is that the case with CUDA?