Weird thing in Shared Memory: Looking for an explanation


I am rather new to CUDA … so kindly excuse me if I am asking a rudimentary question.

I understand that any variable declared with the __shared__ qualifier resides in shared memory (SMEM) and is accessible by all threads within a block.
Consider a very simple kernel that copies the contents of an array b into an array c using SMEM, as follows:

(Here let's say we have 10 threads in a block and only one block.)

__global__ void(int *b, int *c)
__shared__ int i[10];
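
For reference, here is a minimal sketch of a complete program matching this description. The listing above shows only the signature and the shared declaration, so the kernel body, the name copy_via_smem, the macro N, and the test data are all assumptions made for illustration, not the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // one block of 10 threads, as described in the post

// Hypothetical reconstruction: each thread stages its own element in SMEM.
__global__ void copy_via_smem(const int *b, int *c)
{
    __shared__ int i[N];   // one slot per thread, so no two threads share a slot
    int t = threadIdx.x;
    i[t] = b[t];           // global -> shared
    c[t] = i[t];           // shared -> global; each thread reads only its own slot
}

int main()
{
    int h_b[N], h_c[N];
    for (int k = 0; k < N; ++k) h_b[k] = k * k;   // arbitrary test data

    int *d_b, *d_c;
    cudaMalloc(&d_b, N * sizeof(int));
    cudaMalloc(&d_c, N * sizeof(int));
    cudaMemcpy(d_b, h_b, N * sizeof(int), cudaMemcpyHostToDevice);

    copy_via_smem<<<1, N>>>(d_b, d_c);            // 1 block, N threads
    cudaMemcpy(h_c, d_c, N * sizeof(int), cudaMemcpyDeviceToHost);

    for (int k = 0; k < N; ++k) printf("%d ", h_c[k]);
    printf("\n");

    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

Because every thread touches only its own element of i, this version needs no synchronization between the write and the read.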


Now if I copy c back to the host and print it, it contains the contents of b, which is exactly what I expect.


My program looks something like this:

__global__ void(int *b, int *c)
__shared__ int i;

Here, instead of an array, I am declaring a single variable i in SMEM. The thing is, I am getting the same result as above (i.e. the contents of b perfectly copied into c), which I don't think should be happening: i resides in SMEM, all threads can access it, and all threads write into it simultaneously! I would expect some garbage values in c.

Is there a gap in my understanding of SMEM? Kindly help…

Thanks in advance!

Probably just compiler optimization. In the second example, the variable i can be removed and the code simplified, because i isn't used anywhere else in the code.
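
One way to check this explanation is to compare the single-variable kernel with a variant that puts a barrier between the write and the read. The sketch below assumes a kernel body (the post does not show one); the names copy_racy, copy_with_barrier, and N, and the test data, are hypothetical. With __syncthreads() in place, the compiler can no longer simply forward each thread's own value, so the read has to observe whatever actually ended up in shared memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // one block of 10 threads, as in the question

// Assumed body for the single-variable kernel. Nothing orders the write
// and the read across threads, so the compiler may forward b[t] straight
// to c[t] without ever reloading i from shared memory.
__global__ void copy_racy(const int *b, int *c)
{
    __shared__ int i;
    int t = threadIdx.x;
    i = b[t];
    c[t] = i;
}

// Same kernel with a barrier between the write and the read.
__global__ void copy_with_barrier(const int *b, int *c)
{
    __shared__ int i;
    int t = threadIdx.x;
    i = b[t];          // all threads write the same location; one write survives
    __syncthreads();   // every thread's write is complete past this point
    c[t] = i;          // every thread reads the surviving value
}

static void print_row(const char *label, const int *v)
{
    printf("%s", label);
    for (int k = 0; k < N; ++k) printf(" %d", v[k]);
    printf("\n");
}

int main()
{
    int h_b[N], h_racy[N], h_sync[N];
    for (int k = 0; k < N; ++k) h_b[k] = 100 + k;   // arbitrary test data

    int *d_b, *d_c;
    cudaMalloc(&d_b, N * sizeof(int));
    cudaMalloc(&d_c, N * sizeof(int));
    cudaMemcpy(d_b, h_b, N * sizeof(int), cudaMemcpyHostToDevice);

    copy_racy<<<1, N>>>(d_b, d_c);
    cudaMemcpy(h_racy, d_c, N * sizeof(int), cudaMemcpyDeviceToHost);

    copy_with_barrier<<<1, N>>>(d_b, d_c);
    cudaMemcpy(h_sync, d_c, N * sizeof(int), cudaMemcpyDeviceToHost);

    print_row("b:           ", h_b);
    print_row("no barrier:  ", h_racy);
    print_row("with barrier:", h_sync);

    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

On an optimized build the "no barrier" row may well match b, as observed in the question, although that outcome is not guaranteed since the kernel contains a data race. The "with barrier" row should show a single element of b repeated in every position; which element wins the write is not defined.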