Hi,
I am rather new to CUDA … so kindly excuse me if I am asking a rudimentary question.
I understand that any variable declared with the qualifier __shared__ resides in the SMEM and is accessible by all threads within a block.
Consider a very simple kernel that transfers the contents of an array b into an array c using SMEM, as follows.
(Here, let's say we have only one block, with 10 threads in it.)
__global__ void copyKernel(int *b, int *c)
{
    // one SMEM slot per thread (10 threads in the block)
    __shared__ int i[10];
    i[threadIdx.x] = b[threadIdx.x];
    c[threadIdx.x] = i[threadIdx.x];
}
Now if I transfer the variable c to the host and print it, it must have the contents of b in it, which is perfectly fine.
Now…
my program looks something like this,
__global__ void copyKernel(int *b, int *c)
{
    // a single SMEM variable shared by all threads in the block
    __shared__ int i;
    i = b[threadIdx.x];
    c[threadIdx.x] = i;
}
Here, instead of an array, I am declaring a single variable i in the SMEM. The thing is, I am getting the same result as above (i.e. the contents of b perfectly copied into c), which I don't think should be happening: i resides in the SMEM, all threads can access it, and all threads will write into i simultaneously! Ideally I would expect some garbage values in c.
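For anyone who wants to reproduce this, a minimal self-contained sketch could look like the following. It runs both variants with one block of 10 threads and counts how many elements of c differ from b. The kernel names (copyViaSharedArray, copyViaSharedScalar), the helper countMismatches and the test values are illustrative assumptions, not my exact program. Since the scalar version has every thread writing the same SMEM location without synchronization, I would not rely on any particular output from it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // one block of 10 threads, as described above

// Variant 1: each thread uses its own slot of a shared array.
__global__ void copyViaSharedArray(const int *b, int *c)
{
    __shared__ int i[N];
    i[threadIdx.x] = b[threadIdx.x];
    c[threadIdx.x] = i[threadIdx.x];
}

// Variant 2: all threads write to and read from one shared scalar.
// There is no synchronization, so this is a data race.
__global__ void copyViaSharedScalar(const int *b, int *c)
{
    __shared__ int i;
    i = b[threadIdx.x];
    c[threadIdx.x] = i;
}

// Hypothetical helper: copies c back to the host and counts c[k] != b[k].
static int countMismatches(const int *hostB, const int *devC)
{
    int hostC[N];
    cudaMemcpy(hostC, devC, N * sizeof(int), cudaMemcpyDeviceToHost);
    int mismatches = 0;
    for (int k = 0; k < N; ++k) {
        if (hostC[k] != hostB[k]) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int hostB[N];
    for (int k = 0; k < N; ++k) hostB[k] = 100 + k;  // arbitrary test data

    int *devB = nullptr, *devC = nullptr;
    cudaMalloc(&devB, N * sizeof(int));
    cudaMalloc(&devC, N * sizeof(int));
    cudaMemcpy(devB, hostB, N * sizeof(int), cudaMemcpyHostToDevice);

    // Array version.
    cudaMemset(devC, 0, N * sizeof(int));
    copyViaSharedArray<<<1, N>>>(devB, devC);
    cudaDeviceSynchronize();
    printf("shared array:  %d mismatches\n", countMismatches(hostB, devC));

    // Scalar version.
    cudaMemset(devC, 0, N * sizeof(int));
    copyViaSharedScalar<<<1, N>>>(devB, devC);
    cudaDeviceSynchronize();
    printf("shared scalar: %d mismatches\n", countMismatches(hostB, devC));

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(devB);
    cudaFree(devC);
    return err == cudaSuccess ? 0 : 1;
}
```

This should build with something like nvcc test.cu -o test (the file name is arbitrary).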
Is there a gap in my understanding of SMEM? Kindly help…
Thanks in advance !!