How to initialize shared memory


If I declare __shared__ float A[512] in the kernel, how do I set all of A to zero?
Just A[512] = 0?
A[threadIdx.x] = 0? But what if I don't have as many as 512 threads? If I only have 500, how do I initialize the remaining elements?

I found that I can get different results when running the release version if the shared memory isn't initialized properly.

Either run a loop in each thread or, better, increase the number of threads and let the unused ones diverge by branching. Then measure which one is faster.
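A minimal sketch of the loop option, assuming a single block and a compile-time size kN (the names zeroSharedKernel, zeroSharedWith and check are hypothetical, not from the thread). The loop strides by blockDim.x, so the same kernel zeroes all 512 elements whether the block has 500, 512 or 32 threads; the branching option would instead launch 512 threads and guard each store with if (tid < kN).

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int kN = 512;  // example size from the question

__global__ void zeroSharedKernel(float* out)
{
    __shared__ float A[kN];

    // Block-stride loop: thread t clears t, t + blockDim.x, t + 2*blockDim.x, ...
    // With 500 threads, threads 0..11 also clear elements 500..511.
    for (int i = threadIdx.x; i < kN; i += blockDim.x)
        A[i] = 0.0f;

    __syncthreads();  // all of A is zeroed before any thread reads it

    // Copy A to global memory so the host can inspect it.
    for (int i = threadIdx.x; i < kN; i += blockDim.x)
        out[i] = A[i];
}

// Aborts with a message if a CUDA runtime call failed.
static void check(cudaError_t e)
{
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host helper: launches one block of `threads` threads and
// returns the contents of the shared array A.
std::vector<float> zeroSharedWith(int threads)
{
    assert(threads > 0 && threads <= 1024);
    float* d_out = nullptr;
    check(cudaMalloc(&d_out, kN * sizeof(float)));
    // Pre-fill with 0xFF bytes (a NaN bit pattern) so any element the
    // kernel failed to write would not compare equal to 0.0f.
    check(cudaMemset(d_out, 0xFF, kN * sizeof(float)));

    zeroSharedKernel<<<1, threads>>>(d_out);
    check(cudaGetLastError());

    std::vector<float> h(kN);
    check(cudaMemcpy(h.data(), d_out, kN * sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaFree(d_out));
    return h;
}
```

For example, zeroSharedWith(500) should return 512 zeros even though the block is 12 threads short.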


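I do it like this (using 500 threads to load 512 elements):

    __shared__ float A[512];

    int tid = threadIdx.x;

    A[tid] = d_A[tid];               // each of the 500 threads loads one element

    for (int i = 500; i < 512; i++)  // then every thread loads the last 12 elements
        A[i] = d_A[i];               // assuming d_A[] resides in global memory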



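If I were you, I would do it as below; I think your version leads to uncoalesced reads (every thread reads all 12 trailing elements itself).

    __shared__ float A[512];

    int tid = threadIdx.x;

    A[tid] = d_A[tid];                  // threads 0..499 load elements 0..499

    if (tid < 512 - 500)                // only the first 12 threads...
        A[tid + 500] = d_A[tid + 500];  // ...load elements 500..511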


DenisR, you're right! It seems better now.
Thanks. :D

By the way, remember that the last element's index in a 512-element array is 511 :)

And naturally, the literal constants in the code are for illustration only. ;)

Furthermore, all the kernel calls will use the same fixed (500, 1) block size, right?

And (this is actually critical): call __syncthreads() after the initialization, to ensure that every element of the array has actually been initialized before any thread uses it.
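Putting these points together, here is a sketch that drops the literal 500/512 block-size assumption and shows why the barrier matters. It assumes a hypothetical tile size kTile and a hypothetical workload (reversing each tile); reverseTileKernel, reverseTiles and check are illustrative names. The block-stride load works for any blockDim.x, and in each iteration neighbouring threads read neighbouring words of global memory, so the reads stay coalesced as in DenisR's version. After the load, thread t reads A[kTile - 1 - i], an element usually written by a different thread, so __syncthreads() is required there.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int kTile = 512;  // hypothetical tile size replacing the literal 512

// Each block loads its own kTile-element slice of `in` into shared memory,
// then writes that slice to `out` in reverse order.
__global__ void reverseTileKernel(const float* in, float* out)
{
    __shared__ float A[kTile];
    const float* src = in + blockIdx.x * kTile;
    float* dst = out + blockIdx.x * kTile;

    // Block-stride load: correct for any block size, coalesced per iteration.
    for (int i = threadIdx.x; i < kTile; i += blockDim.x)
        A[i] = src[i];

    __syncthreads();  // without this, A[kTile - 1 - i] may not be written yet

    for (int i = threadIdx.x; i < kTile; i += blockDim.x)
        dst[i] = A[kTile - 1 - i];
}

// Aborts with a message if a CUDA runtime call failed.
static void check(cudaError_t e)
{
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host helper: reverses each kTile-sized slice of h_in
// (size must be a nonzero multiple of kTile) with `threads` threads per block.
std::vector<float> reverseTiles(const std::vector<float>& h_in, int threads)
{
    assert(!h_in.empty() && h_in.size() % kTile == 0);
    assert(threads > 0 && threads <= 1024);
    const size_t bytes = h_in.size() * sizeof(float);
    const int blocks = static_cast<int>(h_in.size() / kTile);

    float *d_in = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));

    reverseTileKernel<<<blocks, threads>>>(d_in, d_out);
    check(cudaGetLastError());

    std::vector<float> h_out(h_in.size());
    check(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return h_out;
}
```

Launching with 500 threads or with 64 threads should give the same result, which is the point of not hard-coding the block size.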