If I declare __shared__ float A[512] in a kernel, how do I set all of A to zero?
Just A[512] = 0?
A[threadIdx.x] = 0? But what if I don't have as many as 512 threads? If I only have 500, how do the remaining elements get initialized?
I found that the release build can give different results if the shared memory is not initialized properly.
I do it like this (using 500 threads to load 512 elements):
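__shared__ float A[512];
int tid = threadIdx.x;
if (tid == 0)
{
    // Thread 0 also loads the 12 elements that have no thread of their own.
    for (int i = 500; i < 512; i++)
        A[i] = d_A[i];  // assuming d_A[] resides in global memory
}
A[tid] = d_A[tid];  // each of the 500 threads loads one element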
And naturally, the literal constants in the code are there for the example only. ;)
Furthermore, all the kernel calls will use the same fixed (500, 1) block size, right?
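If the block size is not guaranteed to stay at 500, one common way around the hard-coded constants is a block-stride loop: each thread handles index threadIdx.x, then threadIdx.x + blockDim.x, and so on, until the whole array is covered. Here is a rough sketch of that for the zero-fill case from the original question. The names zeroSharedKernel, countNonZeroAfterFill and d_out are made up for this illustration, and the host helper is only there so the result can be checked.

```cuda
#include <cuda_runtime.h>

#define N 512  // shared array length, matching the example in this thread

// Zero a 512-element shared array with however many threads the block has,
// then copy it to global memory so the host can inspect it.
__global__ void zeroSharedKernel(float *d_out)
{
    __shared__ float A[N];

    // Block-stride loop: thread t writes t, t + blockDim.x, t + 2*blockDim.x, ...
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        A[i] = 0.0f;

    // Barrier before A is read. Here each thread happens to read back only
    // the indices it wrote, but in general other threads' elements are read.
    __syncthreads();

    for (int i = threadIdx.x; i < N; i += blockDim.x)
        d_out[i] = A[i];
}

// Hypothetical host helper (not from the thread): launches one block and
// returns the number of non-zero elements, or -1 on a CUDA error.
int countNonZeroAfterFill(int threadsPerBlock)
{
    float *d_out = nullptr;
    float h_out[N];

    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess)
        return -1;

    // Fill the output with 0xFF bytes (NaN as float) so a missed element shows up.
    cudaMemset(d_out, 0xFF, N * sizeof(float));

    zeroSharedKernel<<<1, threadsPerBlock>>>(d_out);

    cudaError_t err = cudaMemcpy(h_out, d_out, N * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return -1;

    int bad = 0;
    for (int i = 0; i < N; ++i)
        if (h_out[i] != 0.0f)
            ++bad;
    return bad;
}
```

With this pattern the same kernel should work for 500 threads, 64 threads, or 512 threads per block, since the loop bound depends on the array length rather than on the launch configuration.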
And (this is actually critical): call __syncthreads() after the initialization, to ensure
that every element of the array is actually initialized before any of them is used.
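To make the point about the barrier concrete, here is a minimal sketch in which every thread reads a shared element that was most likely loaded by a different thread, so the result is only reliable with __syncthreads() between the load and the use. The kernel name reverseViaShared, the helper reverseIsCorrect and the output buffer d_out are assumptions for this illustration, not something from the posts above.

```cuda
#include <cuda_runtime.h>

#define N 512  // shared array length, matching the example in this thread

// Load d_A into shared memory, then write it out reversed.
__global__ void reverseViaShared(const float *d_A, float *d_out)
{
    __shared__ float A[N];

    // Cooperative load: works for any block size up to the hardware limit.
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        A[i] = d_A[i];

    // Every thread of the block must reach this barrier, so it is placed
    // outside any thread-dependent branch.
    __syncthreads();

    // A[N - 1 - i] was usually written by another thread.
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        d_out[i] = A[N - 1 - i];
}

// Hypothetical host helper: returns 1 if the output is the reversed input,
// 0 if not, and -1 on a CUDA error.
int reverseIsCorrect(int threadsPerBlock)
{
    float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i)
        h_in[i] = (float)i;

    float *d_A = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_A, N * sizeof(float)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess) {
        cudaFree(d_A);
        return -1;
    }

    cudaMemcpy(d_A, h_in, N * sizeof(float), cudaMemcpyHostToDevice);
    reverseViaShared<<<1, threadsPerBlock>>>(d_A, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, N * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_A);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return -1;

    for (int i = 0; i < N; ++i)
        if (h_out[i] != (float)(N - 1 - i))
            return 0;
    return 1;
}
```

Removing the __syncthreads() call from this sketch may still appear to work in some builds, which would fit the "different results in release" symptom described in the question: without the barrier the outcome depends on scheduling.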