Hi,
I’m trying to declare two shared memory arrays inside a CUDA kernel. The sizes of the arrays are dynamic, so I’m declaring them as extern __shared__ and passing the size at runtime through the kernel launch configuration.
__global__ void myKernel()
{
    extern __shared__ int localSum1[];
    extern __shared__ int localSum2[];
    ...
    // Linear thread index within the block
    int i_local = threadIdx.x + blockDim.x * threadIdx.y + blockDim.x * blockDim.y * threadIdx.z;
    localSum1[i_local] = 0;
    localSum2[i_local] = 0;
    localSum1[i_local] += 1;
    localSum2[i_local] += 2;
    ...
}
int main()
{
    ...
    myKernel<<<gridSize, blockSize, localSize>>>();  // localSize is the size of the arrays in bytes
    ...
}
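In case it matters, this is more or less how I compute localSize on the host (the block dimensions are just an example, and I assume one int per thread):

dim3 blockSize(8, 8, 4);                                        // example block dimensions
int threadsPerBlock = blockSize.x * blockSize.y * blockSize.z;  // 256 threads
size_t localSize = threadsPerBlock * sizeof(int);               // one int per thread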
If I compare the values from the two arrays, they are the same, but they should be different. I’m not sure I’m using the kernel launch configuration correctly. I know this works fine with a single extern shared array declared inside the kernel, but now that I have two arrays, I think they both point to the same region of shared memory.
Maybe I should configure the kernel launch with two sizes, one per array? Something like this:
int main()
{
    ...
    myKernel<<<gridSize, blockSize, localSize1, localSize2>>>();
    ...
}
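Or is the right approach to declare a single extern shared array and split it into two pointers inside the kernel? Something like this sketch (untested; myKernelSplit is just a name for illustration, and I’m assuming one int per thread for each half):

__global__ void myKernelSplit()
{
    extern __shared__ int sharedBuf[];                          // one buffer for both arrays
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int* localSum1 = sharedBuf;                                 // first array
    int* localSum2 = sharedBuf + threadsPerBlock;               // second array starts right after the first
    int i_local = threadIdx.x + blockDim.x * threadIdx.y + blockDim.x * blockDim.y * threadIdx.z;
    localSum1[i_local] = 1;
    localSum2[i_local] = 2;
}
// launched with enough dynamic shared memory for both arrays:
// myKernelSplit<<<gridSize, blockSize, 2 * threadsPerBlock * sizeof(int)>>>();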
Thank you.