[SOLVED] Shared memory variable declaration

Hi,

I’m trying to declare two shared memory arrays inside a CUDA kernel. The array sizes are dynamic, so I’m using extern variables with the size determined at runtime.

__global__ void myKernel()
{
    extern __shared__ int localSum1[];
    extern __shared__ int localSum2[];
    ...
    int i_local = threadIdx.x + blockDim.x*threadIdx.y + blockDim.x*blockDim.y*threadIdx.z; 
    localSum1[i_local] = 0;
    localSum2[i_local] = 0;
    localSum1[i_local] += 1;
    localSum2[i_local] += 2;
    ...    
}

int main()
{
    ...
    myKernel<<<gridSize, blockSize, localSize>>>(); // localSize is the size of the arrays
    ...
}

When I compare the values in the two arrays they are identical, but they should differ. I’m not sure I’m using the kernel launch configuration correctly. With a single extern shared array declared inside the kernel it works fine, but now that I have two arrays in the kernel I think they are sharing the same memory.

Maybe I should configure the kernel with two sizes? Something like this:

int main()
{
    ...
    myKernel<<<gridSize, blockSize, localSize1, localSize2>>>();
    ...
}

Thank you.

This is covered in the programming guide:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared

When using dynamically allocated shared memory, only one pointer to the allocated space will be given to the kernel code. If you want to divide up that space, you must do so yourself. Please read the above linked programming guide section for an example.
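
For anyone landing here later, a minimal sketch of dividing up the space by hand, assuming both arrays hold one int per thread and a single block is launched. The name sharedBuf and the output pointers out1 and out2 are made up for illustration (they are not in the original post), and the block dimensions are arbitrary. The third launch parameter is a size in bytes and has to cover both arrays.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One dynamic shared-memory allocation, split manually into two int arrays.
// sharedBuf, out1 and out2 are hypothetical names used only in this sketch.
__global__ void myKernel(int *out1, int *out2)
{
    extern __shared__ int sharedBuf[];             // the single per-block allocation

    int n = blockDim.x * blockDim.y * blockDim.z;  // threads per block

    int *localSum1 = sharedBuf;                    // first n ints
    int *localSum2 = sharedBuf + n;                // next n ints

    int i_local = threadIdx.x + blockDim.x * threadIdx.y
                + blockDim.x * blockDim.y * threadIdx.z;

    localSum1[i_local] = 0;
    localSum2[i_local] = 0;
    localSum1[i_local] += 1;
    localSum2[i_local] += 2;

    // Copy the results out so the host can inspect them (one block assumed).
    out1[i_local] = localSum1[i_local];
    out2[i_local] = localSum2[i_local];
}

int main()
{
    dim3 gridSize(1);
    dim3 blockSize(8, 4, 2);                       // arbitrary example dimensions
    int n = blockSize.x * blockSize.y * blockSize.z;

    // Size in bytes, large enough for BOTH arrays.
    size_t localSize = 2 * n * sizeof(int);

    int *out1 = nullptr;
    int *out2 = nullptr;
    cudaMallocManaged(&out1, n * sizeof(int));
    cudaMallocManaged(&out2, n * sizeof(int));

    myKernel<<<gridSize, blockSize, localSize>>>(out1, out2);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The two arrays should now hold different values.
    printf("localSum1[0] = %d, localSum2[0] = %d\n", out1[0], out2[0]);

    cudaFree(out1);
    cudaFree(out2);
    return 0;
}
```

If the two arrays have different element types, the offset of the second array also has to respect that type's alignment; placing the type with the strictest alignment first is the usual way to handle that.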

Sorry,

I had read section C.3.1.6.3 of the programming guide, but I hadn’t read section B.2.3:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/#shared-memory-variable-declarations

Thank you.

Section C discusses concepts specific to CUDA Dynamic Parallelism. It does not present a general treatment of the topic.