How to access shared memory in a __device__ function

Hi guys, I have a question about programming in CUDA. I use some shared memory in a __global__ function, and I wonder how I can pass this shared memory to a __device__ function that is called from within that __global__ function. Any help is greatly appreciated.

Hi,

To share your shared memory across several device functions, you can declare it at file scope (i.e., outside of any function), so that every function in the source file can see it. You can do it like this:

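$ cat shared.cu
#include <stdio.h>

// Dynamically sized shared array, visible to every function in this file.
// Its size in bytes is set by the third kernel launch parameter.
extern __shared__ int sh[];

__device__ void printVal(int i) {
    printf("[%d]: shared value is %d\n", threadIdx.x, sh[i]);
}

__global__ void kern() {
    sh[threadIdx.x] = threadIdx.x;
    __syncthreads();  // make all writes visible before any thread reads
    printVal((threadIdx.x+2)%blockDim.x);
}

int main() {
    // 2 blocks of 5 threads; the third parameter is in bytes, not elements
    kern<<<2,5,5*sizeof(int)>>>();
    cudaDeviceSynchronize();
    return 0;
}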

$ nvcc -arch=sm_20 shared.cu -o shared
$ optirun ./shared
[0]: shared value is 2
[1]: shared value is 3
[2]: shared value is 4
[3]: shared value is 0
[4]: shared value is 1
[0]: shared value is 2
[1]: shared value is 3
[2]: shared value is 4
[3]: shared value is 0
[4]: shared value is 1

You can also pass a pointer to the shared memory variable to your device function. On devices of compute capability 1.x, this might cause the compiler to complain that it can’t figure out the address space (global, shared, constant, etc.) of the pointer, but on compute capability 2.0 and higher there is no problem.
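
For reference, here is a minimal sketch of the pointer approach, assuming a device of compute capability 2.0 or higher. The names readRotated, kernPtr, and runPointerDemo are made up for illustration, and the block size of 5 simply mirrors the example above. The shared array is declared inside the kernel this time and handed to the device function as an ordinary pointer; results are copied back to the host instead of printed so they can be checked.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 5  // threads per block (illustrative value, matches the example above)

// Hypothetical helper: receives the shared array as an ordinary pointer.
__device__ int readRotated(const int *buf, int i) {
    return buf[(i + 2) % blockDim.x];
}

__global__ void kernPtr(int *out) {
    __shared__ int sh[N];  // statically sized, local to this kernel
    sh[threadIdx.x] = threadIdx.x;
    __syncthreads();       // make all writes visible before any thread reads
    out[blockIdx.x * blockDim.x + threadIdx.x] = readRotated(sh, threadIdx.x);
}

// Returns true if every thread read the value written by the thread two slots over.
bool runPointerDemo() {
    const int blocks = 2, total = blocks * N;
    int host[total];
    int *dev = nullptr;
    if (cudaMalloc(&dev, total * sizeof(int)) != cudaSuccess) return false;
    kernPtr<<<blocks, N>>>(dev);
    // cudaMemcpy waits for the kernel and also reports launch/execution errors.
    cudaError_t err = cudaMemcpy(host, dev, total * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < total; ++i)
        if (host[i] != ((i % N) + 2) % N) return false;
    return true;
}

int main() {
    printf("%s\n", runPointerDemo() ? "ok" : "mismatch");
    return 0;
}
```

Compared with the file-scope declaration, this keeps the shared array private to the kernel and lets the same device function work on any buffer you pass in, at the cost of one extra argument.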