Accessing shared memory from another function

Hello, I am trying to use shared memory. My goal is to have one thread write to shared memory in one function, then have another thread read it in a different function.

__device__ void bar() {
	__syncthreads();
	extern __shared__ int shmem[];
	if(threadIdx.x != 1) {
		//Can I read the shmem here?
		int val1= shmem[100];
		int val2= shmem[101];
	}
}
__device__ void foo() {
	//dynamic shmem
	extern __shared__ int shmem[];
	shmem[100] = 100;
	shmem[101] = 200;
	__threadFence();
	bar();
}
__global__ void kernel() {
	if(threadIdx.x == 0)
		foo();
	else
		bar();
}

One fairly simple approach is to manage the shared memory in your global function, then pass a pointer to it to your device functions.
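To make that suggestion concrete, here is a minimal sketch of the pointer-passing layout. The names write_vals, read_vals, kernel_ptr, and launch_and_fetch are made up for illustration, and the indices and launch sizes are simply borrowed from the question. Treat it as one possible arrangement under those assumptions, not the only way to do it.

```cuda
#include <cstdio>

// Hypothetical helpers: they receive the shared-memory pointer as an argument
// instead of redeclaring the extern __shared__ array themselves.
__device__ void write_vals(int *smem) {
    smem[100] = 100;
    smem[101] = 200;
}

__device__ int read_vals(const int *smem) {
    return smem[100] + smem[101];
}

__global__ void kernel_ptr(int *out) {
    extern __shared__ int shmem[];        // dynamic shared memory, sized at launch
    if (threadIdx.x == 0)
        write_vals(shmem);                // a single thread writes
    __syncthreads();                      // reached by every thread in the block
    out[threadIdx.x] = read_vals(shmem);  // every thread reads after the barrier
}

// Host-side wrapper (hypothetical): launches one block and copies results back.
// Error checking is omitted to keep the sketch short.
void launch_and_fetch(int *h_out, int nthreads) {
    int *d_out;
    cudaMalloc(&d_out, nthreads * sizeof(int));
    kernel_ptr<<<1, nthreads, 1024 * sizeof(int)>>>(d_out);
    cudaMemcpy(h_out, d_out, nthreads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}

int main() {
    int h_out[2];
    launch_and_fetch(h_out, 2);
    for (int i = 0; i < 2; i++)
        printf("thread %d read sum = %d\n", i, h_out[i]);
    return 0;
}
```

The barrier sits in the kernel itself, outside any branch, so all threads reach the same __syncthreads() call regardless of which helper they ran beforehand.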

Thanks for the quick answer. I know that’s possible; I am wondering whether my approach is possible or not.

In your specific case, is it feasible to declare shared memory in the global kernel and access it by passing pointers?

BTW, threads are executed in warps of 32, so having only one thread do the read/write operations is generally not a GPU-friendly style.
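As a rough illustration of the warp-friendly alternative, the sketch below has every thread in the block take part in filling shared memory, then read an element written by a different thread. The tile size N, the kernel name fill_and_read, and the wrapper launch_fill are assumptions chosen for the example, not anything from the original post.

```cuda
#include <cstdio>

#define N 256  // hypothetical tile size, chosen only for illustration

__global__ void fill_and_read(int *out) {
    extern __shared__ int shmem[];
    // Block-stride loop: consecutive threads write consecutive elements,
    // so the whole warp participates in the fill.
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        shmem[i] = i;
    __syncthreads();  // all writes are complete before any thread reads
    // Each thread reads the element written by its right-hand neighbor.
    int neighbor = (threadIdx.x + 1) % blockDim.x;
    out[threadIdx.x] = shmem[neighbor];
}

// Host-side wrapper (hypothetical); error checking omitted for brevity.
void launch_fill(int *h_out, int nthreads) {
    int *d_out;
    cudaMalloc(&d_out, nthreads * sizeof(int));
    fill_and_read<<<1, nthreads, N * sizeof(int)>>>(d_out);
    cudaMemcpy(h_out, d_out, nthreads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}

int main() {
    int h_out[64];
    launch_fill(h_out, 64);
    printf("thread 0 read %d, thread 63 read %d\n", h_out[0], h_out[63]);
    return 0;
}
```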

Have you tried your approach?

What was the result?

I can say just by looking at it that you don’t seem to understand the concept of synchronization when doing loads and stores from shared memory. If I were to say anything like “yes that should work” then someone eventually will come back with some kind of code derived from it, saying, “no it doesn’t”.

So:

  • no, your code will not compile as posted (__threadFence() is not a CUDA function; the intrinsic is spelled __threadfence())
  • if you fix the compilation errors, your code will compile, but it still contains illegal constructs, such as __syncthreads() being reached through the divergent branches of the if/else in the kernel
  • once your code compiles, you could theoretically access shared memory the way you indicate, ignoring the illegal behavior
  • even though you can access shared memory the way you indicate, if you were attempting/expecting to perform inter-thread communication that way, it will not work, for several reasons.

The following code has various issues fixed and should be safer than what you have shown:

$ cat t7.cu
#include <stdio.h>
__device__ void bar() {
    extern __shared__ int shmem[];
    if(threadIdx.x != 1) {
        //Can I read the shmem here?
        int val1= shmem[100];
        int val2= shmem[101];
        printf("val1 = %d, val2 = %d\n", val1, val2);
    }
}
__device__ void foo() {
    //dynamic shmem
    extern __shared__ int shmem[];
    shmem[100] = 100;
    shmem[101] = 200;
    __threadfence();
    bar();
}
__global__ void kernel() {
    if(threadIdx.x == 0)
        foo();
    __syncthreads();
    if(threadIdx.x != 0)
        bar();
}

int main(){

  kernel<<<1,2,1024*sizeof(int)>>>();
  cudaDeviceSynchronize();
}
$ nvcc -o t7 t7.cu
$ cuda-memcheck ./t7
========= CUDA-MEMCHECK
val1 = 100, val2 = 200
========= ERROR SUMMARY: 0 errors
$