About __device__ __shared__ variable

hello_olleh · February 25, 2008, 11:30am

I am just a beginner to the world of CUDA.

I read that the scope of a __device__ __shared__ variable is a single block, but in my code it looks as if all the blocks can access it. The program prints 90, and I am confused about why it is not 45: the 10 threads of one block should add (0+1+…+9)=45. It seems that the 20 threads from 2 different blocks are all accessing this shared variable. Would anyone please explain?

#include <stdio.h>

#define block 2
#define thread 10

__device__ __shared__ int sum;

__global__ static void hello(int *N)
{
    int tx = threadIdx.x;
    sum += N[tx];
}

int main()
{
    int data[10] = {0,1,2,3,4,5,6,7,8,9};
    int *num;

    cudaMalloc((void**)&num, sizeof(int)*10);
    cudaMemcpy(num, data, sizeof(int)*10, cudaMemcpyHostToDevice);

    hello<<<block, thread>>>(num);
    printf("%d", sum);
    return 0;
}

AndreiB · February 25, 2008, 11:38am

That is because you’re running in device emulation mode, where the contents of shared memory are not discarded between block invocations, so the second block’s additions land on top of the 45 left behind by the first.

If you run your code on a GPU you will get even stranger results. You should understand that a variable declared with the __shared__ keyword really is shared between all threads of a block, and all of these threads execute simultaneously. So,

sum+=N[tx];

will produce undefined results, since all threads will write to the same shared memory location at the same time.
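For reference, here is a minimal sketch of one way to get a well-defined per-block sum. It is not the original poster's code: the names blockSum and sumPerBlock are made up for illustration, and it assumes a GPU that supports atomicAdd on shared memory (very early hardware did not). The shared variable is zeroed explicitly, updated atomically, and then copied to global memory, because the host cannot read shared memory directly.

```cuda
#include <cuda_runtime.h>

// Each block accumulates its own sum in shared memory and publishes it
// to global memory, where the host can read it back.
__global__ void blockSum(const int *in, int n, int *perBlock)
{
    __shared__ int sum;                     // one copy per block, starts uninitialized

    if (threadIdx.x == 0)
        sum = 0;                            // shared memory must be initialized explicitly
    __syncthreads();                        // make the zero visible to the whole block

    if (threadIdx.x < n)
        atomicAdd(&sum, in[threadIdx.x]);   // atomic update avoids the write race
    __syncthreads();                        // wait until every thread has added its value

    if (threadIdx.x == 0)
        perBlock[blockIdx.x] = sum;         // hand the block's result to global memory
}

// Hypothetical host helper: launches `blocks` blocks of `threads` threads
// over the first n values of `data` and stores one sum per block in `out`.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t sumPerBlock(const int *data, int n, int blocks, int threads, int *out)
{
    int *dIn = nullptr;
    int *dOut = nullptr;

    cudaError_t err = cudaMalloc((void**)&dIn, n * sizeof(int));
    if (err != cudaSuccess)
        return err;
    err = cudaMalloc((void**)&dOut, blocks * sizeof(int));
    if (err != cudaSuccess) {
        cudaFree(dIn);
        return err;
    }

    err = cudaMemcpy(dIn, data, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        blockSum<<<blocks, threads>>>(dIn, n, dOut);
        err = cudaGetLastError();           // catches launch configuration errors
    }
    if (err == cudaSuccess) {
        // This copy also waits for the kernel to finish.
        err = cudaMemcpy(out, dOut, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

Called with the data from the question (the values 0 to 9, 2 blocks of 10 threads), each entry of out should come back as 45. If a grand total of 90 is what you want, add the per-block results on the host.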

hello_olleh · February 27, 2008, 2:55am

Yes, I got it. Thanks a lot.
