From what I understand from the CUDA programming guide, each block has its own unique bank of 16 KB of shared memory. Threads in block 1 do not have access to the shared memory in block 2, and so on. I wrote some code that uses this concept extensively, yet when I run it, it seems as if all the threads communicate through the same shared memory, regardless of which block they reside in.
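Just to spell out the model I have in mind, here is a toy kernel (made-up names, not my real code) written the way I would expect per-block shared memory to behave: each block stages its own 8 elements into its own copy of the __shared__ array, and the reversal at the end only ever touches that block's copy.

__global__ void per_block_tile( const float *in, float *out ) {
    __shared__ float tile[8];                        // one private copy per block (or so I thought)
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[gid];                     // stage this block's slice of the input
    __syncthreads();                                 // syncs only the threads within this block
    out[gid] = tile[blockDim.x - 1 - threadIdx.x];   // reverse within the block; assumes blockDim.x == 8
}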
Now let me see if I can construct a stripped-down example of the code I'm actually running…
// somewhere else, in the calling code
dim3 grid(16,1,1);
dim3 block(8,1,1);
f <<< grid, block, some_extra_memory>>>( d_d, d_o);
//...
__global__ void f( some_global_data *d, some_output_data *o ) {
    __shared__ float4 localdata[8];        // one copy per block, as I understood it
    __shared__ float4 calculated_data[8];

    // each thread loads its element into this block's shared memory
    localdata[threadIdx.x] = d->data[ some_calculated_offset ];
    __syncthreads();

    // perform some calculation on that shared little bit of memory
    // and place the result back in shared memory (calculated_data)
    __syncthreads();

    // write this block's results back out to global memory
    o->data[ another_calculated_offset ] = calculated_data[threadIdx.x];
}
When I run this in emulation mode, the data I see along the way seems to compound: values from earlier blocks are still there, which leads me to believe threads are communicating across the shared memory boundaries. Where has my thinking gone wrong?
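In case it helps, here is the kind of probe I was thinking of running to check this (again, made-up names, just a sketch): each thread copies out whatever is already sitting in its shared slot before overwriting it, then tags the slot with its block index, so on the host I can see whether a later block observes values left behind by an earlier one. Reading uninitialized shared memory obviously isn't meaningful for correctness; it's only there to see what, if anything, persists between blocks.

__global__ void probe_shared( float *leftovers, float *results ) {
    __shared__ float slot[8];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    leftovers[gid] = slot[threadIdx.x];      // whatever was in this slot before we wrote it
    __syncthreads();
    slot[threadIdx.x] = (float)blockIdx.x;   // tag the slot with this block's index
    __syncthreads();
    results[gid] = slot[threadIdx.x];        // should always come back as blockIdx.x
}

// launched the same way as above, e.g. probe_shared<<<16, 8>>>( d_leftovers, d_results );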
Thanks!