Dynamic shared memory and __syncthreads() problem: shared memory no longer set after __syncthreads()?

Hi, I’m trying to read data into shared memory within each block, call __syncthreads(), then work with the data. My problem is that the data is no longer in shared memory after __syncthreads() is called, whereas it can be read back before the call. I have also built a histogram of block IDs (flattened from the x and y block indices) and I definitely get what I expect, which is 16 blocks with 64 threads in each block. None of the CUDA calls return any errors.

I’ve been wasting a lot of time on this; any help would be greatly appreciated.

extern __shared__ int blocks[];

__global__ void myKernel(params...)
{
    int id = ...

    if (id >= maxthreads)
        return;

    TYPE* rowBlock = (TYPE*)blocks;  // declare shared mem arrays
    TYPE* colBlock = (TYPE*)&rowBlock[blockDim.y*blockDim.x]; // declare second array using offset

    // do stuff here...

    for(...
        for(...
            for(...
                rowBlock[writeBlockid] = m1[readBlock1id];
                colBlock[writeBlockid] = m2[readBlock2id];

                // data still accessible here...

                __syncthreads();

                // rowBlock is now full of zeros?!
            }
        }
    }
}

Doesn’t the “extern __shared__” declaration have to go inside your kernel function?

No, see section B.2.3 on dynamically allocating shared memory in the CUDA Programming Guide. Still no progress toward a solution.
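For reference, here is a minimal sketch of the dynamic allocation pattern as I understand it: an unsized extern __shared__ array at file scope, with the byte count passed as the third launch parameter. The kernel, the helper, and all names (tileStorage, addReversedTiles, launchAddReversedTiles) are made up for illustration and are not my actual code; it assumes n is a multiple of the block size.

```cuda
#include <cuda_runtime.h>

// Unsized file-scope declaration; the size is supplied at launch time.
extern __shared__ int tileStorage[];

// Hypothetical kernel: stages one tile of a and one tile of b in shared
// memory, then adds a[i] to the element of b mirrored within the block.
__global__ void addReversedTiles(const float* a, const float* b, float* out)
{
    float* tileA = (float*)tileStorage;   // first blockDim.x floats
    float* tileB = tileA + blockDim.x;    // next blockDim.x floats

    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;

    tileA[t] = a[i];
    tileB[t] = b[i];
    __syncthreads();                      // every thread in the block gets here

    // Reads a value written by a different thread, hence the barrier above.
    out[i] = tileA[t] + tileB[blockDim.x - 1 - t];
}

// Illustrative host helper; assumes n is a multiple of threadsPerBlock.
cudaError_t launchAddReversedTiles(const float* dA, const float* dB,
                                   float* dOut, int n, int threadsPerBlock)
{
    // Room for both arrays carved out of tileStorage.
    size_t sharedBytes = 2 * threadsPerBlock * sizeof(float);
    addReversedTiles<<<n / threadsPerBlock, threadsPerBlock, sharedBytes>>>(dA, dB, dOut);

    cudaError_t err = cudaGetLastError();           // launch errors
    return err != cudaSuccess ? err : cudaDeviceSynchronize();
}
```

If the third launch parameter is left out, the dynamic array has zero bytes behind it, so the size passed at launch has to cover every sub-array carved out of it.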

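__syncthreads() waits until every thread in the block has reached it, yet you have a return statement earlier in your kernel. If some threads of a block take that return while the rest go on to call __syncthreads(), the behavior is undefined, which might explain your odd results. The barrier is only safe in conditional code when the condition evaluates identically for all threads in the block.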
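A minimal sketch of one way around it, assuming the out-of-range threads can skip their loads and stores instead of exiting: replace the early return with a flag and keep every thread alive until the barrier. The kernel, the helper, and all names (scratch, neighbourSum, launchNeighbourSum) are invented for illustration and are not your code.

```cuda
#include <cuda_runtime.h>

extern __shared__ int scratch[];

// Hypothetical kernel: out[i] = in[i] + left neighbour within the block
// (thread 0 of each block has no left neighbour and adds 0).
// n need not be a multiple of blockDim.x.
__global__ void neighbourSum(const float* in, float* out, int n)
{
    float* tile = (float*)scratch;

    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    bool active = (i < n);               // instead of: if (i >= n) return;

    // Inactive threads still write a placeholder so the tile is fully defined.
    tile[t] = active ? in[i] : 0.0f;
    __syncthreads();                     // reached by all threads of the block

    if (active) {
        float left = (t > 0) ? tile[t - 1] : 0.0f;
        out[i] = tile[t] + left;
    }
}

// Illustrative host helper: rounds the grid up so the last block may be partial.
cudaError_t launchNeighbourSum(const float* dIn, float* dOut,
                               int n, int threadsPerBlock)
{
    int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    size_t sharedBytes = threadsPerBlock * sizeof(float);
    neighbourSum<<<numBlocks, threadsPerBlock, sharedBytes>>>(dIn, dOut, n);

    cudaError_t err = cudaGetLastError();           // launch errors
    return err != cudaSuccess ? err : cudaDeviceSynchronize();
}
```

The same idea applies when the barrier sits inside loops: the loop bounds should not depend on the thread index, so that every thread in the block executes the same number of __syncthreads() calls.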