Hi, I’m trying to read data into shared memory within each block, call __syncthreads(), and then work with the data. My problem is that the data is no longer in shared memory after __syncthreads() is called, whereas it can be read back before the call. I have also built a histogram of block IDs (flattened from the x and y block indices), and the launch is definitely what I expect: 16 blocks with 64 threads in each block. None of the CUDA calls return any errors.
I’ve been wasting a lot of time on this; any help would be greatly appreciated.
extern __shared__ int blocks[];
__global__ void myKernel(params...)
{
    int id = ...
    if (id >= maxthreads)
        return;
    TYPE* rowBlock = (TYPE*)blocks; // declare shared mem arrays
    TYPE* colBlock = (TYPE*)&rowBlock[blockDim.y*blockDim.x]; // declare second array using offset
    // do stuff here...
    for(...
        for(...
            for(...
                rowBlock[writeBlockid] = m1[readBlock1id];
                colBlock[writeBlockid] = m2[readBlock2id];
                // data still accessible here...
                __syncthreads();
                // rowBlock is now full of zeros?!
            }
        }
    }
}
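For comparison, here is a minimal, self-contained sketch of the same pattern (stage two arrays in dynamic shared memory, call __syncthreads(), then read values written by other threads). It is not the kernel above: the names stageAndSum and runDemo, the choice of TYPE as int, the index calculation, and the neighbour read are all assumptions made for illustration, since the real indexing is elided. It differs from the snippet above in two ways that may be worth checking, though neither is confirmed to be the cause here: the bounds check guards the loads instead of returning before the barrier (the programming guide only permits __syncthreads() in conditional code when the condition evaluates identically across the whole block), and the launch passes the shared memory size in bytes for both arrays as the third launch parameter.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

typedef int TYPE;  // assumption: the element type is not stated in the post

// Dynamic shared memory; its size comes from the third launch parameter.
extern __shared__ int blocks[];

// Hypothetical kernel: each block stages one tile of m1 and m2 in shared
// memory, then every thread reads the values staged by a neighbouring thread.
__global__ void stageAndSum(const TYPE* m1, const TYPE* m2, TYPE* out, int n)
{
    const int tile  = blockDim.x * blockDim.y;
    const int local = threadIdx.y * blockDim.x + threadIdx.x;
    const int block = blockIdx.y * gridDim.x + blockIdx.x;
    const int id    = block * tile + local;

    TYPE* rowBlock = (TYPE*)blocks;    // first `tile` elements
    TYPE* colBlock = &rowBlock[tile];  // next `tile` elements

    // Guard the loads rather than returning early, so that every thread
    // in the block reaches the barrier below.
    rowBlock[local] = (id < n) ? m1[id] : 0;
    colBlock[local] = (id < n) ? m2[id] : 0;

    __syncthreads();

    if (id < n) {
        const int neighbour = (local + 1) % tile;
        out[id] = rowBlock[neighbour] + colBlock[neighbour];
    }
}

// Hypothetical host helper: returns the number of wrong outputs,
// or -1 if any CUDA call reports an error.
int runDemo()
{
    const dim3 grid(4, 4), block(8, 8);  // 16 blocks of 64 threads, as in the post
    const int tile = block.x * block.y;
    const int n = grid.x * grid.y * tile;
    const size_t bytes = n * sizeof(TYPE);

    std::vector<TYPE> h1(n), h2(n), hOut(n, -1);
    for (int i = 0; i < n; ++i) { h1[i] = i; h2[i] = 2 * i; }

    TYPE *d1 = nullptr, *d2 = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&d1, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d2, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d1, h1.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d2, h2.data(), bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Room for BOTH shared arrays, in bytes.
        const size_t sharedBytes = 2 * tile * sizeof(TYPE);
        stageAndSum<<<grid, block, sharedBytes>>>(d1, d2, dOut, n);
        err = cudaGetLastError();
    }
    // The blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d1);
    cudaFree(d2);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int id = 0; id < n; ++id) {
        const int src = (id / tile) * tile + ((id % tile) + 1) % tile;
        if (hOut[id] != h1[src] + h2[src]) ++mismatches;
    }
    return mismatches;
}
```

Calling runDemo() from main and checking that it returns 0 gives a known-good baseline; if this sketch behaves as intended on the same machine, the remaining differences from the real kernel (the early return, the launch configuration, and the elided loop indexing) are the places to look.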