The use of __syncthreads() has got me confused.
I'm trying to do something really simple:
1. Load an original array into a shared memory array.
2. Write the contents of the shared memory array into a global memory array.
3. Print out the contents of the global memory array.
I should get global memory array = original array.
Everything works fine if I don't use __syncthreads();
but if I use __syncthreads(); after loading the original array into the shared memory array, then my output is wrong.
Why is this?
resultIndex = w+h*dest->width;
extern __shared__ int sArray;
int* sKernel = (int*)&sArray;

//put a small matrix into shared memory of each block
if(threadIdx.y < kernel->width && threadIdx.x < kernel->width)
    sKernel[threadIdx.x+threadIdx.y*matrixWidth] = kernel->matrixGPUelement(threadIdx.x+threadIdx.y*matrixWidth);

__syncthreads(); //this screws up my result
output[resultIndex]=sKernel[threadIdx.x+threadIdx.y*matrixWidth];
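
For comparison, here is a stripped-down, self-contained sketch of the same three steps (load into shared memory, copy to global memory, print). It is not my real code: the names roundTrip, tile, in and out are made up for the example, and it assumes a single block and a 4x4 matrix. The load and the store are both guarded by the same bounds check, and the barrier sits outside the branch, since __syncthreads() has to be reached by every thread in the block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each in-range thread copies one element
// global -> shared -> global. Assumes one block covers the whole matrix.
__global__ void roundTrip(const int* in, int* out, int width)
{
    extern __shared__ int tile[];  // size is supplied at launch time
    int idx = threadIdx.x + threadIdx.y * width;
    bool inside = threadIdx.x < width && threadIdx.y < width;

    // Step 1: load the original array into shared memory.
    if (inside)
        tile[idx] = in[idx];

    // Barrier is outside the branch so all threads in the block reach it.
    __syncthreads();

    // Step 2: write shared memory to global memory, with the same guard,
    // so out-of-range threads never read or write anything.
    if (inside)
        out[idx] = tile[idx];
}

int main()
{
    const int width = 4;           // assumed matrix width for this sketch
    const int n = width * width;
    int hIn[n], hOut[n];
    for (int i = 0; i < n; ++i) { hIn[i] = i; hOut[i] = -1; }

    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dIn, hIn, n * sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(8, 8);              // deliberately larger than the matrix
    roundTrip<<<1, block, n * sizeof(int)>>>(dIn, dOut, width);
    cudaMemcpy(hOut, dOut, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Step 3: print the global memory array; it should match hIn.
    for (int i = 0; i < n; ++i)
        printf("%d ", hOut[i]);
    printf("\n");

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

The block here is launched larger than the matrix on purpose, so some threads skip both the load and the store. Those threads still reach the barrier, and because the store is guarded they never read shared memory entries that no thread wrote.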