I am doing some image processing work and am seeing strange behavior (at least strange to me).
    __shared__ unsigned int filterOut[BLOCK_SIZE];

    // ...
    // outputColor is calculated from shared memory above
    // ...

    // TColor is just unsigned int as in CUDA SDK Image Denoising example
    TColor unsgndIntColor = make_color(outputColor.x, outputColor.y, outputColor.z, 0);

    // No problem with the frame rate until here.
    // When I do the following assignment, the frame rate halves.
    filterOut[threadIdx.x + __mul24(BLOCKDIM_X, threadIdx.y)] = unsgndIntColor;
When I omit the assignment, or assign some other arbitrary value such as “threadIdx.x”, the code runs at the normal frame rate. So this single assignment alone degrades the performance. I am sure that I am missing something, but I cannot figure out what. Can anyone guess what is going on?
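In case it helps anyone reproduce this, here is a minimal sketch of the experiment, not my actual kernel. The arithmetic loop is only a placeholder for the real filter, `make_color` is a hypothetical stand-in for the SDK helper, and the block dimensions, `filterKernel`, `timeKernel` and the `StoreResult` flag are names I made up for illustration. One thing it lets me check is whether the compiler simply drops the `outputColor` computation when its result is never stored, in which case the "fast" variant would just be doing less work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed block shape; the real values are not shown in the post.
#define BLOCKDIM_X 16
#define BLOCKDIM_Y 16
#define BLOCK_SIZE (BLOCKDIM_X * BLOCKDIM_Y)

typedef unsigned int TColor;

// Hypothetical stand-in for the SDK's make_color: packs floats in [0,1] into RGBA8.
__device__ TColor make_color(float r, float g, float b, float a)
{
    return ((unsigned int)(__saturatef(a) * 255.0f) << 24) |
           ((unsigned int)(__saturatef(b) * 255.0f) << 16) |
           ((unsigned int)(__saturatef(g) * 255.0f) << 8)  |
            (unsigned int)(__saturatef(r) * 255.0f);
}

// StoreResult = true  : store the computed color (the slow case in the post).
// StoreResult = false : store threadIdx.x instead (the fast case in the post).
template <bool StoreResult>
__global__ void filterKernel(TColor *dst, int iterations)
{
    __shared__ unsigned int filterOut[BLOCK_SIZE];
    int tid = threadIdx.x + BLOCKDIM_X * threadIdx.y;

    // Placeholder for the real filter: some arithmetic producing outputColor.
    float3 outputColor = make_float3(0.0f, 0.0f, 0.0f);
    for (int i = 0; i < iterations; ++i) {
        float t = (float)(tid + i);
        outputColor.x += __sinf(t) * 0.001f;
        outputColor.y += __cosf(t) * 0.001f;
        outputColor.z += t * 1e-7f;
    }
    TColor unsgndIntColor = make_color(outputColor.x, outputColor.y, outputColor.z, 0);

    filterOut[tid] = StoreResult ? unsgndIntColor : threadIdx.x;
    __syncthreads();

    // Read a different slot back to global memory so the shared store is observable.
    dst[blockIdx.x * BLOCK_SIZE + tid] = filterOut[BLOCK_SIZE - 1 - tid];
}

// Times one launch of the chosen variant with CUDA events (after a warm-up launch).
template <bool StoreResult>
float timeKernel(TColor *d_dst, int blocks, int iterations)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    dim3 threads(BLOCKDIM_X, BLOCKDIM_Y);

    filterKernel<StoreResult><<<blocks, threads>>>(d_dst, iterations); // warm-up
    cudaEventRecord(start);
    filterKernel<StoreResult><<<blocks, threads>>>(d_dst, iterations);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int blocks = 1024;
    const int iterations = 1000;
    TColor *d_dst = NULL;
    cudaMalloc(&d_dst, blocks * BLOCK_SIZE * sizeof(TColor));

    printf("store computed color: %.3f ms\n", timeKernel<true>(d_dst, blocks, iterations));
    printf("store threadIdx.x:    %.3f ms\n", timeKernel<false>(d_dst, blocks, iterations));

    cudaFree(d_dst);
    return 0;
}
```

If the two timings differ a lot here as well, comparing the generated code for both variants (for example with `nvcc -ptx`) should show whether the loop is still present when only `threadIdx.x` is stored.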