In CUDA SDK 2.3, the bundled convolutionSeparable sample includes a file named convolutionSeparable.cu.
The convolutionRowsKernel function in that file contains the following code:
//Main data
#pragma unroll
for(int i = ROWS_HALO_STEPS; i < ROWS_HALO_STEPS + ROWS_RESULT_STEPS; i++)
s_Data[threadIdx.y][threadIdx.x + i * ROWS_BLOCKDIM_X] = d_Src[i * ROWS_BLOCKDIM_X];
I am not sure about the last line. ROWS_BLOCKDIM_X is defined as 16, so the right-hand side only reads d_Src[16 * i]. As I read it, s_Data[threadIdx.y][threadIdx.x + i * ROWS_BLOCKDIM_X] is filled with only every 16th value of d_Src, not with all of its values. I wonder whether this is a mistake, because s_Data never seems to receive the full contents of d_Src.
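To make my reading concrete, here is a small host-side C++ sketch (not the SDK code) that counts which source elements the loop would read for one row of one block. It rests on an assumption I have not confirmed: that d_Src is advanced by a per-thread offset (left halo edge plus threadIdx.x) before the loop runs. The function name countMainDataReads and the values for ROWS_HALO_STEPS and ROWS_RESULT_STEPS are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <vector>

// ROWS_BLOCKDIM_X is 16 as in the question; the other two values are assumed.
const int ROWS_BLOCKDIM_X   = 16;
const int ROWS_HALO_STEPS   = 1;
const int ROWS_RESULT_STEPS = 8;

// Host-side emulation of the "main data" loop for one row of block 0.
// Returns how many times each source element of that row is read.
std::vector<int> countMainDataReads()
{
    const int width = ROWS_RESULT_STEPS * ROWS_BLOCKDIM_X;
    std::vector<int> reads(width, 0);

    for (int tx = 0; tx < ROWS_BLOCKDIM_X; tx++) {  // stands in for threadIdx.x
        // ASSUMPTION: d_Src was already offset per thread before the loop,
        // to the left halo edge plus threadIdx.x.
        const int base = -ROWS_HALO_STEPS * ROWS_BLOCKDIM_X + tx;

        for (int i = ROWS_HALO_STEPS; i < ROWS_HALO_STEPS + ROWS_RESULT_STEPS; i++) {
            const int src = base + i * ROWS_BLOCKDIM_X;  // models d_Src[i * ROWS_BLOCKDIM_X]
            assert(src >= 0 && src < width);
            reads[src]++;
        }
    }
    return reads;
}
```

If that assumption holds, the 16 threads together read every element of the row exactly once, each thread taking a stride of 16. If d_Src is instead the same base pointer for every thread, only every 16th element would be read, which is the case I am worried about.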
Could anyone explain this code to me?
Thanks in advance.