Need help with a question about the convolutionSeparable sample

cudars · November 30, 2009, 9:11am

In CUDA SDK 2.3, the bundled convolutionSeparable sample includes a convolutionSeparable.cu file.
The convolutionRowsKernel function in that file contains the following code:

//Main data
#pragma unroll
for(int i = ROWS_HALO_STEPS; i < ROWS_HALO_STEPS + ROWS_RESULT_STEPS; i++)
    s_Data[threadIdx.y][threadIdx.x + i * ROWS_BLOCKDIM_X] = d_Src[i * ROWS_BLOCKDIM_X];

I am not sure about the last line. ROWS_BLOCKDIM_X is defined as 16, so it looks like s_Data[threadIdx.y][threadIdx.x + i * ROWS_BLOCKDIM_X] only ever receives values of the form d_Src[16 * i]. That would mean only some of the elements of d_Src are copied into s_Data, not all of them. Is this a mistake?

Could anyone explain this code to me?

Thanks in advance.

grzes_k · March 4, 2010, 8:28pm

Still need help with it?

santevid · April 15, 2010, 8:59pm

I also need help with this.
Does anybody know?

EDIT:
Sorry, I have figured it out now.

This is the key:

d_Src += baseY * pitch + baseX;
d_Dst += baseY * pitch + baseX;

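baseX includes threadIdx.x and baseY includes threadIdx.y, so every thread offsets its own copy of the d_Src pointer before the loop runs. d_Src[i * ROWS_BLOCKDIM_X] therefore reads a different element in each thread, and together the threads of a block cover the whole tile.

So the convolution sample in the SDK works properly.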
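
To make the indexing concrete, here is a small host-side sketch in plain C++ (no GPU needed) that emulates the "main data" load for one row of one thread block and records which global column ends up in each shared-memory column. Only ROWS_BLOCKDIM_X = 16 comes from this thread. The values of ROWS_RESULT_STEPS and ROWS_HALO_STEPS, the formula for baseX, and the helper name emulateMainLoad are assumptions made for illustration, so check them against your copy of convolutionSeparable.cu. The row offset baseY * pitch is left out because it only selects the image row.

```cpp
#include <cassert>
#include <vector>

constexpr int ROWS_BLOCKDIM_X   = 16; // value quoted in this thread
constexpr int ROWS_RESULT_STEPS = 8;  // assumed value, not from this thread
constexpr int ROWS_HALO_STEPS   = 1;  // assumed value, not from this thread

// Hypothetical helper: emulates the "main data" loop for every threadIdx.x
// of one block. Entry c of the result is the global column loaded into
// shared-memory column c, or -1 if this loop does not write that column.
std::vector<int> emulateMainLoad(int blockIdxX) {
    assert(blockIdxX >= 0);
    const int tileW = (ROWS_RESULT_STEPS + 2 * ROWS_HALO_STEPS) * ROWS_BLOCKDIM_X;
    std::vector<int> sharedSource(tileW, -1);

    for (int tx = 0; tx < ROWS_BLOCKDIM_X; ++tx) { // one iteration per thread
        // Assumed definition of baseX: left halo edge of the tile plus threadIdx.x.
        const int baseX =
            (blockIdxX * ROWS_RESULT_STEPS - ROWS_HALO_STEPS) * ROWS_BLOCKDIM_X + tx;

        for (int i = ROWS_HALO_STEPS; i < ROWS_HALO_STEPS + ROWS_RESULT_STEPS; ++i) {
            // After "d_Src += baseX", d_Src[i * ROWS_BLOCKDIM_X] is the
            // global element baseX + i * ROWS_BLOCKDIM_X.
            sharedSource[tx + i * ROWS_BLOCKDIM_X] = baseX + i * ROWS_BLOCKDIM_X;
        }
    }
    return sharedSource;
}
```

Under these assumptions, emulateMainLoad(1) fills shared columns 16 to 143 with the consecutive global columns 128 to 255, with no gaps. The 16 columns at each end stay at -1 because they are halo regions, which this loop does not load.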
