Hello!
I am new to the CUDA board and have a problem with shared memory that I can't explain.
I have two arrays, h_filter and h_indata, which have been transferred to the device's global memory with the MemCopy function.
I want to run a convolution that multiplies the two.
If I fill the h_filter array with data (for example, all 1s) and try to access it directly from global memory in a loop, the access fails (debugging shows the entry “0” instead of “1”):
__global__ void convolutionKernel(unsigned short* g_idata, Real* g_odata, int width, int height, unsigned short* filter)
{
    // idth, idh, idt, sizeDec and k are defined in code not shown here
    for (int j = idth - sizeDec; j <= idth + sizeDec; j++) {
        for (int i = idh - sizeDec; i <= idh + sizeDec; i++) {
            g_odata[idt] = g_idata[i + j*width] * filter[k];  // filter is read directly from global memory
            k++;
        }
    }
}
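One thing worth ruling out for the direct global-memory read is the transfer itself. The following is a minimal sketch, not the actual host code from this post: it assumes the CUDA runtime API (cudaMalloc, cudaMemcpy), a 3x3 filter, and the hypothetical names CHECK, readFilter, d_filter and d_out. It fills the filter with 1, copies it to the device with error checking, reads it back through a kernel, and compares. Note that the kernel is given the device pointer d_filter, not the host array h_filter.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: stop with a message if a CUDA runtime call fails.
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical kernel: each thread copies one filter entry out of global memory.
__global__ void readFilter(const unsigned short* filter, unsigned short* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = filter[idx];
}

int main()
{
    const int n = 3 * 3;                              // assumed 3x3 filter
    const size_t bytes = n * sizeof(unsigned short);

    unsigned short h_filter[n], h_out[n];
    for (int i = 0; i < n; i++) {
        h_filter[i] = 1;                              // fill with 1, as in the post
        h_out[i] = 0;
    }

    unsigned short* d_filter = NULL;
    unsigned short* d_out = NULL;
    CHECK(cudaMalloc((void**)&d_filter, bytes));
    CHECK(cudaMalloc((void**)&d_out, bytes));
    CHECK(cudaMemcpy(d_filter, h_filter, bytes, cudaMemcpyHostToDevice));

    // Pass the device pointers; a host pointer would not be valid inside the kernel.
    readFilter<<<1, 32>>>(d_filter, d_out, n);
    CHECK(cudaGetLastError());                        // launch errors
    CHECK(cudaDeviceSynchronize());                   // execution errors

    CHECK(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < n; i++)
        if (h_out[i] != h_filter[i])
            mismatches++;
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaFree(d_filter));
    CHECK(cudaFree(d_out));
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

If this reports zero mismatches, the copy and the plain global-memory read are working, and the difference is more likely to lie in how the indices or pointers are set up in the real kernel.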
But when I copy the filter kernel from global memory into shared memory before the convolution, the access works:
__global__ void convolutionKernel(unsigned short* g_idata, Real* g_odata, int width, int height, unsigned short* filter)
{
    // idth, idh, idt, sizeDec, filterSize and k are defined in code not shown here
    __shared__ unsigned short sharedfilter[7225];
    for (int i = 0; i < filterSize*filterSize; i++)
        sharedfilter[i] = filter[i];  // copy the filter from global to shared memory
    for (int j = idth - sizeDec; j <= idth + sizeDec; j++) {
        for (int i = idh - sizeDec; i <= idh + sizeDec; i++) {
            g_odata[idt] = g_idata[i + j*width] * sharedfilter[k];
            k++;
        }
    }
}
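For comparison, here is a minimal self-contained sketch of the shared-memory variant. It is written under assumptions and is not the code from this post: Real is taken to be float, the filter is 3x3, and the names convolveShared, sFilter, FILTER_SIZE and RADIUS are hypothetical. It also differs from the code above in three ways that should be read as suggestions: the threads of a block share the copy work, a __syncthreads() barrier separates the copy from the reads, and the products are accumulated with += instead of being assigned. Taps that fall outside the image are skipped.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define FILTER_SIZE 3                    // assumed filter edge length
#define RADIUS (FILTER_SIZE / 2)

// Hypothetical kernel: stage the filter in shared memory, then convolve.
__global__ void convolveShared(const unsigned short* in, float* out,
                               int width, int height,
                               const unsigned short* filter)
{
    __shared__ unsigned short sFilter[FILTER_SIZE * FILTER_SIZE];

    // Each thread copies a strided subset of the filter entries.
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    int nThreads = blockDim.x * blockDim.y;
    for (int i = tid; i < FILTER_SIZE * FILTER_SIZE; i += nThreads)
        sFilter[i] = filter[i];
    __syncthreads();                     // copy must finish before any thread reads

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    float sum = 0.0f;
    int k = 0;
    for (int j = y - RADIUS; j <= y + RADIUS; j++) {
        for (int i = x - RADIUS; i <= x + RADIUS; i++, k++) {
            if (i >= 0 && i < width && j >= 0 && j < height)
                sum += in[i + j * width] * sFilter[k];   // accumulate, not overwrite
        }
    }
    out[x + y * width] = sum;
}

int main()
{
    const int width = 8, height = 8, n = width * height;
    const int fn = FILTER_SIZE * FILTER_SIZE;

    unsigned short h_in[n], h_filter[fn];
    float h_out[n];
    for (int i = 0; i < n; i++)  h_in[i] = 1;
    for (int i = 0; i < fn; i++) h_filter[i] = 1;

    unsigned short* d_in = NULL;
    unsigned short* d_filter = NULL;
    float* d_out = NULL;
    cudaMalloc((void**)&d_in, n * sizeof(unsigned short));
    cudaMalloc((void**)&d_filter, fn * sizeof(unsigned short));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(unsigned short), cudaMemcpyHostToDevice);
    cudaMemcpy(d_filter, h_filter, fn * sizeof(unsigned short), cudaMemcpyHostToDevice);

    dim3 block(8, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    convolveShared<<<grid, block>>>(d_in, d_out, width, height, d_filter);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // With all-ones input and filter, each output equals the number of
    // in-bounds taps: 4 at a corner, 9 in the interior.
    printf("corner = %.0f, centre = %.0f\n", h_out[0], h_out[4 + 4 * width]);

    cudaFree(d_in);
    cudaFree(d_filter);
    cudaFree(d_out);
    return 0;
}
```

The strided copy loop keeps the staging cost per thread small, which matters more as the filter grows towards the 7225 entries used above.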
Does anyone have an explanation for this?
Thank you very much…