Access to global memory doesn't work without buffering in shared memory?

Hello!

I am new to the CUDA board and have a problem with shared memory that I can’t explain to myself.

I have two arrays in global memory, h_filter and h_indata, which have been transferred to the device with the MemCopy function.

I want to run a convolution that multiplies the two.

If I fill the h_filter array with data (for example 1) and try to access it directly from global memory in a loop, the memory access fails (the debugger shows the entry “0” instead of “1”):

convolutionKernel( unsigned short* g_idata, Real* g_odata, int width, int height, unsigned short* filter)

for (int j = idth - (sizeDec); j <= (idth + sizeDec); j++){
	for (int i = idh - (sizeDec); i <= (idh + sizeDec); i++){
		g_odata[idt] = g_idata[i+j*width]*filter[k];
		k++;
	}
}

BUT when I copy the filter kernel from global memory into shared memory before the convolution, the memory access works:

convolutionKernel( unsigned short* g_idata, Real* g_odata, int width, int height, unsigned short* filter)

__shared__ unsigned short sharedfilter[7225];

for (short i=0;i<filterSize*filterSize;i++)
	sharedfilter[i]=filter[i];

for (int j = idth - (sizeDec); j <= (idth + sizeDec); j++){
	for (int i = idh - (sizeDec); i <= (idh + sizeDec); i++){
		g_odata[idt] = g_idata[i+j*width]*sharedfilter[k];
		k++;
	}
}

Does anyone have an explanation for this?

Thank you very much…

Hmm, maybe the problem is that your versions with and without shared memory are not actually the same. What is ‘k’?
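For comparison, here is a minimal sketch of a version that reads the filter straight from global memory. It is not the original poster's code: the kernel name convolveGlobal, the helper runGlobalDemo, the CUDA_CHECK macro, the typedef of Real as float, the d_ prefixed device pointers, and the accumulation into a local sum are all assumptions made for illustration. The point is that k is a per-thread local that restarts at 0 for every output pixel, and that every runtime call is checked, so a failed copy or launch shows up as an error instead of as unexpected zeros.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

typedef float Real;  // assumption: the thread never shows how Real is defined

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical kernel: one thread per output pixel, filter read from global memory.
__global__ void convolveGlobal(const unsigned short* g_idata, Real* g_odata,
                               int width, int height,
                               const unsigned short* filter, int sizeDec)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int k = 0;     // per-thread filter index, restarted for every pixel
    Real sum = 0;  // accumulate instead of overwriting the output per tap
    for (int j = y - sizeDec; j <= y + sizeDec; j++) {
        for (int i = x - sizeDec; i <= x + sizeDec; i++) {
            // Skip taps that fall outside the image, but still advance k.
            if (i >= 0 && i < width && j >= 0 && j < height)
                sum += (Real)g_idata[i + j * width] * (Real)filter[k];
            k++;
        }
    }
    g_odata[x + y * width] = sum;
}

// Convolves an all-ones image with an all-ones 3x3 filter and
// returns the output value at the image center.
float runGlobalDemo()
{
    const int width = 16, height = 16, sizeDec = 1;
    const int filterSize = 2 * sizeDec + 1;
    std::vector<unsigned short> h_indata(width * height, 1);
    std::vector<unsigned short> h_filter(filterSize * filterSize, 1);
    std::vector<Real> h_out(width * height, 0);

    // Device copies: the kernel must receive these, not the host pointers.
    unsigned short *d_indata = NULL, *d_filter = NULL;
    Real* d_out = NULL;
    CUDA_CHECK(cudaMalloc((void**)&d_indata, h_indata.size() * sizeof(unsigned short)));
    CUDA_CHECK(cudaMalloc((void**)&d_filter, h_filter.size() * sizeof(unsigned short)));
    CUDA_CHECK(cudaMalloc((void**)&d_out, h_out.size() * sizeof(Real)));
    CUDA_CHECK(cudaMemcpy(d_indata, h_indata.data(),
                          h_indata.size() * sizeof(unsigned short), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_filter, h_filter.data(),
                          h_filter.size() * sizeof(unsigned short), cudaMemcpyHostToDevice));

    dim3 block(8, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    convolveGlobal<<<grid, block>>>(d_indata, d_out, width, height, d_filter, sizeDec);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(Real),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_indata));
    CUDA_CHECK(cudaFree(d_filter));
    CUDA_CHECK(cudaFree(d_out));
    return h_out[(height / 2) * width + width / 2];
}
```

Calling runGlobalDemo() from a main function should give 9 for the center pixel, since all nine taps of the 3x3 filter lie inside the image there. If a version like this works while the excerpt does not, the difference is probably in how k or the pointers are set up rather than in global memory itself.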

And it is a bit strange that you copy the data in a loop, with every thread copying the whole array. You should try copying the data like this:

unsigned int tid = blockDim.x * blockIdx.x + threadIdx.x;
shared[threadIdx.x] = global[tid];
__syncthreads();
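The one-element-per-thread copy above only covers as many elements as the block has threads, and a 7225-element filter is larger than any block. A possible variant, sketched here under assumptions, lets each thread copy every blockDim.x-th element so the block loads the whole filter together. The names sumFilterShared, runSharedDemo, MAX_FILTER_ELEMS and CUDA_CHECK are hypothetical, and the kernel only sums the filter to show that every thread sees the complete copy after __syncthreads().

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Same capacity as the shared array in the question (85 x 85 entries).
#define MAX_FILTER_ELEMS 7225

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical kernel: the block loads the filter cooperatively, then each
// thread sums all entries from shared memory and stores its result.
__global__ void sumFilterShared(const unsigned short* filter, int filterElems,
                                unsigned int* out)
{
    __shared__ unsigned short sharedFilter[MAX_FILTER_ELEMS];

    // Thread t copies elements t, t + blockDim.x, t + 2 * blockDim.x, ...
    for (int i = threadIdx.x; i < filterElems; i += blockDim.x)
        sharedFilter[i] = filter[i];

    // Wait until the whole block has finished writing before anyone reads.
    __syncthreads();

    unsigned int sum = 0;
    for (int i = 0; i < filterElems; i++)
        sum += sharedFilter[i];
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Returns true if every thread saw the complete all-ones filter.
bool runSharedDemo()
{
    const int filterElems = MAX_FILTER_ELEMS;
    const int threads = 128, blocks = 2;
    std::vector<unsigned short> h_filter(filterElems, 1);
    std::vector<unsigned int> h_out(threads * blocks, 0);

    unsigned short* d_filter = NULL;
    unsigned int* d_out = NULL;
    CUDA_CHECK(cudaMalloc((void**)&d_filter, filterElems * sizeof(unsigned short)));
    CUDA_CHECK(cudaMalloc((void**)&d_out, h_out.size() * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemcpy(d_filter, h_filter.data(),
                          filterElems * sizeof(unsigned short), cudaMemcpyHostToDevice));

    sumFilterShared<<<blocks, threads>>>(d_filter, filterElems, d_out);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_filter));
    CUDA_CHECK(cudaFree(d_out));

    for (size_t t = 0; t < h_out.size(); t++)
        if (h_out[t] != (unsigned int)filterElems) return false;
    return true;
}
```

Compared with a loop in which every thread copies all entries, the strided loop issues each global read once per block instead of once per thread, and the barrier makes sure no thread reads an entry that another thread has not written yet.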

It’s just an excerpt from my whole code; this is guaranteed to be the only part that differs (because I can delete or insert the shared-memory statements and observe the described effect).