2D mapping for image processing

Hello,

I have a 1920 * 1080 image and I’m doing some processing on it.

I’m managing the memory as a flat array, so my first approach was to treat the image as a vector and map the blocks/threads in 1D. I had a kernel like this:

__global__ void processing( int *a, int *b)
{
   int tid = threadIdx.x + blockIdx.x * blockDim.x;
   while (tid < N) {
      b[tid] = a[tid] -100;
      tid += blockDim.x * gridDim.x;
   }
}

And it worked really well (the actual processing is more complex than subtracting 100, but this is just an example).

Now I am trying to treat the image as a 2D array (still stored in a flat array), so I mapped the blocks/threads in 2D. The resulting kernel is something like this:

__global__ void processing(int *a, int *b)
{
	int idxI = blockIdx.x * blockDim.x + threadIdx.x;
	int idxJ = blockIdx.y * blockDim.y + threadIdx.y;

	while(idxJ < 1080)
	{
		while(idxI < 1920)
		{
			dilation[idxI*gridDim.x+idxJ] = imageBW[idxI*gridDim.x+idxJ]-100;
		}
	}
}

But this code isn’t doing anything useful to the image: the final image has nothing to do with the expected result. I’m pretty sure my problem is with the indexes, but I cannot see what is wrong. Any suggestions?

Does

dilation[idxI*gridDim.x*blockDim.x+idxJ] = imageBW[idxI*gridDim.x*blockDim.x+idxJ]-100;

work any better?

Well, I tried it, but these errors appeared:

––CUDA error: 

	cudaMemcpy( dilation->data, d_dilation, sizeof(u_char)*sizeImg, cudaMemcpyDeviceToHost ) returned "the launch timed out and was terminated"

––CUDA error: 

	cudaEventRecord( stop, 0 ) returned "the launch timed out and was terminated"

––CUDA error: 

	cudaEventSynchronize( stop ) returned "the launch timed out and was terminated"

––CUDA error: 

	cudaEventElapsedTime( &elapsedTime, start, stop ) returned "the launch timed out and was terminated"

––CUDA error: 

	cudaEventDestroy( start ) returned "the launch timed out and was terminated"

––CUDA error: 

	cudaEventDestroy( stop ) returned "the launch timed out and was terminated"
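
For what it’s worth, "the launch timed out and was terminated" is `cudaErrorLaunchTimeout`: the kernel ran past the driver’s execution time limit (typically enforced on GPUs that also drive a display) and was killed. Kernel launches are asynchronous, so the failure is only reported by the next call that waits on the GPU, here `cudaMemcpy`, and every later call repeats it. In the 2D kernel as posted, neither `idxI` nor `idxJ` changes inside the `while` loops, so any thread that enters them never leaves. Below is a minimal sketch of checking for errors right after the launch, so the message points at the kernel rather than at the copy. `CUDA_CHECK` and `subtract100` are names made up for this example, and the launch configuration is arbitrary.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print file/line and abort if a runtime call fails.
#define CUDA_CHECK(call) \
    do { \
        cudaError_t err_ = (call); \
        if (err_ != cudaSuccess) { \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, \
                    cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE); \
        } \
    } while (0)

// Same grid-stride pattern as the 1D kernel in the first post,
// with the element count passed in instead of a global N.
__global__ void subtract100(const int *a, int *b, int n)
{
    for (int tid = threadIdx.x + blockIdx.x * blockDim.x; tid < n;
         tid += blockDim.x * gridDim.x) {
        b[tid] = a[tid] - 100;
    }
}

int main()
{
    const int n = 1920 * 1080;
    int *d_a = nullptr, *d_b = nullptr;
    CUDA_CHECK(cudaMalloc(&d_a, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&d_b, n * sizeof(int)));
    CUDA_CHECK(cudaMemset(d_a, 0, n * sizeof(int)));

    subtract100<<<128, 256>>>(d_a, d_b, n);
    CUDA_CHECK(cudaGetLastError());       // launch problems (bad configuration, etc.)
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel was running

    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    printf("kernel finished without errors\n");
    return 0;
}
```

With a check like this, a kernel that hangs is reported at the `cudaDeviceSynchronize()` line instead of showing up several calls later.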

I resolved this; it was an infinite loop. But now the values are not decreased by 100. Instead, little blocks of the image appear to be moved around, so if the original is:

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2

the result is something weird like:

2 2 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2

2 2 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2

2 2 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2

2 2 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2

1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1

1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1

1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1

Any suggestions for this?
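
Hard to say for sure without seeing the updated kernel, but the index expression is the first thing I would look at. For a row-major flat array, pixel (x, y) lives at `y * width + x`, where `width` is the row length in pixels (1920 here). `gridDim.x` is the number of blocks in x, which has nothing to do with the row length, so `idxI*gridDim.x+idxJ` does not address the pixel the thread is meant to handle. Here is a sketch of a 2D version with grid-stride loops in both directions, assuming row-major `int` buffers as in the posted kernels. `flatIndex`, `processing2D`, `width` and `height` are names made up for this example, and the 16x16 block size is just a common choice.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Row-major flat index of pixel (x, y) in an image `width` pixels wide.
__host__ __device__ inline int flatIndex(int x, int y, int width)
{
    return y * width + x;
}

__global__ void processing2D(const int *in, int *out, int width, int height)
{
    // Grid-stride loops: x restarts for every row and both loop
    // variables advance, so the loops always terminate.
    for (int y = blockIdx.y * blockDim.y + threadIdx.y; y < height;
         y += blockDim.y * gridDim.y) {
        for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < width;
             x += blockDim.x * gridDim.x) {
            int i = flatIndex(x, y, width);
            out[i] = in[i] - 100;
        }
    }
}

int main()
{
    const int width = 1920, height = 1080, n = width * height;
    std::vector<int> h_in(n, 150), h_out(n, 0);  // test image: every pixel is 150

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // One thread per pixel, rounding the grid up to cover the image edges.
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    processing2D<<<grid, block>>>(d_in, d_out, width, height);

    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Every output pixel should be 150 - 100 = 50.
    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 50) ++wrong;
    }
    printf("pixels with unexpected value: %d\n", wrong);

    cudaFree(d_in);
    cudaFree(d_out);
    return wrong != 0;
}
```

One more thing that may be worth checking: the `cudaMemcpy` above copies `sizeof(u_char)*sizeImg` bytes, while the posted kernels take `int *`. If the device buffers really hold one byte per pixel, the pointer type in the kernel needs to match, otherwise each `a[tid]` reads several pixels at once.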