Cannot get the correct index in image processing on CUDA

Hello,

I have started developing with CUDA. I am trying to write a very simple image-processing kernel that only inverts the colors, using only global memory.

But my resulting picture always comes out corrupted, and the kernel only iterates through the block dimension in the X axis.

My CUDA code is:

// values is a 1D array of image pixels; each value is in the range 0-255.
// wid and hei are the width and height of my image.
extern "C" __global__ void testget(int *values, int wid, int hei)
{
	const unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
	const unsigned int j = blockIdx.y * blockDim.y + threadIdx.y;

	//if (i<wid && j<hei)
	values[j+i*wid] = 255-values[j+i*wid];
}

My BLOCKSIZE is 15, so my dim3 block dimensions are (15,15,1).

My grid size is (wid/BLOCKSIZE, hei/BLOCKSIZE).

I allocate the int array using cudaMalloc, so it is just a simple array.

Note that I don’t have to worry about the image stride, since I already converted the image pixels to an int array without any stride trouble.

After that I pass the picture to the CUDA kernel.

Please help me or point me in the right direction. Thank you.

Best Regards,

Aram Azhari

That kernel looks fine. My guess is that there is something wrong in your host side code.
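Since the host code is not shown in the thread, here is only a rough sketch of what a checked host-side launch could look like, assuming the CUDA runtime API is used directly. The names invertKernel, invertOnDevice and ok are hypothetical, and the grid is rounded up so that sizes that are not a multiple of BLOCKSIZE are still covered.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and return false if a CUDA call failed.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical kernel: inverts a row-major wid x hei image in place.
__global__ void invertKernel(int *values, int wid, int hei)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x; // column
    const int j = blockIdx.y * blockDim.y + threadIdx.y; // row
    if (i < wid && j < hei)
        values[i + j * wid] = 255 - values[i + j * wid];
}

// Hypothetical host wrapper: copies pixels to the device, runs the kernel,
// copies the result back, and checks every step along the way.
bool invertOnDevice(std::vector<int> &pixels, int wid, int hei)
{
    if (wid <= 0 || hei <= 0 || pixels.size() != (size_t)wid * (size_t)hei)
        return false;

    const size_t bytes = pixels.size() * sizeof(int);
    int *d_values = nullptr;
    if (!ok(cudaMalloc(&d_values, bytes), "cudaMalloc"))
        return false;

    const int BLOCKSIZE = 15; // same block size as in the question
    const dim3 block(BLOCKSIZE, BLOCKSIZE, 1);
    // Round up so a partial block at the right and bottom edges is launched.
    const dim3 grid((wid + BLOCKSIZE - 1) / BLOCKSIZE,
                    (hei + BLOCKSIZE - 1) / BLOCKSIZE, 1);

    bool success = ok(cudaMemcpy(d_values, pixels.data(), bytes,
                                 cudaMemcpyHostToDevice), "copy to device");
    if (success) {
        invertKernel<<<grid, block>>>(d_values, wid, hei);
        success = ok(cudaGetLastError(), "kernel launch")
               && ok(cudaDeviceSynchronize(), "kernel execution")
               && ok(cudaMemcpy(pixels.data(), d_values, bytes,
                                cudaMemcpyDeviceToHost), "copy to host");
    }
    cudaFree(d_values);
    return success;
}
```

Calling invertOnDevice(pixels, wid, hei) and checking the return value should at least show whether an allocation, copy, or launch is failing silently. If the kernel is launched through a wrapper library instead, the equivalent checks would have to go through that library's own error reporting.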

You mean to write:

values[i+j*wid ]= 255-values[i+j*wid];

Note the transposed i and j.
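To spell out the indexing, here is a small sketch assuming the image is stored row by row (row-major), so that row j starts at offset j * wid. The helper pixelIndex and the kernel name invertPixels are made up for illustration; the bounds check from the question is enabled again so threads outside the image do nothing.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: offset of column i in row j of a row-major image
// that is wid pixels wide.
__host__ __device__ inline unsigned int pixelIndex(unsigned int i,
                                                   unsigned int j,
                                                   unsigned int wid)
{
    return i + j * wid;
}

// Hypothetical kernel with the same parameters as testget in the question.
extern "C" __global__ void invertPixels(int *values, int wid, int hei)
{
    const unsigned int i = blockIdx.x * blockDim.x + threadIdx.x; // column (x)
    const unsigned int j = blockIdx.y * blockDim.y + threadIdx.y; // row (y)

    // Threads that fall outside the image must not touch memory.
    if (i < (unsigned int)wid && j < (unsigned int)hei) {
        const unsigned int idx = pixelIndex(i, j, (unsigned int)wid);
        values[idx] = 255 - values[idx];
    }
}
```

For a 4-pixel-wide image, pixelIndex(2, 1, 4) gives 2 + 1 * 4. With j + i * wid the same thread would address 1 + 2 * 4 instead, which is only harmless when the image is square and every pixel is visited exactly once.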

Well, in either case, it seems that I’m stuck at my block height instead of covering my image height.

Unfortunately I’m using , could there be any bug in that?
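One way to narrow this down is to check on the host whether the launch configuration spans the whole image at all. The sketch below assumes the runtime API's dim3 type; gridFor and coversImage are hypothetical helper names, not part of CUDA.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: smallest grid whose blocks cover a wid x hei image.
inline dim3 gridFor(int wid, int hei, dim3 block)
{
    return dim3((wid + block.x - 1) / block.x,
                (hei + block.y - 1) / block.y, 1);
}

// Hypothetical helper: true when grid * block spans at least wid x hei threads.
inline bool coversImage(dim3 grid, dim3 block, int wid, int hei)
{
    const unsigned long long threadsX = (unsigned long long)grid.x * block.x;
    const unsigned long long threadsY = (unsigned long long)grid.y * block.y;
    return threadsX >= (unsigned long long)wid
        && threadsY >= (unsigned long long)hei;
}
```

If the grid that actually reaches the kernel has a y dimension of 1, coversImage returns false for any image taller than one block, which would match only the first BLOCKSIZE rows being processed. The integer division wid/BLOCKSIZE also drops the remainder, so a 64-pixel-wide image with 15-wide blocks gets only 4 blocks (60 columns) unless the grid is rounded up as in gridFor.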