You can't write it like that! Every running thread will try to write to that same location 1, which is a race condition. In that situation only one of the writes will succeed, and you can't know which one. That is why you get different results.
From your first post it looked like you were asking about the link between image coordinates and an image stored in a Cuda2DArray, which would be much easier for you to handle, but you are actually trying to access the image as a 1D array.
Given that, it will be easier to understand if you define a one-dimensional grid and block size. Imagine your image covered with lines instead of rectangles, so that each row is one line (a row could be split into several lines, but keep the example simple until you get the picture). The number of rows is then the dimension of your grid.
So if the grid is defined as (512,1,1), you have 512 rows,
and a block defined as (512,1,1) gives you 512 threads per block, with each block representing a single row.
Stop for a moment: you declared char* ImgIn, which means the address calculation has to include PixelSize. If your bitmap is 24-bit RGB, for example, then
*ImgIn will be the R component of the first pixel
*(ImgIn + 1) will be the G component of the first pixel
*(ImgIn + 2) will be the B component of the first pixel
and so on
In this example a pixel is 3 bytes. If you use alpha it is 4 bytes; if your image channels are of type float, an RGB pixel is 12 bytes and an RGBA pixel is 16 bytes.
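The pixel sizes above can be checked at compile time. This is only a sketch: the struct names are made up for illustration, and it assumes a 4-byte float, which holds on the usual CUDA host and device compilers.

```cpp
#include <cstddef>

// Hypothetical pixel layouts; the names are illustrative, not from any library.
struct Rgb8   { unsigned char r, g, b; };      // 24-bit RGB
struct Rgba8  { unsigned char r, g, b, a; };   // RGB plus alpha
struct RgbF   { float r, g, b; };              // float channels (assumes 4-byte float)
struct RgbaF  { float r, g, b, a; };           // float channels plus alpha

// These fail to compile if a layout does not have the size quoted in the text.
static_assert(sizeof(Rgb8)  == 3,  "RGB with 8-bit channels is 3 bytes");
static_assert(sizeof(Rgba8) == 4,  "RGBA with 8-bit channels is 4 bytes");
static_assert(sizeof(RgbF)  == 12, "RGB with float channels is 12 bytes");
static_assert(sizeof(RgbaF) == 16, "RGBA with float channels is 16 bytes");
```

Whichever of these matches your bitmap is what YourPixelType stands for in the address calculation.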
Pitch is Width * PixelSize rounded up to a larger value that satisfies the alignment requirements. So instead of Width * PixelSize, use Pitch in your calculation.
Now reading the pixel at position (x,y) will be:
YourPixelType* pixel = (YourPixelType*) (ImgIn + y * Pitch + x * sizeof(YourPixelType));
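To make the pitched address calculation concrete, here is a small sketch that compiles both as plain C++ and with nvcc. Rgb8, pixelOffset, readPixel and the HD macro are hypothetical names chosen for this example, and the pixel is assumed to be 3 bytes.

```cpp
#include <cstddef>

// HD expands to the CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Rgb8 { unsigned char r, g, b; };  // hypothetical 24-bit pixel type

// Byte offset of pixel (x, y); pitch is the size of one row in bytes, padding included.
HD inline std::size_t pixelOffset(std::size_t pitch, int x, int y)
{
    return static_cast<std::size_t>(y) * pitch
         + static_cast<std::size_t>(x) * sizeof(Rgb8);
}

// Reads pixel (x, y) byte by byte from a buffer addressed as unsigned char.
HD inline Rgb8 readPixel(const unsigned char* img, std::size_t pitch, int x, int y)
{
    const unsigned char* p = img + pixelOffset(pitch, x, y);
    return Rgb8{p[0], p[1], p[2]};
}
```

For example, with a pitch of 16 bytes, pixel (1,1) starts at byte 16 + 3 = 19, even though a row of 4 such pixels only needs 12 bytes.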
Now, because your blockIdx.x is the y coordinate of the image and threadIdx.x is the x coordinate,
you can write:
int index = blockIdx.x * Pitch + threadIdx.x * sizeof(YourPixelType);
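Supposing your output image has the same pixel size, resolution and pitch as the input image,
ImgOut[index] = ImgIn[index];
will copy the first byte of the source pixel to the output image. Because ImgIn is a char*, repeat this for the remaining sizeof(YourPixelType) - 1 bytes to copy the whole pixel.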
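Putting the pieces together, a copy kernel could look roughly like the sketch below. It assumes a 3-byte pixel, the same pitch for both images, and a width that fits in one block; Rgb8, copyPixel, copyImage and the HD macro are names invented for this example. The kernel is guarded by `__CUDACC__` so the helper can also be compiled and tried on the host.

```cpp
#include <cstddef>

// HD expands to the CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Rgb8 { unsigned char r, g, b; };  // hypothetical 3-byte pixel type

// Copies every byte of pixel (x, y). Both images are assumed to share the same pitch.
HD inline void copyPixel(const unsigned char* in, unsigned char* out,
                         std::size_t pitch, int x, int y)
{
    std::size_t index = static_cast<std::size_t>(y) * pitch
                      + static_cast<std::size_t>(x) * sizeof(Rgb8);
    for (std::size_t b = 0; b < sizeof(Rgb8); ++b)
        out[index + b] = in[index + b];   // each thread writes only its own pixel
}

#ifdef __CUDACC__
// One block per image row, one thread per pixel in that row.
__global__ void copyImage(const unsigned char* in, unsigned char* out,
                          std::size_t pitch, int width, int height)
{
    int y = blockIdx.x;    // row index
    int x = threadIdx.x;   // column index
    if (x < width && y < height)
        copyPixel(in, out, pitch, x, y);
}
// Possible launch for a 512x512 image (device pointers assumed already allocated):
//   copyImage<<<512, 512>>>(dIn, dOut, pitch, 512, 512);
#endif
```

Since every thread computes a different index, no two threads write to the same location, so the result is the same on every run.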