After just a quick look I have an observation: if every pixel of the image you are processing is 24 bits wide and you store it in an unsigned char array, the code you posted will change only one third of each pixel. This is because you do not account for the 'distance' (stride) between pixels in your computations. In a 24 BPP image each pixel starts at every third byte of the image data array:
    [R BYTE][G BYTE][B BYTE][R BYTE][G BYTE][B BYTE][R BYTE]...
    ^                       ^                       ^
    i-th pixel              (i+1)-th pixel          (i+2)-th pixel (and so on)
This of course depends on the actual image format, but I assumed that ‘img’ points to raw data.
Your problem is similar to this one:
Your program assumes one thread per pixel, but each thread changes only one byte of its pixel. Your allocation and copy operations may also need to account for the number of bytes per pixel.