Problems with atomicMax!

Guys, I have this problem: I have two char arrays of 989881 elements each. I populate them from two files with different data, then I call this CUDA code:

__global__ void comparacao_paralela(char *img, char *clone, unsigned int *cmp, int N, int height, int width, int widthStep, int nChannels, int blockSize)
{
	unsigned int j, indice, index;
	index = blockIdx.x * blockDim.x + threadIdx.x;	// global thread index
	for (j = 0; j < width; j++)
	{
		indice = CALCULAINDICE(widthStep, nChannels, index + 1, j);
		if (img[indice] != clone[indice])
			atomicMax(&cmp[0], indice);	// keep the largest differing index
	}
}

where img and clone are the two arrays I mentioned before. However, cmp (which is NOT declared as shared memory) almost always comes back with the value 990451, which makes no sense, since the highest value that indice gets is 989881. And if I change the code to atomicMax(&cmp[indice], indice); it works, returning the value 922083.

Can someone please tell me what I am doing wrong here, and why atomicMax isn't returning the real largest indice?
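
Since the value coming back is larger than the array size, two things seem worth ruling out: cmp[0] not being zeroed before the launch, and threads whose index falls outside the image reading past the end of the arrays. Below is a minimal, self-contained sketch of the same atomicMax pattern with both of those guarded. It is not the code from the link. The CALCULAINDICE definition, the small image dimensions, the name compare_guarded, and passing the thread index directly as the row (instead of index + 1) are all assumptions made only for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// ASSUMPTION: stand-in for the CALCULAINDICE macro from the post, whose real
// definition is not shown. Here it is the byte offset of (row, col).
#define CALCULAINDICE(widthStep, nChannels, row, col) \
    ((row) * (widthStep) + (col) * (nChannels))

// Hypothetical guarded variant of the kernel: one thread per row.
__global__ void compare_guarded(const char *img, const char *clone,
                                unsigned int *cmp, int N, int height,
                                int width, int widthStep, int nChannels)
{
    unsigned int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= (unsigned int)height)
        return;                      // surplus threads of the last block do nothing

    for (int j = 0; j < width; j++) {
        unsigned int indice = CALCULAINDICE(widthStep, nChannels, row, j);
        if (indice >= (unsigned int)N)
            continue;                // never read past the end of the arrays
        if (img[indice] != clone[indice])
            atomicMax(&cmp[0], indice);
    }
}

int main()
{
    // Made-up image size, only for this sketch.
    const int height = 100, width = 50, nChannels = 1;
    const int widthStep = width * nChannels;
    const int N = height * widthStep;

    // Two buffers that differ at exactly two positions: 77 and 1234.
    std::vector<char> h_img(N, 0), h_clone(N, 0);
    h_clone[77] = 1;
    h_clone[1234] = 1;

    char *d_img = 0, *d_clone = 0;
    unsigned int *d_cmp = 0;
    cudaMalloc(&d_img, N);
    cudaMalloc(&d_clone, N);
    cudaMalloc(&d_cmp, sizeof(unsigned int));
    cudaMemcpy(d_img, &h_img[0], N, cudaMemcpyHostToDevice);
    cudaMemcpy(d_clone, &h_clone[0], N, cudaMemcpyHostToDevice);

    // cudaMalloc does not zero memory, so the running maximum is reset here.
    cudaMemset(d_cmp, 0, sizeof(unsigned int));

    // The grid is rounded up, so some threads have row >= height.
    const int blockSize = 64;
    const int numBlocks = (height + blockSize - 1) / blockSize;
    compare_guarded<<<numBlocks, blockSize>>>(d_img, d_clone, d_cmp, N,
                                              height, width, widthStep, nChannels);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned int h_cmp = 0;
    cudaMemcpy(&h_cmp, d_cmp, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("largest differing index: %u\n", h_cmp);

    cudaFree(d_img);
    cudaFree(d_clone);
    cudaFree(d_cmp);
    return 0;
}
```

With a launch of 2 blocks of 64 threads for 100 rows, 28 threads have no row to work on. Without the row check they would compute indices beyond N and compare whatever memory happens to be there, which is one way a maximum above the array size could appear.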

Thank you.

PS: I am using CUDA 5.0 on a GeForce GTS 450 1GB.

PS2: I am putting the code and the files at the link below, in case anyone wants to check it out by compiling and running it!