Guys, I have this problem: I have two char arrays of 989881 elements each. I populate them from two files with different data, then I call this CUDA code:
__global__ void comparacao_paralela(char *img, char *clone, unsigned int *cmp, int N, int height, int width, int widthStep, int nChannels, int blockSize)
{
    unsigned int j, indice, index;
    index = blockIdx.x * blockDim.x + threadIdx.x;
    for (j = 0; j < width; j++)
    {
        indice = CALCULAINDICE(widthStep, nChannels, index + 1, j);
        if (img[indice] != clone[indice])
            atomicMax(&cmp[0], indice);
    }
}
where img and clone point to the two arrays I mentioned before. However, cmp[0] (cmp is NOT declared as shared memory) comes back almost every time with the value 990451, which makes no sense, since the highest value that indice gets is 989881. And if I change the code to atomicMax(&cmp[indice], indice); it works, returning the value 922083.
Can someone please tell me what I am doing wrong here, so that atomicMax isn't returning the real biggest indice?
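For comparison, here is a minimal, self-contained sketch of the behaviour I expect, not my actual code. Two things that could produce a value above the array size are a thread computing an out-of-range indice (the kernel above has no bounds check) and cmp[0] not being zeroed before the launch, so the sketch guards against both. Everything in it is an assumption for illustration: the kernel max_diff_index and the helper highest_diff_index are hypothetical names, it uses one thread per element with flat indexing instead of CALCULAINDICE, and the test data in main is made up.

```cuda
#include <cassert>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical flat-index variant: one thread per element, so the
// CALCULAINDICE macro from the original kernel is not needed here.
__global__ void max_diff_index(const char *img, const char *clone,
                               unsigned int *cmp, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: the grid size is rounded up, so some threads have i >= n
    // and must not touch the arrays.
    if (i < n && img[i] != clone[i])
        atomicMax(cmp, i);
}

// Host helper (hypothetical): returns the highest index where a and b differ.
// A result of 0 is ambiguous: either only element 0 differs or nothing does.
unsigned int highest_diff_index(const char *a, const char *b, unsigned int n)
{
    char *d_a = NULL, *d_b = NULL;
    unsigned int *d_cmp = NULL;
    cudaMalloc(&d_a, n);
    cudaMalloc(&d_b, n);
    cudaMalloc(&d_cmp, sizeof(unsigned int));
    cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, n, cudaMemcpyHostToDevice);
    // Zero the result first; otherwise atomicMax competes with whatever
    // value already happens to be in that memory.
    cudaMemset(d_cmp, 0, sizeof(unsigned int));

    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;  // round up
    max_diff_index<<<blocks, threads>>>(d_a, d_b, d_cmp, n);

    unsigned int result = 0;
    // This blocking copy waits for the kernel and reports its errors too.
    cudaError_t err = cudaMemcpy(&result, d_cmp, sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_cmp);
    return result;
}

int main()
{
    const unsigned int n = 989881;  // array size from the post
    char *img = new char[n];
    char *clone = new char[n];
    memset(img, 0, n);
    memset(clone, 0, n);
    clone[1234] = 1;    // made-up test data: two differing elements,
    clone[n - 1] = 1;   // the last one being the highest valid index

    unsigned int result = highest_diff_index(img, clone, n);
    printf("highest differing index: %u\n", result);
    assert(result == n - 1);

    delete[] img;
    delete[] clone;
    return 0;
}
```

If a guarded version like this gives a sensible result while the original kernel does not, that would point at the index computation or the launch configuration rather than at atomicMax itself.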
Thank you.
PS: I am using CUDA 5.0 on a GeForce GTS 450 1GB.
PS2: I am putting the code and the files at the link below, in case anyone wants to check it out by compiling and running it!
http://www.mediafire.com/?2zfd9wf5a3iia64