Atomic Write Issue: Works in Emulation, Not on GPU

Hey guys, I have the following code running as my kernel.

__global__ void ListRankKernel(int *VAL, int *SUC, int size)
{
    int i = (blockIdx.x * 512) + threadIdx.x;  // 512 threads per block

    if (i < size)
    {
        // Pointer jumping: add the successor's rank into our own,
        // then short-circuit our link to the successor's successor.
        while ((SUC[i] != -1) && (SUC[SUC[i]] != -1))
        {
            atomicAdd(&VAL[i], VAL[SUC[i]]);
            atomicExch(&SUC[i], SUC[SUC[i]]);
        }
    }
}

This code runs perfectly when I run it in the emulator, but on the GPU it fails for size > ~100. I assume the atomic writes are not really happening on the GPU the way they should; there may also be a problem with how the SUC array is read and written.

I think the two atomic functions must execute inside a single critical section for the algorithm to work correctly. Please let me know of a workaround for this; I also need both variables to stay 32 bits.

Sorry guys, my bad… the algorithm requires both writes to happen as a single atomic operation, which is not possible in CUDA 1.1. So I packed the two arrays as shorts into one array and did a single atomic write operation, and it worked! My data range got slashed to 32K, so I will need to try out the 64-bit atomics on the new generation of hardware to handle the standard data set.