I have been pulling my hair out trying to figure out why I couldn’t get the right output from a program. After tons of tracing in the debugger, I finally decided to verify the output of the CUDA API function I was using, atomicInc(). As I understood it, this function takes two args, an address to increment and a value to increment by, and returns the old value.
I was passing 5 as the second argument, expecting the value to go up by five on each call.
Instead, each value I got back was only one higher than the previous one.
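
For anyone hitting the same thing: as far as the CUDA C++ Programming Guide describes it, the second argument of atomicInc() is a wrap-around limit rather than a step. The function stores ((old >= val) ? 0 : (old + 1)) at the address and returns old, which would explain return values that only go up by one. Adding an arbitrary amount is the job of atomicAdd(). Below is a minimal sketch that compares the two. The kernel name compare, the buffer names, and the single-thread launch are made up for illustration and are not taken from the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel: run by a single thread so the sequence of
// returned values is deterministic.
__global__ void compare(unsigned int *inc_counter, unsigned int *add_counter,
                        unsigned int *inc_old, unsigned int *add_old, int n)
{
    for (int i = 0; i < n; ++i) {
        // Second argument is a wrap-around limit:
        // new value = (old >= 5) ? 0 : old + 1
        inc_old[i] = atomicInc(inc_counter, 5u);

        // Second argument is the amount to add:
        // new value = old + 5
        add_old[i] = atomicAdd(add_counter, 5u);
    }
}

int main()
{
    const int n = 8;
    unsigned int *inc_counter, *add_counter, *inc_old, *add_old;

    // Managed memory keeps the sketch short; cudaMalloc + cudaMemcpy works too.
    cudaMallocManaged(&inc_counter, sizeof(unsigned int));
    cudaMallocManaged(&add_counter, sizeof(unsigned int));
    cudaMallocManaged(&inc_old, n * sizeof(unsigned int));
    cudaMallocManaged(&add_old, n * sizeof(unsigned int));

    *inc_counter = 0;
    *add_counter = 0;

    compare<<<1, 1>>>(inc_counter, add_counter, inc_old, add_old, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("atomicInc(addr, 5) returned:");
    for (int i = 0; i < n; ++i) printf(" %u", inc_old[i]);
    printf("\natomicAdd(addr, 5) returned:");
    for (int i = 0; i < n; ++i) printf(" %u", add_old[i]);
    printf("\n");

    cudaFree(inc_counter);
    cudaFree(add_counter);
    cudaFree(inc_old);
    cudaFree(add_old);
    return 0;
}
```

Starting from zero, the atomicInc() calls should return 0 1 2 3 4 5 0 1 (counting by one and wrapping after the limit), while the atomicAdd() calls return 0 5 10 15 20 25 30 35. If the goal is to step by fives, atomicAdd() is probably the call to use.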