I have a problem with the atomicAdd function (unsigned integer version).
I have a buffer in global memory that is a large 2D array of counters, 1024x1024 in size.
Each running thread does some work and, at the very end, increments one of the counters in that array. The index each thread increments depends on the input data; it is effectively chaotic and unrelated to the thread index. Occasionally a few threads may increment the same counter, but mostly each thread increments its own.
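For illustration, the access pattern described above looks roughly like the following. This is a minimal sketch with made-up names, not the actual kernel: `pick_counter` is a hypothetical stand-in for the real per-thread work, and `run_count_demo` is a hypothetical host helper that only exists to make the sketch self-contained.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define GRID_DIM 1024  // counters form a GRID_DIM x GRID_DIM array

// Hypothetical stand-in for the real work: maps a data value to a counter index.
__device__ unsigned int pick_counter(unsigned int value)
{
    // Cheap integer hash, then reduced into the valid index range.
    value ^= value >> 16;
    value *= 0x45d9f3bu;
    value ^= value >> 16;
    return value % (GRID_DIM * GRID_DIM);
}

__global__ void count_kernel(const unsigned int *input,
                             unsigned int *counters,
                             unsigned int n)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // The index depends on the data, not on tid.
    unsigned int idx = pick_counter(input[tid]);

    // Last statement of the kernel: bump one counter in global memory.
    atomicAdd(&counters[idx], 1u);
}

// Hypothetical host helper: runs the kernel on n inputs (n > 0)
// and returns the sum of all counters afterwards.
unsigned long long run_count_demo(unsigned int n)
{
    const size_t num_counters = (size_t)GRID_DIM * GRID_DIM;

    unsigned int *h_in = new unsigned int[n];
    for (unsigned int i = 0; i < n; ++i) h_in[i] = i * 2654435761u;

    unsigned int *d_in = 0, *d_counters = 0;
    cudaMalloc((void **)&d_in, n * sizeof(unsigned int));
    cudaMalloc((void **)&d_counters, num_counters * sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    cudaMemset(d_counters, 0, num_counters * sizeof(unsigned int));

    count_kernel<<<(n + 255) / 256, 256>>>(d_in, d_counters, n);

    unsigned int *h_counters = new unsigned int[num_counters];
    cudaMemcpy(h_counters, d_counters, num_counters * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    unsigned long long total = 0;
    for (size_t i = 0; i < num_counters; ++i) total += h_counters[i];

    cudaFree(d_in);
    cudaFree(d_counters);
    delete[] h_in;
    delete[] h_counters;
    return total;
}
```

If every increment lands inside the buffer, the sum of all counters equals the number of threads that ran, so `run_count_demo(n)` should return `n`.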
Everything works perfectly on the GF 400, but I have a problem on the GF9000 and GF200 (I haven't tried the GF8000 because we don't support it). The kernel launch fails with an “unspecified launch failure” message. If I comment out the last line with the atomicAdd, it works. The kernel doesn't contain any loops, and it's quite fast.
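One possible way to narrow this down is sketched below. It assumes the computed index might sometimes fall outside the buffer, which nothing above confirms; "unspecified launch failure" is often reported for an invalid memory access inside a kernel, so it seems worth ruling out. The kernel guards the atomicAdd with a bounds check and counts rejected indices, and the host checks the CUDA error codes. All names (`checked_increment`, `count_out_of_range`) are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Increments counters[idx] only when idx is in range;
// otherwise records the bad index in a separate counter.
__global__ void checked_increment(const unsigned int *indices,
                                  unsigned int *counters,
                                  unsigned int num_counters,
                                  unsigned int n,
                                  unsigned int *out_of_range)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    unsigned int idx = indices[tid];
    if (idx < num_counters)
        atomicAdd(&counters[idx], 1u);
    else
        atomicAdd(out_of_range, 1u);  // would have been an invalid access
}

// Hypothetical host helper: returns how many indices were out of range,
// or -1 if the launch or the kernel itself reported a CUDA error.
long long count_out_of_range(const unsigned int *h_indices,
                             unsigned int n,
                             unsigned int num_counters)
{
    unsigned int *d_indices = 0, *d_counters = 0, *d_bad = 0;
    cudaMalloc((void **)&d_indices, n * sizeof(unsigned int));
    cudaMalloc((void **)&d_counters, num_counters * sizeof(unsigned int));
    cudaMalloc((void **)&d_bad, sizeof(unsigned int));
    cudaMemcpy(d_indices, h_indices, n * sizeof(unsigned int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_counters, 0, num_counters * sizeof(unsigned int));
    cudaMemset(d_bad, 0, sizeof(unsigned int));

    checked_increment<<<(n + 255) / 256, 256>>>(d_indices, d_counters,
                                                num_counters, n, d_bad);

    // Launch-time errors show up here; errors raised while the kernel
    // runs are returned by the synchronize call.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    long long result = -1;
    if (err == cudaSuccess) {
        unsigned int h_bad = 0;
        cudaMemcpy(&h_bad, d_bad, sizeof(unsigned int),
                   cudaMemcpyDeviceToHost);
        result = (long long)h_bad;
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_indices);
    cudaFree(d_counters);
    cudaFree(d_bad);
    return result;
}
```

A nonzero result would point at the index computation rather than at atomicAdd itself; a result of zero with the failure still present would suggest looking elsewhere.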
Could the problem be related to the fact that each thread's atomicAdd increments an effectively random memory address?