Can we use "atomicAdd()" with the GTX 8800? Is there any other option to do the same thing?

I have been working on similar problems for the last few months.

I have programmed a few “hacks” that work for atomic computations at block level.

However, the streaming architecture of the GPU is not designed for such constructs, so they can lead to deadlocks!

I have already posted ways of doing this; it is achieved with spin loops plus global writes:

http://forums.nvidia.com/index.php?showtopic=44144

Read from Memory

Work in parallel

Reduce in parallel (for threads within a single MP)

Reduce serially using the modified programming constructs.
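
As a rough illustration of the four steps above, here is a minimal sketch of a sum that needs no atomicAdd(). Note that it is not the spin-loop construct from the linked thread: the final serial reduction is simply done on the host after the kernel finishes, which sidesteps the deadlock risk. The names blockSum, sumWithBlockReduction and BLOCK_SIZE are made up for this example, error checking is left out, and it assumes the input fits in a one-dimensional grid.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size for this sketch; must be a power of two.
#define BLOCK_SIZE 256

// Read from memory, work in parallel, reduce in parallel within one block.
// Only shared memory and __syncthreads() are used, no atomic operations.
__global__ void blockSum(const float *in, float *partial, int n)
{
    __shared__ float cache[BLOCK_SIZE];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Read from memory; threads past the end contribute 0.
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    // One global write per block.
    if (tid == 0)
        partial[blockIdx.x] = cache[0];
}

// Reduce serially: here the per-block results are summed on the host.
float sumWithBlockReduction(const float *h_in, int n)
{
    if (n <= 0)
        return 0.0f;

    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float *d_in = 0;
    float *d_partial = 0;

    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK_SIZE>>>(d_in, d_partial, n);

    // This copy waits for the kernel to finish before reading the results.
    std::vector<float> h_partial(blocks);
    cudaMemcpy(&h_partial[0], d_partial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_partial);

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b)
        total += h_partial[b];
    return total;
}
```

Calling sumWithBlockReduction(data, n) on a host array returns the total. When there are many blocks, the serial step could instead be a second launch of the same kernel over the partial sums.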

Reduction + Block level synchronization + Memory optimization = very high performance gains

In short, use the constructs as tools for getting around the problem, not as a concrete reference!

I hope this helps.

Cheers,

Neeraj