Fermi atomic op 10 times slower than ATI GPU?

benetion · July 25, 2011, 3:34am

The test results Beyond3D published in its article "NVIDIA Fermi GPU and Architecture Analysis"
suggest that atomic operations on Fermi’s GTX470 are 12 times slower than on ATI’s HD5870, for instance on shared memory, which is very surprising to me!

Does anyone have any insight or comment on this? Thanks!

Simon_Green · July 25, 2011, 10:38am

I wouldn’t put too much stock in synthetic tests like these. In my experience, Fermi performs pretty well with atomic operations in real applications. I’m not sure why they’re testing shared-memory atomics with no contention. Why even use atomics in that case?

Sarnath · July 25, 2011, 3:08pm

But what about global atomics, the increment ones? Any comment on that?
(I hope that they are not doing something that the compiler is optimizing away…)
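
One way to check the concern about the compiler is to make the result observable. The following is a minimal sketch, not the Beyond3D test: the kernel name `incrementKernel`, the helper `runGlobalIncrement`, and the launch sizes are all made up for illustration. Every thread increments a single global counter, the host reads the counter back and compares it with the expected total, and CUDA events give a rough kernel time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread hammers the same global counter,
// which is the fully contended "global increment" case.
__global__ void incrementKernel(unsigned int *counter, int itersPerThread)
{
    for (int i = 0; i < itersPerThread; ++i)
        atomicAdd(counter, 1u);
}

// Hypothetical host helper: launches the kernel, times it with CUDA events,
// and returns true only if the counter holds the expected total.
bool runGlobalIncrement(int blocks, int threads, int iters)
{
    unsigned int *dCounter = nullptr;
    if (cudaMalloc(&dCounter, sizeof(unsigned int)) != cudaSuccess)
        return false;
    cudaMemset(dCounter, 0, sizeof(unsigned int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    incrementKernel<<<blocks, threads>>>(dCounter, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Reading the counter back makes the increments observable to the host.
    unsigned int hCounter = 0;
    cudaError_t err = cudaMemcpy(&hCounter, dCounter, sizeof(hCounter),
                                 cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dCounter);

    const unsigned int expected =
        static_cast<unsigned int>(blocks) * threads * iters;
    printf("counter = %u (expected %u), kernel time = %.3f ms\n",
           hCounter, expected, ms);
    return err == cudaSuccess && hCounter == expected;
}

int main()
{
    // Arbitrary sizes chosen for illustration only.
    return runGlobalIncrement(64, 256, 100) ? 0 : 1;
}
```

A matching counter only shows that the increments took effect. It does not prove that the hardware issued one atomic per loop iteration, since a compiler is free to combine them; dumping the generated machine code with `cuobjdump -sass` is the more direct check.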

benetion · July 25, 2011, 7:26pm

It would be interesting to see whether atomicAdd on shared memory, without memory contention, is as fast as a normal add…
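
A rough way to try this is sketched below. It is only an illustration under assumptions: the names `sharedAddKernel` and `runOnce`, the block size, and the iteration count are invented here, and timing with `clock64()` from a single thread gives a coarse cycle count rather than a rigorous benchmark. Each thread adds to its own shared-memory word, so no two threads touch the same address, and the same loop runs once with `atomicAdd` and once with a plain read-modify-write.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;   // one block; each thread owns one shared slot
constexpr int kIters   = 4096;  // adds per thread

// Hypothetical kernel: each thread repeatedly adds to its own shared-memory
// word, so there is no contention between threads.
template <bool UseAtomic>
__global__ void sharedAddKernel(int *out, long long *cycles)
{
    __shared__ int slots[kThreads];
    const int tid = threadIdx.x;
    slots[tid] = 0;
    __syncthreads();

    // volatile forces the plain adds to go through shared memory each time
    // instead of being accumulated in a register.
    volatile int *plain = slots;

    const long long start = clock64();
    for (int i = 0; i < kIters; ++i) {
        if (UseAtomic)
            atomicAdd(&slots[tid], 1);
        else
            plain[tid] = plain[tid] + 1;
    }
    const long long stop = clock64();

    out[tid] = plain[tid];
    if (tid == 0)
        *cycles = stop - start;  // cycles seen by thread 0 only
}

// Hypothetical host helper: runs one variant, prints the cycle count, and
// returns true if every thread ended up with kIters in its slot.
template <bool UseAtomic>
bool runOnce(const char *label)
{
    int *dOut = nullptr;
    long long *dCycles = nullptr;
    if (cudaMalloc(&dOut, kThreads * sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMalloc(&dCycles, sizeof(long long)) != cudaSuccess) {
        cudaFree(dOut);
        return false;
    }

    sharedAddKernel<UseAtomic><<<1, kThreads>>>(dOut, dCycles);

    int hOut[kThreads];
    long long hCycles = 0;
    cudaError_t e1 = cudaMemcpy(hOut, dOut, sizeof(hOut),
                                cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(&hCycles, dCycles, sizeof(hCycles),
                                cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    cudaFree(dCycles);

    bool ok = (e1 == cudaSuccess) && (e2 == cudaSuccess);
    for (int i = 0; ok && i < kThreads; ++i)
        ok = (hOut[i] == kIters);

    printf("%-10s %lld cycles for %d adds per thread\n",
           label, hCycles, kIters);
    return ok;
}

int main()
{
    const bool plainOk  = runOnce<false>("plain add");
    const bool atomicOk = runOnce<true>("atomicAdd");
    return (plainOk && atomicOk) ? 0 : 1;
}
```

The two cycle counts are only comparable on the same device and build settings, and the numbers will differ between architectures, so this says nothing by itself about the GTX470 versus HD5870 comparison.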

cbuchner1 · July 25, 2011, 9:41pm

Here is a write-up by someone who thoroughly analyzed the performance of atomics:

http://strobe.cc/cuda_atomics/cuda_atomics.pdf