Atomic Operations Latency / Throughput

Hi everyone,

I have a few questions related to atomic operations on global memory:

  1. How many atomic units are present, and how many operations can be completed every cycle (throughput)?
  2. What is the latency of atomic operations on global memory?
  3. How are atomic units implemented in current-generation hardware?

Could someone point me to descriptions of, or answers to, these questions?

Thank you.

The blog on the web site is very close to answering your questions. Mr. Robertson does appear to have a clue what he’s writing about. ;)

Check posting #5 on atomics.