Performance of fire-and-forget atomics vs non-atomic writes

Olumide · August 8, 2017, 12:11am

According to The CUDA Handbook: A Comprehensive Guide to GPU Programming by Nicholas Wilt:

At the hardware level, atomics come in two forms: atomic operations that return the value that was at the specified memory location before the operator was performed, and reduction operations that the developer can “fire and forget” at the memory location, ignoring the return value. Since the hardware can perform the operation more efficiently if there is no need to return the old value.

Others claim that:

Atomics have “fire-and-forget” semantics. This means that the kernel calls the atomic operation and lets the actual atomic operation be executed by the cache (not on the SM), and the kernel will move on to the next instruction without waiting for the actual atomic operation to complete. This only works if there is no return value from the atomic operation, which is the case in this example. The fire-and-forget semantics let the SM get on with its computations, offloading the computation of the atomic to the cache.

All of this would seem to suggest that there are no apparent delays when using fire-and-forget atomic writes, and that they are possibly as fast as non-atomic writes. I suspect that this is not, and cannot be, true. Is there any information available on the write performance of both types of write operation?
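One way to check this on a particular GPU is a small microbenchmark. The following is only a rough sketch under my own assumptions, not a definitive measurement: the kernel names (`plainWrite`, `atomicNoReturn`, `atomicWithReturn`), the helper `timeMs`, the array size, and the launch configuration are all made up for illustration. Every thread touches its own element, so there is no contention; a contended access pattern may behave very differently. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Plain (non-atomic) store: each thread writes its own element.
__global__ void plainWrite(unsigned int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1u;
}

// Atomic add whose return value is ignored. The compiler is free to
// emit a reduction ("fire and forget") instruction here.
__global__ void atomicNoReturn(unsigned int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&out[i], 1u);
}

// Atomic add whose return value is used, so the old value must come back
// to the thread. Note: this kernel also performs an extra store to old[].
__global__ void atomicWithReturn(unsigned int *out, unsigned int *old, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) old[i] = atomicAdd(&out[i], 1u);
}

// Hypothetical helper: average time in milliseconds of `launch` over `reps` runs.
template <typename Launch>
float timeMs(Launch launch, int reps) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    launch();  // warm-up run, not timed
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main() {
    const int n = 1 << 24;  // assumed problem size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const int reps = 20;

    unsigned int *out = nullptr, *old = nullptr;
    cudaMalloc(&out, n * sizeof(unsigned int));
    cudaMalloc(&old, n * sizeof(unsigned int));
    cudaMemset(out, 0, n * sizeof(unsigned int));

    float tPlain = timeMs([&] { plainWrite<<<blocks, threads>>>(out, n); }, reps);
    float tRed   = timeMs([&] { atomicNoReturn<<<blocks, threads>>>(out, n); }, reps);
    float tAtom  = timeMs([&] { atomicWithReturn<<<blocks, threads>>>(out, old, n); }, reps);

    printf("plain store        : %.3f ms\n", tPlain);
    printf("atomic, no return  : %.3f ms\n", tRed);
    printf("atomic, with return: %.3f ms\n", tAtom);

    cudaFree(out);
    cudaFree(old);
    return 0;
}
```

Compiling with `nvcc` (adding `-std=c++11` on older toolkits for the lambdas) and inspecting the generated code with `cuobjdump -sass` should show whether the no-return kernel was actually compiled to a reduction instruction rather than a returning atomic, which is worth confirming before reading anything into the timings.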
