You could use ordinary atomic adds plus an auxiliary flag that records overflow. An atomic fetch-and-add returns the value held before the addition, so each thread can compute the sum locally. If the sum is less than the original value under an unsigned comparison, the add wrapped around, and the thread sets the overflow flag in a separate memory location. The downside is that whenever you use the value, you have to check both the overflow flag and the main variable. The upside is that the atomic add runs at full speed except in the (presumably rare) case where an overflow occurs and a second memory access is needed to set the flag.
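Here is a minimal sketch of that idea in C++, assuming a 32-bit unsigned counter and `std::atomic`. The names `FlaggedCounter`, `add`, `read`, and `Snapshot` are made up for illustration, and the default (sequentially consistent) memory ordering is used for simplicity rather than tuned for speed.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical type for illustration: a wrapping counter plus a sticky overflow flag.
struct FlaggedCounter {
    std::atomic<std::uint32_t> value{0};
    std::atomic<bool> overflowed{false};  // the "auxiliary bit", in its own location

    void add(std::uint32_t delta) {
        // fetch_add returns the value held *before* the addition.
        std::uint32_t old = value.fetch_add(delta);
        // Recompute the wrapped sum locally (cast guards against integer promotion).
        std::uint32_t sum = static_cast<std::uint32_t>(old + delta);
        // Unsigned wraparound happened exactly when the sum is below the old value.
        if (sum < old) {
            overflowed.store(true);  // rare slow path: second memory access
        }
    }

    // What a reader has to look at: the low bits and the flag.
    struct Snapshot {
        std::uint32_t low;
        bool overflowed;
    };

    Snapshot read() const {
        // Two separate loads; this is not a single atomic snapshot.
        return Snapshot{value.load(), overflowed.load()};
    }
};
```

Note that in this sketch there is a short window between the `fetch_add` and the store to the flag in which another thread could see the wrapped value without the flag set. Whether that matters depends on how the value is consumed, for example whether readers only run after all adders have finished.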