Implementation of atomicMax for float

Hi I found this implementation for atomicMax to handle floats:

The example on the official CUDA doc for atomicAdd to handle doubles implements something similar.

__device__ __forceinline__ float atomicMax(float *address, float val)
    int ret = __float_as_int(*address);
    while(val > __int_as_float(ret))
        int old = ret;
        if((ret = atomicCAS((int *)address, old, __float_as_int(val))) == old)
    return __int_as_float(ret);

The code works, but I don’t understand why it implements with a “while” loop and “break” when “ret == old”. Why the following snippet doesn’t work? I think I must have missed some thoughts here. Can someone help explain? Thanks.

My original thought to replace the “while” loop but apparently this snippet doesn’t work as expected:

    if(val > __int_as_float(ret)){
        int old = ret;
        ret = atomicCAS((int*)address, old, __float_as_int(val));

First, you don’t seem to understand the semantics of atomicCAS. I’m not sure there is any point in linking to the relevant section in the programming guide because you already refer to

nevertheless you might want to read that section. At any rate, this is not sensible:

in C++ , true is reserved keyword, so it could only mean one thing, the boolean state. However the second argument to atomicCAS is not a boolean, but instead represents the proposed state that of the location that will be used for comparison. Since the location we are talking about here is either an int or a float, providing a boolean true for comparison makes no sense. In fact, you are supposed to provide a value that is expected to be in the location, for the atomicCAS operation to actually proceed with updating the location. Again, reading the documentation may help here.

If we instead provided some sensible value instead of true, your realization won’t work in a general, asynchronous multithreaded case. The detailed steps in your proposal are:

  1. The thread in question in your realization reads the value in the location.
  2. It then tests the value in the location, to see if it is less than the value that that thread would like to update it to.
  3. If the test in step 2 passes, the thread in question “forces” its own value into the location.

Now we must ask “what happens if another thread updates the location in between steps 2 and 3? (or 1 and 3)”.

Your realization is going to break. If your atomicCAS provides the expected value for that location (i.e. the previously read value) then it will fail (not do any updates) because the new value in the location doesn’t match the expected value.

Without a while loop or similar to catch this condition and “try again”, your atomic update may get “rejected” and “lost”.

Even if there were some way to “force” the value in (e.g. atomicExch would do that), we still can’t make it work in the general case. Suppose that the location originally contains 1. The thread in question would like to update it to 2. Some other thread would like to update it to 3. We have a hazard like this:

thread reads location as 1. thread compares 1 to 2, and decides to an atomicExch to force the location to 2. While the thread is doing this, another thread updates the location to 3. Then this thread “forces” a 2 into the location. The update to 3 is “lost”.

Takeaways (for this “generic” method):

  • atomicCAS is required so that you only update the value you think you are updating (i.e. the value at the point of update matches the value previously read)
  • a loop of some sort will be required, because the multithreaded access means that the value in the location may be updated at any time, and I need to handle this asynchronous update. If the location value doesn’t match what I expected to find there, I must start the update process over.
1 Like

Thank you very much! This is very clear. Now I understood the presence of “while” now, to guard against the race condition when the “*address” is updated again before doing atomicCAS(). Glad I asked it =)

Indeed I had some misunderstanding towards “atomicCAS”. Somehow earlier I read “compare” as a “predicate”, and thought both the return value and the address will be updated to “val”. Now I understood that the return value will be “*address” no matter whether “*address == compare”, and “compare” is a value rather than a “predicate”.

Just curious, the logic of this implementation is no different from the following single line to implement “float atomicMax”:

return atomicMax((int*)address, __float_as_int(val));

And my test produces the correct result using this single line. Not sure if I ignored any corner cases. Can we replace the previous implementation with this simpler implementation? Thanks.

generally speaking this depends on the ordering of int as compared to the ordering of the same bit patterns interpreted as float. I believe mostly the orderings are the same, but there may be some corner cases when considering behavior in the presence of float entities like -0 vs. 0 and perhaps NaN and Inf.

I don’t wish to revisit all that. It’s probably documented somewhere. This may be of interest.

I just found out if comparing negative floats, the single line solution doesn’t work. Indeed we need true float comparison as documented in earlier github repo.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.