atomicMin with float

Has anyone managed to implement atomicMin using floating point numbers? I’ve made an attempt:

__device__ void Min(float *addr, float value)
{
        float comp_value = *addr;
        if(comp_value <= value) return;   // void function: plain return
        while(comp_value > value)
        {
                unsigned int tmpA, tmpB;
                // Reinterpret the float bit patterns as unsigned ints for atomicCAS
                memcpy(&tmpB, &value, sizeof(unsigned int));
                memcpy(&tmpA, &comp_value, sizeof(unsigned int));
                unsigned int *tmp_ptr = (unsigned int *)addr;
                unsigned int tmp_result = atomicCAS(tmp_ptr, tmpA, tmpB);
                float result;
                memcpy(&result, &tmp_result, sizeof(unsigned int));
                if(result == comp_value) return;
                comp_value = result;
        }
}

but it doesn’t seem to be working correctly.

__device__ float fatomicMin(float *addr, float value)
{
        float old = *addr, assumed;
        if(old <= value) return old;
        do
        {
                assumed = old;
                // Reinterpret the returned bits as a float (plain assignment would convert the integer value)
                old = __uint_as_float(atomicCAS((unsigned int*)addr, __float_as_uint(assumed), __float_as_uint(value)));
        }while(old!=assumed);
        return old;
}

For more details, see section B.11 of the CUDA C Programming Guide 4.0.

This particular problem can be solved more simply by transforming the float into an integer whose ordering matches the float ordering, and then using the integer atomicMin. See this thread:

http://forums.nvidia.com/index.php?showtopic=91491
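The linked thread is not reproduced here, so the following is only a minimal sketch of the idea; the helper names are hypothetical, not taken from that thread. It assumes IEEE-754 floats and no NaNs. Non-negative floats keep their bit pattern (read as a signed int, it already grows with the value), while negative floats get their 31 magnitude bits flipped so that a larger magnitude yields a smaller int. The buffer then holds these ordered ints rather than floats.

```cuda
#include <cstring>

// Map a float to an int whose signed ordering matches the float ordering.
// Assumes no NaNs; -0.0f maps just below +0.0f.
__host__ __device__ inline int float_to_ordered_int(float f) {
    int i;
    memcpy(&i, &f, sizeof i);               // reinterpret the bits, no numeric conversion
    return (i >= 0) ? i : i ^ 0x7FFFFFFF;   // negative floats: flip the magnitude bits
}

// Inverse of float_to_ordered_int (the XOR is its own inverse).
__host__ __device__ inline float ordered_int_to_float(int i) {
    int bits = (i >= 0) ? i : i ^ 0x7FFFFFFF;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// *ordered_min holds an ordered int, not a float.
__device__ void atomic_min_ordered(int *ordered_min, float value) {
    atomicMin(ordered_min, float_to_ordered_int(value));
}

// Example kernel: reduce an array of floats to its minimum.
__global__ void min_kernel(const float *in, int n, int *ordered_min) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomic_min_ordered(ordered_min, in[idx]);
}
```

Before launching, the int buffer would be initialized to `float_to_ordered_int` of some upper bound (for example `FLT_MAX`), and after the kernel the result is converted back with `ordered_int_to_float`.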

Hi seibert

I took some time to study the representation of float and int (I'm not a CS student). I guess that when I'm sure the floats I'm comparing are positive, no processing is needed and their bit patterns can be compared directly as ints. Am I right?

Yes, I think that works out.
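A minimal sketch of that special case, assuming every value ever stored at the address (including its initial value) has the sign bit clear and is not NaN. The function names are hypothetical. Under that assumption the integer atomicMin on the raw bit patterns picks the smallest float; negative floats break it, because their bit patterns read as ints sort in reverse order.

```cuda
#include <cstring>

// Bits of a float read as a signed int. For non-NaN floats with the sign bit
// clear, a larger float gives a larger int.
__host__ __device__ inline int nonneg_float_bits(float f) {
    int i;
    memcpy(&i, &f, sizeof i);
    return i;
}

// Assumption: all values at *addr are non-negative (sign bit clear).
// Returns the previous value stored at *addr.
__device__ float atomic_min_nonneg(float *addr, float value) {
    int old = atomicMin(reinterpret_cast<int *>(addr), nonneg_float_bits(value));
    return __int_as_float(old);
}
```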

Thanks for showing us the nice trick :)

The fatomicMin code above may be incorrect. Suppose two threads both enter the do loop, and the thread with the smaller value reaches atomicCAS first, followed by the thread with the larger value. The larger thread's first CAS fails, but it then retries with the freshly returned value as assumed and, because the loop never re-checks old <= value, overwrites the smaller value. The result is therefore not reliable.

Changing the critical line to

old = __uint_as_float(atomicCAS((unsigned int*)addr, __float_as_uint(assumed), __float_as_uint(fminf(value, assumed))));

may solve this.
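A sketch of the whole function with that change applied, using `__float_as_uint`/`__uint_as_float` so the value returned by atomicCAS is reinterpreted as a float rather than numerically converted. The reasoning behind the change: the new value is computed from `assumed`, and atomicCAS only stores it if the word at `addr` still holds exactly `assumed`; if another thread changed it in between, the CAS fails and the loop retries with the fresh value. The host-only `replay` function and its helpers are hypothetical, for illustration only: they step deterministically through the two-thread interleaving described above (shared value 5, thread A storing 1 first, then thread B storing 2), with and without `fminf`.

```cuda
#include <cstring>
#include <math.h>

__device__ float atomic_min_float(float *addr, float value) {
    unsigned int *addr_bits = reinterpret_cast<unsigned int *>(addr);
    unsigned int old = *addr_bits, assumed;
    if (__uint_as_float(old) <= value) return __uint_as_float(old);  // early out
    do {
        assumed = old;
        // Store min(value, assumed): if a smaller value is already there,
        // the CAS writes that same value back instead of overwriting it.
        old = atomicCAS(addr_bits, assumed,
                        __float_as_uint(fminf(value, __uint_as_float(assumed))));
    } while (old != assumed);  // compare bit patterns, so NaN cannot loop forever
    return __uint_as_float(old);
}

// ---- Host-only replay of the race (hypothetical helpers) ----
static unsigned int to_bits(float f) { unsigned int u; memcpy(&u, &f, sizeof u); return u; }
static float from_bits(unsigned int u) { float f; memcpy(&f, &u, sizeof f); return f; }

// Mimics atomicCAS on a single-threaded host variable.
static unsigned int host_cas(unsigned int *p, unsigned int cmp, unsigned int val) {
    unsigned int old = *p;
    if (old == cmp) *p = val;
    return old;
}

// Returns the final shared value after A (1.0f) and B (2.0f) both passed the
// early-out check against 5.0f and A's CAS landed first.
static float replay(bool use_fminf) {
    unsigned int slot = to_bits(5.0f);
    host_cas(&slot, to_bits(5.0f), to_bits(1.0f));   // A succeeds: slot = 1
    float b_assumed = 5.0f;
    for (;;) {                                       // B's do/while loop
        float b_new = use_fminf ? fminf(2.0f, b_assumed) : 2.0f;
        float b_old = from_bits(host_cas(&slot, to_bits(b_assumed), to_bits(b_new)));
        if (b_old == b_assumed) break;               // B's CAS succeeded
        b_assumed = b_old;                           // retry with the fresh value
    }
    return from_bits(slot);
}
```

With the original update rule, `replay(false)` ends with 2.0f (the smaller value is lost); with the `fminf` rule, `replay(true)` keeps 1.0f.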

Has anyone implemented atomicMin using half (16-bit floating point), now available in CUDA 7.5?
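One possible approach, sketched under several assumptions: the half is handled as its raw IEEE binary16 bit pattern in an `unsigned short` (how to get those bits out of the `half` type differs between CUDA versions, so that step is left out), the address is 2-byte aligned, the device is little-endian, and there are no NaNs. Since only a 32-bit atomicCAS is assumed, the loop updates the aligned 32-bit word containing the half and leaves the neighbouring half untouched. Halves are sign-magnitude like floats, so the same order-preserving idea applies to 16-bit keys. The function names are hypothetical.

```cuda
#include <cstdint>

// Order-preserving key for a raw binary16 bit pattern (NaNs excluded):
// comparing keys as unsigned ints orders halves by value, -0 just below +0.
__host__ __device__ inline unsigned int half_order_key(unsigned short h) {
    return (h & 0x8000u) ? (~static_cast<unsigned int>(h) & 0xFFFFu)  // negative: flip all bits
                         : (static_cast<unsigned int>(h) | 0x8000u);   // positive: set top bit
}

// Atomically replaces *addr with value if value is smaller.
// Returns the previous half bit pattern.
__device__ unsigned short atomic_min_half_bits(unsigned short *addr, unsigned short value) {
    // The aligned 32-bit word containing *addr, and the half's bit offset in it
    // (0 or 16, assuming little-endian layout).
    uintptr_t byte_addr = reinterpret_cast<uintptr_t>(addr);
    unsigned int *word = reinterpret_cast<unsigned int *>(byte_addr & ~uintptr_t(3));
    unsigned int shift = (byte_addr & 2) ? 16u : 0u;

    unsigned int old = *word, assumed;
    unsigned short current;
    do {
        assumed = old;
        current = static_cast<unsigned short>((assumed >> shift) & 0xFFFFu);
        if (half_order_key(current) <= half_order_key(value)) break;  // already <= value
        unsigned int replaced = (assumed & ~(0xFFFFu << shift)) |
                                (static_cast<unsigned int>(value) << shift);
        old = atomicCAS(word, assumed, replaced);  // fails if either half changed
    } while (old != assumed);
    return current;
}
```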

I just tried the solution plus the fix proposed by NeilYou (i.e., putting fminf inside atomicCAS). This does not solve the race condition, because atomicCAS's arguments are NOT computed in an "atomic" way. I observed this race condition and saw the errors.