How to use atomicCAS() to implement atomicAdd(short)? Trouble adapting programming guide example

Further diagnostic info:

Debugging with Nsight 2.0 reveals that as soon as I try to use atomicAddShort on a shared mem value, there is a memory access violation. No such problem with global mem (as you can see, the d_data has been updated).

screenshot attached.

Oh, I’ve been wrong - of course the kernel tests atomicity as each thread does the same atomicAdd(). My bad.

Are you compiling for sm_13? This really is the only explanation I can offer.

And I find it quite suspicious that sm_11_atomic_functions.h gets included in your code…

when the memory violation occurs it breaks into that file… i have no idea

ill attach my code so perhaps u could try

as far as i can tell, yes its sm_13

i have normal atomocAdd working on regular ints in shared mem

Just a note that I have successfully solved this problem, with many thanks to tera and Sylvain Collange.

The complete working implementation has been added to the original post in this thread.

Cheers,
Mike

Has anyone implemented atomicMin using half ( 16-bit floating point) now available in CUDA 7.5?

Hi Ganellari,

See code below for half-float atomicMin.

Note that it is implemented on 32-bit atomicCas therefore it might not achieve the best performance possible (compare and swap is still done in 32-bit granularity even if only 16 bits are swapped).

If somebody can improve on this it would be great, atomic operations on half-float are extremely useful.

inline  __device__ half atomicMin(half* address, half val) {

    unsigned int *base_address = (unsigned int *) ((char *)address - ((size_t)address & 2));
    unsigned int long_old = *base_address, assumed;

    unsigned short new_half;

    if(val >= *address )
       return *address;

    do {
        assumed = long_old;
        new_half = __hlt(val, *address) ? val : *address;
        unsigned int long_val = ((size_t)address & 2) ? ((unsigned int) new_half << 16 | assumed & 0xffff) :
                              (unsigned int)new_half | assumed & 0xffff0000 ;

        long_old = atomicCAS(base_address, assumed,long_val);

    } while (assumed != long_old);


    unsigned short old = ((size_t)address & 2) ? (unsigned short) (long_old  >> 16) : (unsigned  short) (long_old & 0xffff);


    return __ushort_as_half(old);
}