atomicCAS with unsigned short on CUDA 10.1

Hello !

I’m running into some trouble compiling atomicCAS with unsigned short int arguments.

The online documentation clearly shows that a 16-bit overload should exist (which surprised me, because it would be the only atomic function working with 16-bit integer types), but it doesn’t compile (No instance of overloaded function “atomicCAS” matches the argument list). The 32-bit and 64-bit versions compile without problems.

I also noticed that the 16-bit variant is a fairly recent addition (it doesn’t appear in the CUDA 8 manual), so I was wondering whether this is a mistake or whether it is only available on later compute capabilities. However, there is nothing about it in the compatibility table.

I’m compiling with CUDA 10.1 for a P5000 (compute capability 6.1) with VS2015.

Any help is appreciated :)

Thanks !

PS: The function is also not listed in device_atomic_functions.hpp.
I should also mention that the kernel is launched using dynamic parallelism.

The docs for atomicAdd indicate:

“The 16-bit __half floating-point version of atomicAdd() is only supported by devices of compute capability 7.x and higher.”

I suspect a similar message should have been included with the atomicCAS function docs, and that this may be a doc oversight.

As a test, can you try compiling specifically and only for architecture sm_70 (compute capability 7.0), and see if the error goes away?
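
For that compile-only test, a minimal reproducer along these lines should be enough. The file name `repro.cu` and the kernel name `casKernel` are placeholders invented for this sketch; the only things taken from the thread are the `unsigned short` overload of `atomicCAS` and the two architectures being compared.

```cuda
// repro.cu (hypothetical file name)
// Compile-only check, no launch needed:
//   nvcc -arch=sm_70 -c repro.cu   // should compile if the overload exists for cc 7.0
//   nvcc -arch=sm_61 -c repro.cu   // expected to reproduce the "no instance of
//                                  // overloaded function" error reported above

// Hypothetical kernel: performs a single 16-bit compare-and-swap and stores
// the value that was previously at *data.
__global__ void casKernel(unsigned short* data,
                          unsigned short expected,
                          unsigned short desired,
                          unsigned short* previous)
{
    *previous = atomicCAS(data, expected, desired);
}
```

If the first command succeeds and the second fails, the overload is gated on the target architecture rather than missing from the toolkit.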

I realize code compiled that way will not run on your GPU. It is for a test only. If the error goes away, then I think it is safe to say that 16-bit atomics are only implemented on cc 7.0 and higher, and you’ll need to refactor your code to run on cc 6.1.
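
One common way to do that refactoring is to emulate the 16-bit compare-and-swap on top of the 32-bit `atomicCAS`, which is available on cc 6.1. The following is only a sketch under stated assumptions: `atomicCAS16`, `incrementKernel` and `runDemo` are names made up for illustration (they are not CUDA API functions), the target is assumed to be 2-byte aligned, and the whole 4-byte word containing it is assumed to lie inside a valid allocation. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper, not part of the CUDA API: 16-bit compare-and-swap.
// Uses the native overload on cc >= 7.0 and a 32-bit CAS loop otherwise.
__device__ unsigned short atomicCAS16(unsigned short* address,
                                      unsigned short compare,
                                      unsigned short val)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
    return atomicCAS(address, compare, val);
#else
    // Find the aligned 32-bit word that contains the 16-bit target.
    uintptr_t addr = reinterpret_cast<uintptr_t>(address);
    unsigned int* word = reinterpret_cast<unsigned int*>(addr & ~uintptr_t(3));
    // Little-endian layout: the half at byte offset 2 is the upper 16 bits.
    unsigned int shift = (addr & 2) ? 16u : 0u;
    unsigned int mask = 0xFFFFu << shift;

    unsigned int oldWord = *word;
    while (true) {
        unsigned short current =
            static_cast<unsigned short>((oldWord & mask) >> shift);
        if (current != compare) return current;   // CAS fails, report the value seen
        unsigned int newWord =
            (oldWord & ~mask) | (static_cast<unsigned int>(val) << shift);
        unsigned int prev = atomicCAS(word, oldWord, newWord);
        if (prev == oldWord) return compare;       // swap succeeded
        oldWord = prev;                            // neighbouring half changed, retry
    }
#endif
}

// Demo kernel: every thread increments a shared 16-bit counter via a CAS loop.
__global__ void incrementKernel(unsigned short* counter)
{
    unsigned short observed = *counter;
    while (true) {
        unsigned short desired = static_cast<unsigned short>(observed + 1);
        unsigned short prev = atomicCAS16(counter, observed, desired);
        if (prev == observed) break;
        observed = prev;
    }
}

// Host-side driver: launches 4 x 64 threads on the upper half of a 32-bit word
// and returns the final counter value.
unsigned short runDemo()
{
    unsigned short host[2] = {0, 0};
    unsigned short* dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    incrementKernel<<<4, 64>>>(dev + 1);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host[1];
}

int main()
{
    printf("counter = %u\n", static_cast<unsigned>(runDemo()));
    return 0;
}
```

Because the loop retries whenever the other half of the word changes, contention on a neighbouring 16-bit value can cost extra iterations, but it cannot corrupt that neighbour.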

Thank you !

Indeed, the 16-bit integer (unsigned short) version compiles for compute capability 7.5 but not for 6.1. I should have read the docs more carefully with respect to atomicAdd.

The atomic version for half ints can actually be found on this forum.

Thanks again !