In my PyTorch CUDA extension I need an atomicAdd, but I get the following error:
error: no instance of overloaded function "atomicAdd" matches the argument list argument types are: (c10::Half *, c10::Half)
The kernel is dispatched using PyTorch’s AT_DISPATCH_FLOATING_TYPES_AND_HALF macro; it only compiles when I change that to AT_DISPATCH_FLOATING_TYPES, which indicates that atomicAdd works for float and double, but not for the half data type.
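For reference, the pattern that triggers the error looks roughly like this (kernel and variable names are placeholders, not the actual extension code):

#include <torch/extension.h>

template <typename scalar_t>
__global__ void add_kernel(scalar_t* out, const scalar_t* in,
                           const int64_t* idx, int64_t n) {
    int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (i < n) {
        // This call has no overload for c10::Half, hence the compile error.
        atomicAdd(out + idx[i], in[i]);
    }
}

void launch(torch::Tensor output, torch::Tensor input, torch::Tensor idx) {
    const int64_t n = input.numel();
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.scalar_type(), "add_kernel", ([&] {
        add_kernel<scalar_t><<<blocks, threads>>>(
            output.data_ptr<scalar_t>(),
            input.data_ptr<scalar_t>(),
            idx.data_ptr<int64_t>(),
            n);
    }));
}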
These are the overloads of atomicAdd listed in the CUDA C++ Programming Guide:

int atomicAdd(int* address, int val);
unsigned int atomicAdd(unsigned int* address, unsigned int val);
unsigned long long int atomicAdd(unsigned long long int* address, unsigned long long int val);
float atomicAdd(float* address, float val);
double atomicAdd(double* address, double val);
__half2 atomicAdd(__half2 *address, __half2 val);
__half atomicAdd(__half *address, __half val);
__nv_bfloat162 atomicAdd(__nv_bfloat162 *address, __nv_bfloat162 val);
__nv_bfloat16 atomicAdd(__nv_bfloat16 *address, __nv_bfloat16 val);
If c10::Half is compatible with one of those types, you can just use typecasts. Otherwise, you need to implement your own compare-and-swap-based atomicAdd.
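For the compare-and-swap route, something along these lines should work (untested sketch; atomicCAS only operates on aligned 32-bit or 64-bit words, so the half is updated inside the 32-bit word that contains it):

#include <cuda_fp16.h>

// CAS-based atomicAdd for __half on GPUs without the native half overload:
// find the aligned 32-bit word containing the half, then retry atomicCAS on
// that word until the update goes through unchanged.
__device__ __half atomic_add_half(__half* address, __half val) {
    unsigned int* word = (unsigned int*)((size_t)address & ~(size_t)2);  // containing 32-bit word
    bool high = ((size_t)address & 2) != 0;                              // upper or lower 16 bits?
    unsigned int old = *word, assumed;
    do {
        assumed = old;
        unsigned short bits = high ? (unsigned short)(assumed >> 16)
                                   : (unsigned short)(assumed & 0xffffu);
        // Do the addition in float so it also works on architectures
        // without half arithmetic instructions.
        float sum = __half2float(__ushort_as_half(bits)) + __half2float(val);
        unsigned short new_bits = __half_as_ushort(__float2half(sum));
        unsigned int updated = high ? (assumed & 0x0000ffffu) | ((unsigned int)new_bits << 16)
                                    : (assumed & 0xffff0000u) | new_bits;
        old = atomicCAS(word, assumed, updated);
    } while (assumed != old);  // another thread modified the word; retry
    return high ? __ushort_as_half((unsigned short)(old >> 16))
                : __ushort_as_half((unsigned short)(old & 0xffffu));
}

In the c10::Half branch of your dispatch you can then call it through a reinterpret_cast, since c10::Half uses the same 16-bit layout as __half.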
Thank you for the answer! That link was helpful. It says:
The 16-bit __half floating-point version of atomicAdd() is only supported by devices of compute capability 7.x and higher.
I had already tried overloading the function and then casting to __half in the case where scalar_t == c10::Half. However, it turns out that my GeForce GTX 1070 does not support __half atomicAdd(__half *address, __half val), as it has compute capability 6.1, as shown here.
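Roughly, the overload attempt looked like this (simplified sketch, not the exact code):

#include <cuda_fp16.h>
#include <c10/util/Half.h>

__device__ void my_atomic_add(float* address, float val)   { atomicAdd(address, val); }
__device__ void my_atomic_add(double* address, double val) { atomicAdd(address, val); }
__device__ void my_atomic_add(c10::Half* address, c10::Half val) {
    // c10::Half has the same 16-bit layout as __half, but the __half overload
    // of atomicAdd is only available for compute capability 7.0 and higher,
    // so this fails when compiling for sm_61.
    atomicAdd(reinterpret_cast<__half*>(address), *reinterpret_cast<__half*>(&val));
}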
Do you know of a way to work around the issue? How do I implement my own atomicAdd for __half values?