Compile error with atomicMax() for unsigned long long

Hello all

When I try to compile code like the following with -arch=sm_20:

__global__ void f(unsigned long long int *a, int N)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx<N) atomicMax(a, a[idx]);
}

the following error appears:

error: no instance of overloaded function "atomicMax" matches the argument list
        argument types are: (unsigned long long *, unsigned long long)

Then I tried to find the declarations of this function:

grep -r "atomicMax" $(find /usr/local/cuda4.2 -name "*.h")

/usr/local/cuda4.2/cuda/include/sm_11_atomic_functions.h:static __inline__ __device__ int atomicMax(int *address, int val)
/usr/local/cuda4.2/cuda/include/sm_11_atomic_functions.h:static __inline__ __device__ unsigned int atomicMax(unsigned int *address, unsigned int val)

grep -r "atomicMax" $(find /usr/local/cuda-5.0 -name "*.h")

/usr/local/cuda-5.0/include/sm_35_atomic_functions.h:static __inline__ __device__ long long atomicMax(long long *address, long long val)
/usr/local/cuda-5.0/include/sm_35_atomic_functions.h:static __inline__ __device__ unsigned long long atomicMax(unsigned long long *address, unsigned long long val)
/usr/local/cuda-5.0/include/sm_11_atomic_functions.h:static __inline__ __device__ int atomicMax(int *address, int val)
/usr/local/cuda-5.0/include/sm_11_atomic_functions.h:static __inline__ __device__ unsigned int atomicMax(unsigned int *address, unsigned int val)

grep -r "atomicMax" $(find /usr/local/cuda-5.5 -name "*.h")

/usr/local/cuda-5.5/include/sm_11_atomic_functions.h:static __inline__ __device__ int atomicMax(int *address, int val)
/usr/local/cuda-5.5/include/sm_11_atomic_functions.h:static __inline__ __device__ unsigned int atomicMax(unsigned int *address, unsigned int val)
/usr/local/cuda-5.5/include/sm_32_atomic_functions.h:static __inline__ __device__ long long atomicMax(long long *address, long long val)
/usr/local/cuda-5.5/include/sm_32_atomic_functions.h:static __inline__ __device__ unsigned long long atomicMax(unsigned long long *address, unsigned long long val)

According to this output, atomicMax(unsigned long long *address, unsigned long long val) is not supported on sm_20 devices, although the official documentation states the opposite.

Is this a bug in the toolkits or an error in the documentation?

I’m quoting the CUDA C Programming Guide, Section B.12.1.5, atomicMax():

int atomicMax(int* address, int val);
unsigned int atomicMax(unsigned int* address,
                       unsigned int val);
unsigned long long int atomicMax(unsigned long long int* address,
                                 unsigned long long int val);

"reads the 32-bit or 64-bit word old located at the address address in global or shared memory, computes the maximum of old and val, and stores the result back to memory at the same address. These three operations are performed in one atomic transaction. The function returns old.

The 64-bit version of atomicMax() is only supported by devices of compute capability 3.5 and higher."

So atomicMax() for unsigned long long is only supported on devices of compute capability 3.5 and higher, and not on yours, which has compute capability 2.0.
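
If you need a 64-bit maximum on an sm_20 device anyway, one common workaround is to emulate it with a compare-and-swap loop, since the unsigned long long overload of atomicCAS() is available on that hardware. The following is only a minimal sketch: atomicMaxU64 and maxKernel are hypothetical names, not toolkit functions, and the result is written to a separate location rather than to a[0] as in the original kernel. Error checking is kept to a minimum.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA toolkit): emulates a 64-bit
// unsigned atomicMax with an atomicCAS retry loop. Returns the old value.
__device__ unsigned long long atomicMaxU64(unsigned long long *address,
                                           unsigned long long val)
{
    unsigned long long old = *address;
    unsigned long long assumed;
    do {
        assumed = old;
        if (assumed >= val) break;               // stored value is already >= val
        old = atomicCAS(address, assumed, val);  // swap only if nobody changed it
    } while (old != assumed);                    // retry if another thread won
    return old;
}

// Hypothetical kernel: reduces the maximum of in[0..n) into *result.
__global__ void maxKernel(const unsigned long long *in,
                          unsigned long long *result, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomicMaxU64(result, in[idx]);
}

int main()
{
    const int n = 1024;
    static unsigned long long h_in[n];
    unsigned long long expected = 0;

    // Fill with values that need more than 32 bits and track the host maximum.
    for (int i = 0; i < n; ++i) {
        h_in[i] = ((unsigned long long)((i * 7919) % n) << 32) | (unsigned long long)i;
        if (h_in[i] > expected) expected = h_in[i];
    }

    unsigned long long *d_in = NULL, *d_result = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(unsigned long long));
    cudaMalloc((void **)&d_result, sizeof(unsigned long long));
    cudaMemcpy(d_in, h_in, n * sizeof(unsigned long long), cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, sizeof(unsigned long long));

    maxKernel<<<(n + 255) / 256, 256>>>(d_in, d_result, n);

    unsigned long long result = 0;
    cudaError_t err = cudaMemcpy(&result, d_result, sizeof(result),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_result);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s\n", result == expected ? "match" : "mismatch");
    return result == expected ? 0 : 1;
}
```

The early break avoids issuing a CAS when the stored value is already large enough. Under heavy contention this loop will be slower than a native atomic, so on compute capability 3.5 and higher the built-in overload is the better choice.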

Thanks, JFSebastian

I was confused by Table 11 in Section G.1, which lists "yes" for sm_20 in the row "Atomic functions operating on 64-bit integer values in global memory".
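
For what it's worth, my understanding is that this table row refers to the 64-bit overloads of atomicAdd(), atomicExch() and atomicCAS(), which are available on sm_20, and not to atomicMin()/atomicMax(). A small sketch to illustrate, using atomicAdd() on unsigned long long; sumKernel is a hypothetical name, and I am assuming a toolkit that still accepts -arch=sm_20:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: accumulates in[0..n) into *sum using the 64-bit
// unsigned overload of atomicAdd().
__global__ void sumKernel(const unsigned long long *in,
                          unsigned long long *sum, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomicAdd(sum, in[idx]);
}

int main()
{
    const int n = 1000;
    static unsigned long long h_in[n];
    unsigned long long expected = 0;

    // Each element is 2^33, so the total cannot fit in 32 bits.
    for (int i = 0; i < n; ++i) {
        h_in[i] = 1ULL << 33;
        expected += h_in[i];
    }

    unsigned long long *d_in = NULL, *d_sum = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(unsigned long long));
    cudaMalloc((void **)&d_sum, sizeof(unsigned long long));
    cudaMemcpy(d_in, h_in, n * sizeof(unsigned long long), cudaMemcpyHostToDevice);
    cudaMemset(d_sum, 0, sizeof(unsigned long long));

    sumKernel<<<(n + 255) / 256, 256>>>(d_in, d_sum, n);

    unsigned long long sum = 0;
    cudaError_t err = cudaMemcpy(&sum, d_sum, sizeof(sum), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sum);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s\n", sum == expected ? "match" : "mismatch");
    return sum == expected ? 0 : 1;
}
```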