Now that Fermi has a native atomicAdd for floats, is there a preprocessor macro I can use to select the native function when compiling my code for Fermi?
Also, is there a way to do this at run time? I mean, when the kernel runs on a Fermi it would use the native atomic function, and otherwise it would fall back to the hacked version.
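To make the run-time part concrete, here is a rough sketch of the kind of host-side check I have in mind. It assumes that compute capability 2.0 or higher (as reported by cudaGetDeviceProperties) is the right test for "is a Fermi or newer"; hasNativeFloatAtomicAdd is just a name I made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): returns true when the
// given device reports compute capability 2.0 or higher, which is the
// assumption used here for "has native float atomicAdd".
static bool hasNativeFloatAtomicAdd(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;  // be conservative if the query fails
    }
    return prop.major >= 2;
}

int main()
{
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }

    // The host could use this flag to pick between two kernels,
    // one using the native atomic and one using the emulated one.
    if (hasNativeFloatAtomicAdd(device)) {
        printf("device %d: native float atomicAdd available\n", device);
    } else {
        printf("device %d: fall back to emulated float atomicAdd\n", device);
    }
    return 0;
}
```

This only selects on the host side; what I do not know is whether the choice can be made inside a single kernel.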
When I compile a .cu unit with nvcc -arch=sm_11, sm_12, …, sm_20, is there a corresponding macro I can use inside the .cu unit to tell which GPU architecture I am compiling for?
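For reference, this is roughly the compile-time switch I am hoping to write. It is only a sketch: it assumes nvcc defines __CUDA_ARCH__ during device compilation (200 for sm_20, 110 for sm_11, and so on), atomicAddFloat is a hypothetical wrapper name of my own, and the fallback branch is the usual atomicCAS loop, which itself needs at least sm_11.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper: uses the native float atomicAdd when compiling for
// compute capability 2.0+, otherwise emulates it with an atomicCAS loop.
__device__ inline float atomicAddFloat(float* address, float val)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 200
    return atomicAdd(address, val);  // native on Fermi and newer
#else
    // Emulation: reinterpret the float as an int and retry until no other
    // thread has changed the value between our read and our swap.
    int* addressAsInt = reinterpret_cast<int*>(address);
    int old = *addressAsInt;
    int assumed;
    do {
        assumed = old;
        old = atomicCAS(addressAsInt, assumed,
                        __float_as_int(val + __int_as_float(assumed)));
    } while (assumed != old);
    return __int_as_float(old);  // value before the add, like atomicAdd
#endif
}

// Every thread adds one input element into a single accumulator.
__global__ void sumKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAddFloat(out, in[i]);
    }
}

int main()
{
    const int n = 256;
    float hostIn[n];
    for (int i = 0; i < n; ++i) hostIn[i] = 1.0f;

    float *devIn = 0, *devOut = 0;
    cudaMalloc(&devIn, n * sizeof(float));
    cudaMalloc(&devOut, sizeof(float));
    cudaMemcpy(devIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(devOut, 0, sizeof(float));  // all-zero bits is 0.0f

    sumKernel<<<(n + 127) / 128, 128>>>(devIn, devOut, n);

    float result = 0.0f;
    cudaMemcpy(&result, devOut, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", result);

    cudaFree(devIn);
    cudaFree(devOut);
    return 0;
}
```

If a macro like this works the way I assume, building the same file for several architectures would give each target its own branch without any change to the calling code.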