In my code, each thread needs to compute the maximum of the absolute values of two floats. Would it be better to use a macro, or fmaxf() and fabsf()?
Thanks in advance.
The CUDA math functions should be at least as good as a macro, and often much better, because they avoid extra instructions and branching.
fmaxf() and fabsf() each compile to a single machine instruction, so they are much better than a hand-written comparison.
With a macro, you might get lucky and the compiler recognizes the idiom and replaces it with the machine instructions, but relying on that makes no sense.
LLVM will try to do this for many common patterns, but you are still better off just using the standard library.
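To make the comparison concrete, here is a minimal sketch of both approaches. The names max_abs, max_abs_kernel, MAX_ABS_MACRO, ABSF_MACRO and MAX_ABS_HD are made up for illustration and are not from the original posts. The qualifier guard is only there so the same file also compiles with a plain C++ compiler; under nvcc the helper is callable from device code.

```cpp
#include <math.h>   // fmaxf, fabsf

// Under nvcc, make the helper usable on host and device.
// With an ordinary C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define MAX_ABS_HD __host__ __device__
#else
#define MAX_ABS_HD
#endif

// Macro version (hypothetical, for comparison only). Each argument is
// expanded several times, and the compiler has to recognize the
// compare-and-select pattern on its own to emit abs/max instructions.
#define ABSF_MACRO(x) ((x) < 0.0f ? -(x) : (x))
#define MAX_ABS_MACRO(a, b) \
    (ABSF_MACRO(a) > ABSF_MACRO(b) ? ABSF_MACRO(a) : ABSF_MACRO(b))

// Function version: states the intent directly through the math library.
MAX_ABS_HD inline float max_abs(float a, float b)
{
    return fmaxf(fabsf(a), fabsf(b));
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread handles one pair of inputs.
__global__ void max_abs_kernel(const float* x, const float* y,
                               float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = max_abs(x[i], y[i]);
    }
}
#endif
```

Besides code generation, the function version evaluates each argument exactly once, so passing an expression with side effects is safe, which is not true of the macro. The two can also differ on NaN inputs: fmaxf() returns the other operand when exactly one argument is NaN, while the result of the comparison-based macro depends on which argument is NaN.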