Does NVCC know if I mean "sqrtf" or do I need to explicitly say so?

I recall that if one says abs(x), the CUDA compiler will insert the proper absolute value function for the data type of x, e.g. for int x the integer version will be used. Is the same true for sqrt(x) and log(x) if I am supplying float (float32_t) or double (float64_t) arguments, or am I required to write sqrtf(x) and logf(x) in order to get the float32_t equivalents of these functions? I was working through some old code and noticed that it uses sqrt and log even though the arguments and the variables that will hold the results are all float32_t.

CUDA is a derivative of C++, so sqrt() is overloaded. Depending on whether a float or a double is passed, this results in either a single-precision square root or a double-precision square root.
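For example (a minimal sketch; the kernel name and setup are mine), both calls below spell sqrt, but overload resolution picks different precisions:

```cpp
__global__ void sqrt_overload_demo(const float *fin, const double *din,
                                   float *fout, double *dout)
{
    // float argument: resolves to the single-precision overload,
    // equivalent to calling sqrtf(fin[0])
    fout[0] = sqrt(fin[0]);

    // double argument: resolves to the double-precision overload
    dout[0] = sqrt(din[0]);
}
```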

If you pass an int to sqrt() you may get a compilation error that no matching version can be found among the available overloads. I say may because I forget whether (and if so, when) C++ added integer overloads for the standard math functions. If int arguments are supported (e.g. sqrt(5)), you would want to double-check whether sqrt(i) is defined to be equivalent to sqrt((double)i) or sqrt((float)i) if you need to keep code “float clean”.

C++11 adopted the C99 standard math library wholesale; if you check the ISO C++ standard document you won’t find the details of the standard math library functions spelled out in it, because it incorporates the C library by reference. So sqrtf() still exists, and you can pass int arguments to it without issues: sqrtf(i) is equivalent to sqrtf((float)i).
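One way to check what your own toolchain actually does is a couple of compile-time assertions (host code, C++11 or later; a quick sketch of my own, so verify on your setup):

```cpp
#include <cmath>
#include <type_traits>

// Since C++11 there are "sufficient additional overloads" so that an
// integer argument to sqrt() behaves as if cast to double
static_assert(std::is_same<decltype(std::sqrt(5)), double>::value,
              "sqrt(int) resolves to the double-precision overload");

// sqrtf() takes and returns float; an int argument is simply
// converted, so sqrtf(i) is equivalent to sqrtf((float)i)
static_assert(std::is_same<decltype(sqrtf(5)), float>::value,
              "sqrtf always returns float");
```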

Using the overloaded plain-name math functions is generally preferred in CUDA code, just as it is in general C++ code. Personally, I still prefer to explicitly use the f-suffixed versions of the standard math library functions when I write “float clean” code.
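For what it’s worth, “float clean” code in that style might look like this (a hypothetical kernel of my own, using the sqrt and log functions from the question):

```cpp
__global__ void log_sqrt_f32(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // f-suffixed calls make the single-precision intent explicit,
        // independent of how overload resolution would go
        out[i] = logf(sqrtf(in[i]));
    }
}
```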


Yes, I also prefer to use the explicit f suffix when writing all-float code. It may be literals rather than functions, but all it takes is one 1.234 without the f suffix to send your whole right-hand side to double! Thanks for the clarification. I can now rest easy that the old code is indeed a lot slower than the new one, and not because it’s accidentally using a lot of float64_t intrinsics.
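A quick illustration of that trap (hypothetical kernel; the two lines show the slow and fast variants of the same expression):

```cpp
__global__ void scale_f32(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // 1.234 is a double literal: x[i] is promoted to double,
        // the multiply runs in double precision, and the result
        // is converted back down to float
        x[i] = x[i] * 1.234;

        // 1.234f keeps the entire expression in float
        x[i] = x[i] * 1.234f;
    }
}
```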
