I recall that if one says abs(x), the CUDA compiler will insert the proper absolute value function for the data type of x; e.g., for int x the result will be iabs(x). Is the same true for sqrt(x) and log(x) if I am supplying float (float32_t) or double (float64_t) arguments, or am I required to write sqrtf(x) and logf(x) in order to get the float32_t equivalents of these functions? I was working through an old code and I noticed that it uses sqrt and log even though the arguments and the variables that will hold the results are all float32_t.
CUDA is a derivative of C++, so sqrt() is overloaded. Depending on whether a float or a double is passed, this results in either a single-precision square root or a double-precision square root.
If you pass an int to sqrt() you may get a compilation error that no matching version can be found among available overloads. I am saying may because I forget whether (and if so, when) C++ added int overloads for standard math functions. If int arguments are supported (e.g. sqrt(5)), you would want to double-check whether sqrt(i) is defined to be equivalent to sqrt((double)i) or sqrt((float)i) if you need to keep code “float clean”.
C++ adopted the C99 standard math library in wholesale fashion at C++11. If you check the ISO C++ standard document you won’t find any details about standard math library functions in it. So sqrtf() still exists, and you can pass int arguments to it without issues: sqrtf(i) is equivalent to sqrtf((float)i).
Using the overloaded plain-name math functions is probably generally preferred in CUDA code, just like it is in general C++ code. Personally, I still prefer to explicitly use the f-suffixed versions of standard math library functions when I write “float clean” code.
Yes, I also prefer to use the explicit f when writing code for all-float. It may be numbers, not functions, but all you need is one 1.234 without specifying it as a float to have your whole right-hand side go double! Thanks for the clarification. I can now rest easy that the old code is indeed a lot slower than the new one, and not because it’s accidentally using a lot of float64_t intrinsics.