I recall that if one says abs(x), the CUDA compiler will insert the proper absolute value function for the data type of x; e.g., for int x the result will be iabs(x). Is the same true for sqrt(x) and log(x) if I am supplying float (float32_t) or double (float64_t) arguments, or am I required to write sqrtf(x) and logf(x) in order to get the float32_t equivalents of these functions? I was working through an old code and I noticed that it uses sqrt and log even though the arguments and the variables that will hold the results are all float32_t.
CUDA is a derivative of C++, so sqrt() is overloaded. Depending on whether a float or a double is passed, this results in either a single-precision square root or a double-precision square root.
If you pass an int to sqrt() you may get a compilation error that no matching version can be found among available overloads. I am saying may because I forget whether (and if so, when) C++ added int overloads for standard math functions. If int arguments are supported (e.g. sqrt(5)), you would want to double-check whether sqrt(i) is defined to be equivalent to sqrt((double)i) or sqrt((float)i) if you need to keep code “float clean”.
C++ adopted the C99 standard math library in wholesale fashion at C++11. If you check the ISO C++ standard document you won’t find any details about standard math library functions in it. So sqrtf() still exists, and you can pass int arguments to it without issues: sqrtf(i) is equivalent to sqrtf((float)i).
Using the overloaded plain-name math functions is probably generally preferred in CUDA code, just like it is in general C++ code. Personally, I still prefer to explicitly use the f-suffixed versions of standard math library functions when I write “float clean” code.
Yes, I also prefer to use the explicit f when writing code for all-float. It may be numbers, not functions, but all you need is one 1.234 without specifying it as a float to have your whole right-hand side go double! Thanks for the clarification. I can now rest easy that the old code is indeed a lot slower than the new one, and not because it’s accidentally using a lot of float64_t intrinsics.