Help understanding sqrt functions in CUDA

HenrikAndresen · May 7, 2012, 12:55pm

Hi All

I’m doing some performance testing to evaluate different functions in CUDA, and I have come across the functions for calculating the square root. There is both a normal ‘sqrtf’ and an intrinsic ‘__fsqrt_rn’.

The second is approximately three times slower. Is the only difference numerical accuracy? Or am I reading the CUDA C Programming Guide wrong?

I ran the tests on a GTX480 using CUDA Toolkit 4.0.

Thank you

Henrik Andresen

njuffa · May 11, 2012, 11:07am

sqrtf() is a single-precision square root function that can map either to an approximate square root implementation, or to one that rounds to nearest-or-even according to the IEEE-754 standard.

On sm_1x devices, sqrtf() always maps to the approximate square root implementation. On sm_2x and sm_3x devices the mapping is controlled by the compiler flag -prec-sqrt={true|false}. The default setting is “true”. When -prec-sqrt=false is specified, sqrtf() maps to the approximate square root implementation; with -prec-sqrt=true it maps to the IEEE-rounded one. -use_fast_math implies -prec-sqrt=false.
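As a rough illustration of the flag-dependent mapping (a sketch, not part of the original post), the program below counts how many inputs give bitwise different results for sqrtf() and __fsqrt_rn(). The names compare_sqrt and count_sqrt_mismatches, and the choice of input values, are made up for this example. Built with the default -prec-sqrt=true on sm_2x or later, the two should agree for every input; built with -prec-sqrt=false or -use_fast_math, some inputs may differ.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Counts inputs for which sqrtf(x) and __fsqrt_rn(x) differ bitwise.
// (Kernel name and layout are hypothetical, chosen for this sketch.)
__global__ void compare_sqrt(const float* in, int n, unsigned int* mismatches)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float a = sqrtf(in[i]);       // mapping depends on -prec-sqrt
        float b = __fsqrt_rn(in[i]);  // always IEEE round-to-nearest-even
        if (__float_as_int(a) != __float_as_int(b)) {
            atomicAdd(mismatches, 1u);
        }
    }
}

// Host helper: fills n test inputs, runs the kernel, returns the mismatch count.
// Error checking is omitted for brevity.
unsigned int count_sqrt_mismatches(int n)
{
    float* h_in = new float[n];
    for (int i = 0; i < n; ++i) {
        h_in[i] = 1.0f + 1.0e-3f * (float)i;  // arbitrary positive inputs
    }

    float* d_in = 0;
    unsigned int* d_count = 0;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(unsigned int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    compare_sqrt<<<blocks, threads>>>(d_in, n, d_count);

    unsigned int h_count = 0;
    cudaMemcpy(&h_count, d_count, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_count);
    delete[] h_in;
    return h_count;
}

int main()
{
    printf("mismatches: %u\n", count_sqrt_mismatches(1 << 20));
    return 0;
}
```

Compiling the same file twice, once with no extra flags and once with -prec-sqrt=false, and comparing the printed counts is one way to check which implementation sqrtf() is mapped to in a given build.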

__fsqrt_rn() always maps to an implementation that rounds to nearest-or-even according to the IEEE-754 standard. It is quite slow on sm_1x devices since the hardware does not support the single-precision FMA (fused multiply-add) operation which is crucial to high performance implementations of correctly rounded square root.

Even on sm_2x and sm_3x devices, significant performance differences between the approximate and IEEE-rounded versions can be observed, which is simply a consequence of the work necessary to guarantee the standard-compliant result. Over successive generations of CUDA, a lot of work has gone into providing optimized implementations of such correctly rounded mathematical primitives.
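For anyone who wants to measure the difference on their own hardware, here is a minimal timing sketch using CUDA events (again not from the original post). The kernel names, the helper benchmark_ms, and the chain length CHAIN are assumptions made for the example. Each thread performs a dependent chain of square roots so that the kernel is not dominated by memory traffic. Note that with the default -prec-sqrt=true both kernels are expected to use the IEEE-rounded path, so a gap should only show up when the file is built with -prec-sqrt=false or -use_fast_math.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define CHAIN 64  // hypothetical number of dependent square roots per thread

// Dependent chain of sqrtf(); its mapping depends on -prec-sqrt.
__global__ void chain_sqrtf(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i];
        for (int k = 0; k < CHAIN; ++k) v = sqrtf(v) + 1.0f;
        out[i] = v;
    }
}

// Same chain using the always IEEE-rounded intrinsic.
__global__ void chain_fsqrt_rn(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i];
        for (int k = 0; k < CHAIN; ++k) v = __fsqrt_rn(v) + 1.0f;
        out[i] = v;
    }
}

static void launch(bool use_ieee, const float* d_in, float* d_out, int n)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (use_ieee) chain_fsqrt_rn<<<blocks, threads>>>(d_in, d_out, n);
    else          chain_sqrtf<<<blocks, threads>>>(d_in, d_out, n);
}

// Returns the average time per launch in milliseconds.
// Error checking is omitted for brevity.
float benchmark_ms(bool use_ieee, int n, int reps)
{
    std::vector<float> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f + 1.0e-3f * (float)i;

    float* d_in = 0;
    float* d_out = 0;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    launch(use_ieee, d_in, d_out, n);  // warm-up launch, not timed

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) launch(use_ieee, d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return ms / (float)reps;
}

int main()
{
    const int n = 1 << 22;
    printf("sqrtf:      %f ms\n", benchmark_ms(false, n, 20));
    printf("__fsqrt_rn: %f ms\n", benchmark_ms(true, n, 20));
    return 0;
}
```

The absolute numbers will vary with the device, toolkit version, and compiler flags, so they are best read as a relative comparison between the two kernels within a single build.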

HenrikAndresen · May 11, 2012, 11:09am

Hi Njuffa

Thank you for your reply. That clarified things!

Cheers

Henrik Andresen
