sqrt precision

L.Allen · September 3, 2010, 2:14am

Recently, I tried to use the CUDA sqrt function to get the square root of an integer of type long long,

and I am disappointed in its precision.

For example:

The CPU version:
The square root of 625 is 25
The square root of 10616005156 is 103034
The square root of 3014768326492836 is 54906906
The square root of 24597324665761 is 4959569
The square root of 15594601 is 3949

The GPU version:
The square root of 625 is 25
The square root of 10616005156 is 103033
The square root of 3014768326492836 is 54906904
The square root of 24597324665761 is 4959569
The square root of 15594601 is 3949

As you can see from my experiment, the CPU and GPU versions differ slightly, and this causes trouble for me

because I need at least the precision of the CPU version.

I think the main problem is the implementation of the CUDA sqrt function.

It is sqrt(x) = 1/rsqrt(x);

If anyone has run into the same problem, please tell me how to correct it.

Thanks.

njuffa · September 3, 2010, 8:43am

An (unsigned) long long uses up to 64 bits to represent an integer. The effective mantissa length of an IEEE-754 single-precision floating-point number is 24 bits, while the effective mantissa length of an IEEE-754 double-precision floating-point number is 53 bits. Clearly not all integers representable by a long long or unsigned long long operand are exactly representable in a double operand, let alone a float operand. x86 CPUs may use extended precision floating-point numbers internally (when targeting the x87 FPU rather than SSEx), and this extended precision format uses 64 mantissa bits, thus allowing all long longs to be stored in an extended precision floating-point operand without loss of accuracy.
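To make the mantissa limits above concrete, here is a minimal sketch of a round-trip check. The helper names exact_in_float and exact_in_double are hypothetical (they are not part of CUDA or the standard library), and the code is plain host-side C++. Of the values in the original post, 10616005156 does not survive conversion to float, while all five fit exactly in a double because they are below 2^53.

```cpp
#include <cstdint>

// Hypothetical helper: true if n survives a round trip through float
// (24-bit effective mantissa).
inline bool exact_in_float(std::int64_t n) {
    float f = static_cast<float>(n);
    // Guard: values near INT64_MAX round up to 2^63, which cannot be
    // converted back to a signed 64-bit integer.
    if (f >= 9223372036854775808.0f) return false;
    return static_cast<std::int64_t>(f) == n;
}

// Hypothetical helper: same check for double (53-bit effective mantissa).
inline bool exact_in_double(std::int64_t n) {
    double d = static_cast<double>(n);
    if (d >= 9223372036854775808.0) return false;
    return static_cast<std::int64_t>(d) == n;
}
```

If exact_in_float returns false for an input, a single-precision square root starts from an already rounded operand, so the truncated integer result can be off even when the square root itself is correctly rounded.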

Note that on an sm_1x GPU, the single-precision sqrtf() is approximate; to get a single-precision square root rounded in compliance with IEEE-754 round-to-nearest-or-even mode, use the device function __fsqrt_rn() instead. On sm_2x devices, sqrtf() has IEEE-754 compliant round-to-nearest-or-even rounding by default, and the behavior is controllable with the -prec-sqrt={true|false} compiler flag. This flag makes sqrtf() either use IEEE-754 rounding (when -prec-sqrt=true) or approximate (when -prec-sqrt=false). The __fsqrt_rn() device function is available across GPUs of all compute capabilities, but it’s significantly faster on sm_2x devices (due to better HW support). The double precision function sqrt() always rounds according to the IEEE-754 round-to-nearest-or-even mode (on all devices that support double precision, i.e. sm_13 and higher).

When using square root operations with IEEE-754 rounding, and sticking to integer operands that are accurately representable in the selected floating-point format, the same result is achieved on both the CPU and the GPU.
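If the goal is an exact integer square root of a long long, one common approach (a sketch, not something stated in this thread) is to take a double-precision estimate and then correct it in integer arithmetic, so the result does not depend on how the floating-point value was rounded. The name isqrt_floor is hypothetical. The code is host-side C++, and the assumption is that the same body could be marked __host__ __device__ on a GPU with double-precision support.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: floor(sqrt(n)) for a non-negative 64-bit integer.
// Returns 0 for n <= 0.
inline std::int64_t isqrt_floor(std::int64_t n) {
    if (n <= 0) return 0;
    const std::uint64_t un = static_cast<std::uint64_t>(n);
    // Double-precision estimate; it may be off by one for inputs above 2^53,
    // where the conversion to double is no longer exact.
    std::uint64_t r = static_cast<std::uint64_t>(std::sqrt(static_cast<double>(n)));
    // Correct the estimate with exact integer comparisons. Unsigned
    // arithmetic keeps (r + 1) * (r + 1) from overflowing for any
    // non-negative int64 input.
    while (r * r > un) --r;
    while ((r + 1) * (r + 1) <= un) ++r;
    return static_cast<std::int64_t>(r);
}
```

With the inputs from the original post, isqrt_floor(10616005156) gives 103034 and isqrt_floor(3014768326492836) gives 54906906, matching the CPU column.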


siramirsaman · September 9, 2011, 3:29am

This is how I worked around the problem.

I couldn’t get -prec-sqrt=true to work, so I went with this algorithm: the Babylonian method (see "Methods of computing square roots" on Wikipedia).

It works like this:

To calculate the square root of S:

x_0 = initial guess;
x_{n+1} = 1/2.0 * ( x_n + S / x_n );

For my initial guess I chose CUDA's sqrtf().
For example, sqrt(1234567891) is calculated below:

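double s = 1234567891.0;           // value whose square root is wanted
double x_n = sqrtf((float)s);      // single-precision initial guess
for (int tt = 1; tt < 10; ++tt) {  // nine Babylonian (Newton) steps
    x_n = 0.5 * (x_n + s / x_n);   // refine in double precision
}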

These are the results:

Matlab on CPU:
3.513641830067487e+004

Babylonian method above with GPU:
3.5136418300674872e+004

CUDA sqrtf:
3.513641796875e+004

As you can see, the method above gives results that match the CPU after very few iterations.
(I haven’t tried tuning the tt<10 iteration bound.)
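On the iteration count: each Babylonian step roughly doubles the number of correct bits, so starting from a single-precision estimate (about 24 bits), two or three steps are usually enough to reach double precision. Below is a minimal sketch that stops as soon as the value no longer changes. The name refine_sqrt and the max_iters parameter are hypothetical. The code is host-side C++, and it assumes s is positive and within the normal range of float so that the initial guess is finite and nonzero.

```cpp
#include <cmath>

// Hypothetical helper: refine a single-precision estimate of sqrt(s)
// with Babylonian (Newton) steps carried out in double precision.
// Assumes s is within the normal range of float.
inline double refine_sqrt(double s, int max_iters = 10) {
    if (s <= 0.0) return 0.0;
    // Single-precision initial guess, as in the post above.
    double x = static_cast<double>(std::sqrt(static_cast<float>(s)));
    for (int i = 0; i < max_iters; ++i) {
        double next = 0.5 * (x + s / x);
        if (next == x) break;  // converged: further steps change nothing
        x = next;
    }
    return x;
}
```

The max_iters cap is kept as a safety net in case the iterate alternates between two adjacent doubles instead of settling on one.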
