Recently I tried to use the CUDA sqrt function to compute the square root of integers of type long long, and I am disappointed in its precision.

For example:

CPU version:
The square root of 625 is 25
The square root of 10616005156 is 103034
The square root of 3014768326492836 is 54906906
The square root of 24597324665761 is 4959569
The square root of 15594601 is 3949

GPU version:
The square root of 625 is 25
The square root of 10616005156 is 103033
The square root of 3014768326492836 is 54906904
The square root of 24597324665761 is 4959569
The square root of 15594601 is 3949

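As you can see, the CPU and GPU results differ for two of the five inputs. All five inputs are perfect squares (for example, 103034 * 103034 = 10616005156), so the CPU answers are exact, while the GPU answers are off by 1 and by 2. This causes trouble for me, because I need at least the precision of the CPU version.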

I think the main problem is the implementation of the CUDA sqrt function, which, as far as I know, computes sqrt(x) = 1/rsqrt(x).
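One thing worth ruling out, since the kernel code is not shown, is whether the values pass through single precision on the device (for example through sqrtf, a float variable, or, as far as I know, a GPU below compute capability 1.3, which has no double-precision hardware so double is demoted to float). A float has a 24-bit significand, so large integers are rounded. The host-side C++ sketch below round-trips some of the inputs through float and double; the helper names via_float, via_double and check_precision_facts are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: converts x to float and back.
// volatile forces a real 32-bit store, so x87 excess precision cannot hide the rounding.
long long via_float(long long x) {
    volatile float f = static_cast<float>(x);
    return static_cast<long long>(f);
}

// Hypothetical helper: converts x to double (53-bit significand) and back.
long long via_double(long long x) {
    volatile double d = static_cast<double>(x);
    return static_cast<long long>(d);
}

// Checks four facts relevant to the results above; aborts if one of them fails.
void check_precision_facts() {
    // Below 2^53 every integer is exact in double, so a double-based CPU path loses nothing here.
    assert(via_double(3014768326492836LL) == 3014768326492836LL);
    // Between 2^33 and 2^34 adjacent floats are 1024 apart: 10616005156 becomes 10616005632.
    assert(via_float(10616005156LL) == 10616005632LL);
    // Between 2^25 and 2^26 adjacent floats are 4 apart, so the exact root 54906906
    // has no float encoding; round-to-nearest-even stores it as 54906904.
    assert(via_float(54906906LL) == 54906904LL);
    // A correctly rounded float sqrt of the rounded input is still 103034.0f.
    assert(static_cast<long long>(std::sqrt(static_cast<float>(10616005156LL))) == 103034LL);
}
```

Calling check_precision_facts() from a host main() should pass on an IEEE-754 machine. The third assertion means no result stored as a float can ever be 54906906, and 54906904 happens to be exactly what the GPU printed. The fourth shows that a correctly rounded single-precision sqrt would still give 103034, so the 103033 result would additionally point to a device sqrtf that is not correctly rounded, which fits the 1/rsqrt suspicion. Both points are suggestive, not proof, without the actual kernel.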

If anyone has run into the same problem, please tell me how to fix it.
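One possible workaround, sketched here under the assumption that the inputs are non-negative and not tested on a GPU, is to avoid floating point entirely and compute the integer square root with the classic bit-by-bit (digit-by-digit) method. It uses only 64-bit integer shifts, additions and comparisons, so it should give the same exact floor(sqrt(n)) on host and device. This is a swapped-in technique, not CUDA's sqrt; the names isqrt_u64 and check_isqrt are hypothetical, and the __host__/__device__ shim only lets the same file also compile with an ordinary C++ compiler.

```cpp
#include <cassert>
#include <cstdint>

// Lets this file compile with a plain C++ compiler; under nvcc the real qualifiers apply.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: floor(sqrt(n)) for any 64-bit unsigned n, integer arithmetic only.
__host__ __device__ inline uint64_t isqrt_u64(uint64_t n) {
    uint64_t res = 0;
    uint64_t bit = UINT64_C(1) << 62;  // highest power of four that fits in 64 bits
    while (bit > n) {
        bit >>= 2;                     // start at the highest power of four <= n
    }
    while (bit != 0) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;    // this bit of the root is 1
        } else {
            res >>= 1;                 // this bit of the root is 0
        }
        bit >>= 2;
    }
    return res;
}

// Host-side check of the defining property r*r <= n < (r+1)*(r+1).
inline void check_isqrt(uint64_t n) {
    const uint64_t r = isqrt_u64(n);
    assert(r * r <= n);
    // (r + 1)^2 overflows only for r = 2^32 - 1, where the true value 2^64 exceeds n anyway.
    assert(r == UINT64_C(0xFFFFFFFF) || (r + 1) * (r + 1) > n);
}
```

In a kernel it could be called as isqrt_u64((uint64_t)x) for a non-negative long long x; the loop runs at most 32 times. On GPUs with double-precision support, another option is to take (unsigned long long)sqrt((double)x) and then correct it by 1 in either direction using the same integer check; the bit-by-bit version avoids depending on double support at all.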

Thanks.