Problem on sqrt precision


I am having trouble with the precision of sqrt in an OpenCL kernel. For instance, sqrt( 14400.0f ) seems to return 120.000008. The error is small, but it grows in later calculations. Is there a flag or precision qualifier that gives more accuracy? The GPU is a GeForce 8600 GT.


From the NVIDIA OpenCL Programming Guide:

“Square root is implemented via the reciprocal square root in a nonstandard-compliant way;” (pg 54) …

“But, IEEE-compliant software (and therefore slower) implementations are provided through the following intrinsics from Appendix B:”

“native_sqrt(float): single-precision square root with IEEE rounding modes;”

I would give that a shot and see if it helps. You’ll take a small performance hit, but if accuracy is important, it might be worthwhile.

Thanks, but native_sqrt seemed to give the same result.


How do you get “120.000008” from a single-precision float that only has room for 7 digits?

Floating point numbers are represented in base 2, so it is not always possible to display such a number exactly in base 10 using 7 digits or fewer. (For example, 1 + 2**-22 can be represented exactly in single precision, but requires a lot more than 7 digits to represent exactly in base 10.) The fractional distance between one floating point number and the next one is approximately 10^-7, which is where the 7 significant digit rule of thumb comes from.

[edited later to reflect that the information is specific to the single-precision variant of native_sqrt()]

Thanks for alerting us to this documentation issue. Table C-2 and section C.2 incorrectly state that native_sqrt() maps to a square root with IEEE rounding. In fact the single-precision variant of the function maps to the PTX instruction sqrt.approx.f32, that is, an approximate square root satisfying the accuracy bound of <= 3 ulps set forth in the OpenCL specification. I will follow up with our documentation team.