CUDA OpenCL implementation has no support for correctly rounded (IEEE 754) divide/sqrt

The OpenCL header file from the CUDA SDK doesn’t define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT, nor do NVIDIA OpenCL devices set that bit in the value they return for CL_DEVICE_SINGLE_FP_CONFIG (http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf#page=234). This means single-precision divide and sqrt on these devices are not guaranteed to be correctly rounded as the IEEE 754 spec requires, and you can trivially check this for yourself.

This is a bit strange, since NVIDIA has published a paper highlighting how IEEE 754 compliant its GPUs are (https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf), and there is an entire list of the compute capabilities of the various devices (CUDA GPUs - Compute Capability | NVIDIA Developer), yet we can’t get correctly rounded divide and sqrt on any of them through OpenCL.

Was this overlooked, is it planned for later, or is there an inherent reason why it won’t or can’t be implemented? Or have I simply missed something, such as installing the wrong driver? (I have the CUDA 5.5 SDK.)

After digging around a bit, I realised that these features are part of the OpenCL 1.2 specification. Another topic, “OpenCL 1.2 support in NVIDIA drivers” (Announcements, NVIDIA Developer Forums), asks about OpenCL 1.2 support, so it seems that OpenCL 1.2 is not supported yet. Any update on when it will be available?

I don’t believe NVIDIA is prioritizing OpenCL 1.2, and I certainly haven’t heard anything myself. There have been a few posts about it in the last few months, with no definite answer from NVIDIA. If you need OpenCL 1.2 for some reason, I’d say go over to ATI.

Now that NVIDIA supports OpenCL 1.2, has anybody tested whether divide and sqrt are now correctly rounded as IEEE 754 requires?

See IEEE 754 floating-point test software for a test suite. It “only” needs to be ported to OpenCL.

I do not use OpenCL, but I assume this is about single-precision division and sqrt. For a reference implementation of a correctly rounded single-precision IEEE 754 square root, you might want to try adapting the code from:

c++ - How to Perform Tuckerman Rounding for Floating Point Square Root - Stack Overflow

Of course one could also use the host’s math.h sqrtf() implementation as the reference: since there are only 2^32 possible inputs, an exhaustive test would take less than half an hour. Just make sure to dial up the “strict” floating-point settings of the host compiler, otherwise the host library square roots may not be correctly rounded! Testing division is a bit trickier; with 2^64 operand pairs, one cannot really test it exhaustively, even for single precision.

Before investing time in testing, it would probably be advisable to first establish, from the documentation and feature strings, which properties NVIDIA’s OpenCL implementation actually promises.