The OpenCL header file from the CUDA SDK doesn’t define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT, nor do NVIDIA OpenCL devices report that flag in their single-precision FP config, CL_DEVICE_SINGLE_FP_CONFIG (http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf#page=234). This means divide and sqrt calculated on the devices are not done according to the IEEE 754 spec, and you can trivially check this for yourself.
Was this overlooked, is it planned for a later release, or is there an inherent reason why it won’t or can’t be implemented? Or have I just missed something, such as downloading the wrong driver? (I have the CUDA 5.5 SDK.)
I don’t believe NVIDIA is prioritizing OpenCL 1.2, and I certainly haven’t heard anything myself. There have been a few posts about it in the last few months, with no definite answer from NVIDIA. If you need OpenCL 1.2 for some reason, I’d say go over to AMD (formerly ATI).
I do not use OpenCL, but I assume this is about single-precision division and sqrt? For a reference implementation of a correctly rounded IEEE-754 single-precision square root, you might want to try adapting the following code:
Of course one could also use the host’s math.h sqrtf() implementation as the reference: there are only 2**32 test cases to consider, so an exhaustive test would take less than half an hour. Just make sure to dial up the “strict” floating-point settings of the host compiler, otherwise the host library square roots may not be correctly rounded! Testing division is a bit trickier; with 2**64 possible operand pairs one cannot really do that exhaustively, even for single precision.
Before expending time on a test effort, it probably would be advisable to first establish what properties are actually promised by NVIDIA’s OpenCL implementation based on the documentation and feature strings.
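As a rough illustration of checking the feature bits, the C sketch below decodes the bitfield that clGetDeviceInfo() returns for CL_DEVICE_SINGLE_FP_CONFIG. The bit values mirror those in the Khronos OpenCL 1.2 cl.h header; they are redefined locally under an FPCFG_ prefix (my naming, not OpenCL's) so the snippet compiles without an OpenCL SDK. The helper names are hypothetical, and the value passed in is assumed to have been obtained from the device beforehand.

```c
#include <stdint.h>
#include <stdio.h>

/* Local copies of the CL_FP_* bit values from the OpenCL 1.2 cl.h header. */
#define FPCFG_DENORM                        (1u << 0)
#define FPCFG_INF_NAN                       (1u << 1)
#define FPCFG_ROUND_TO_NEAREST              (1u << 2)
#define FPCFG_ROUND_TO_ZERO                 (1u << 3)
#define FPCFG_ROUND_TO_INF                  (1u << 4)
#define FPCFG_FMA                           (1u << 5)
#define FPCFG_SOFT_FLOAT                    (1u << 6)
#define FPCFG_CORRECTLY_ROUNDED_DIVIDE_SQRT (1u << 7)

/* Returns 1 if the device advertises correctly rounded divide and sqrt. */
static int promises_correct_div_sqrt(uint64_t single_fp_config)
{
    return (single_fp_config & FPCFG_CORRECTLY_ROUNDED_DIVIDE_SQRT) != 0;
}

/* Prints one line per capability bit so the promised properties are explicit. */
static void print_fp_config(uint64_t cfg)
{
    static const struct { uint64_t bit; const char *name; } flags[] = {
        { FPCFG_DENORM,                        "denormals" },
        { FPCFG_INF_NAN,                       "INF and quiet NaN" },
        { FPCFG_ROUND_TO_NEAREST,              "round to nearest even" },
        { FPCFG_ROUND_TO_ZERO,                 "round to zero" },
        { FPCFG_ROUND_TO_INF,                  "round to +/- infinity" },
        { FPCFG_FMA,                           "fused multiply-add" },
        { FPCFG_SOFT_FLOAT,                    "software floating point" },
        { FPCFG_CORRECTLY_ROUNDED_DIVIDE_SQRT, "correctly rounded divide/sqrt" },
    };
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++)
        printf("%-30s %s\n", flags[i].name, (cfg & flags[i].bit) ? "yes" : "no");
}
```

If the divide/sqrt bit is absent, the implementation makes no promise of correct rounding for those operations, and a test effort would only quantify the error rather than reveal a conformance bug.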