I am working on a project that is very sensitive to floating-point precision, using a GPU with compute capability 1.3. I would like to know whether the floating-point control word can be set on a GPU as it can on a CPU. The main reason is that I want to know whether the FPUs on the GPU use standard double precision by default, or an extended precision like the one in Intel x86 processors. If extended precision is used, I hope I can set a flag to put the GPU into a round-to-double mode.
Like most architectures (legacy non-SSE x86 being the notable exception), NVIDIA GPUs have no precision control flag. They just have single-precision instructions that operate on single-precision data, and double-precision instructions that operate on double-precision data, which map directly to the C float and double datatypes.
No extended-precision format is supported, but there is a fused multiply-add (FMA) in double precision. Its use is enabled by default, and it can sometimes improve accuracy.
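As a minimal sketch (kernel and variable names are illustrative), here is the difference between an explicit fma() call and a plain product-sum that the compiler may or may not contract:

```
// Compare a plain product-sum with an explicit fused multiply-add.
__global__ void fma_demo(const double *a, const double *b,
                         const double *c, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // a*b + c may be contracted into a single FMA by the compiler
        // (the nvcc default; recent compilers let you disable
        // contraction with --fmad=false).
        double contracted = a[i] * b[i] + c[i];
        // fma() forces the fused operation: the product a*b is kept
        // exact before the addition, with a single rounding at the end.
        double fused = fma(a[i], b[i], c[i]);
        // The difference is zero wherever the compiler fused anyway.
        out[i] = fused - contracted;
    }
}
```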
Thanks for your nice reply. I have a further question. In the math library documentation, the maximum ULP error table says errors are measured against “a correctly rounded double-precision result”. I am not sure what “a correctly rounded double-precision result” is. Is it obtained from a CPU-based C math library? And why might a result not be correctly rounded, since the five basic operations have zero ULP error?
I find this confusing too, and I already complained about this table… I believe errors should be measured compared to the exact value.
The “correctly rounded result” means the floating-point number that would be obtained by applying the IEEE-754 rounding rules to the exact answer. Usually that boils down to returning the floating-point number that is closest to the exact answer.
The max rounding error compared to the exact result is 0.5 ulps in this case.
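A tiny host-side illustration (plain C, variable names are just for the example): 1/3 is not representable in binary, so the division returns the correctly rounded result, whose error is at most half an ulp:

```
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* 1/3 is not exactly representable in binary floating point, so
       the division returns the correctly rounded result: the double
       closest to the exact value. */
    double x = 1.0 / 3.0;

    /* One ulp at x is the gap to the next representable double. */
    double ulp = nextafter(x, INFINITY) - x;

    /* Correct rounding (round-to-nearest) guarantees
       |x - 1/3| <= ulp/2. */
    printf("x = %.17g, ulp(x) = %g\n", x, ulp);
    return 0;
}
```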
IEEE-754 does not require correct rounding for elementary functions (exp, log, sin, pow…), because it is very hard to achieve (both computationally and theoretically). So most C math libraries do not offer correct rounding, but are generally close (~0.501 ulps).
The CUDA library is a bit less accurate (literally!), but it should not be a problem in practice.
Actually, I also cannot find ULP descriptions for the C math library. So these CUDA errors should be compared with the correctly rounded exact results?
Yes. To NVIDIA’s credit, few math libraries document their error bounds at all. Confusing documentation is better than no documentation. :)
Since the correctly-rounded result has at most 0.5 ulps of error compared to the exact value in round-to-nearest, you can just add 0.5 to all the error bounds of the documentation and get an error bound relative to the exact value. But it may significantly overestimate the error.
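For example, if the table lists a maximum error of 2 ulps for some function relative to the correctly rounded result, the error relative to the exact value is at most 2 + 0.5 = 2.5 ulps.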
Hmm… actually I want to implement some high-precision algorithms on the GPU, and I want to know whether that is reasonable. The most important thing is that double precision on GPUs follows the IEEE-754 standard. However, I still have some concerns. Many high-precision functions (such as exp) are built on top of double-precision functions, so compared with the math functions in the C library, it looks like the GPU may produce larger errors. However, I also cannot find the ULP errors for the C math library in any documentation.
An error of a few ulps is small. For instance, 4 ulps means the 53-bit result is accurate to 51 bits. If the last two bits matter for your algorithm, then it’s probably unstable even on the CPU…
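If it helps, here is a minimal sketch of the usual error-free transformations that double-double arithmetic is built on (function names are illustrative); since GPU doubles follow IEEE-754 round-to-nearest, these behave the same as on the CPU:

```
// Knuth's TwoSum: compute s + e == a + b exactly, where s is the
// rounded sum and e is the rounding error.
__device__ void two_sum(double a, double b, double *s, double *e)
{
    *s = a + b;
    double bb = *s - a;   // the part of b that made it into s
    *e = (a - (*s - bb)) + (b - bb);
}

// With FMA, the product counterpart is a one-liner:
// p + e == a * b exactly (barring overflow/underflow).
__device__ void two_prod(double a, double b, double *p, double *e)
{
    *p = a * b;
    *e = fma(a, b, -*p);  // exact error of the rounded product
}
```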