floating point processor of GPUs

Hi all, I know that recent generations of GPUs are IEEE-compliant and support double precision. However, I still wonder about the details of the floating point units in GPUs. The FPU in an x86 processor may use an underlying 80-bit extended format for floating point numbers. I cannot find the technical details of the FPU on GPUs: does it also implement double precision on top of an 80-bit representation?

This problem comes from my recent work. In some cases (although the probability is very, very low), the floating point arithmetic operations on the GPU behave differently from those on the CPU. On the CPU, I have forced double precision to use 64 bits rather than 80 bits.
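For reference, the usual ways to force 64-bit rounding on x86 are either to compile for SSE2 math (e.g. gcc -mfpmath=sse) or to lower the x87 precision-control field from extended to double precision. A minimal sketch of the latter, using glibc's <fpu_control.h> (glibc-specific names, shown only for illustration):

#include <fpu_control.h>   /* glibc-specific access to the x87 control word */

/* Drop the x87 precision-control field from 80-bit extended to 64-bit
 * double, so intermediate results are rounded the same way as on the GPU.
 * Note: this affects only the legacy x87 unit; SSE2 arithmetic already
 * rounds to 64 bits. */
static void force_x87_double_precision(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
    _FPU_SETCW(cw);
}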

Additionally, I also think this problem is related to the compiler. My program does not show this problem when built with CUDA 2.0, but does with all later CUDA versions.

I have not figured out a way to isolate this problem from my complicated program, so I am sorry for the long and boring text…

GPUs have a fused multiply-add (FMA) instruction in double precision, which is more accurate than a multiplication followed by an addition because the intermediate product is not rounded. The compiler can replace (a * b) + c sequences with FMAs, which can return a different answer.
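For comparison purposes, contraction can be avoided on the device by using explicitly rounded intrinsics instead of the plain expression. A minimal sketch (kernel name and arguments are only illustrative):

// The compiler may contract the commented line into a single FMA, which
// rounds only once; the intrinsics below force a separately rounded
// multiply and add, matching a CPU that does not fuse them.
__global__ void madKernel(const double *a, const double *b,
                          const double *c, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // out[i] = a[i] * b[i] + c[i];                    // may become one FMA
        out[i] = __dadd_rn(__dmul_rn(a[i], b[i]), c[i]);   // two roundings
    }
}

(Newer toolkits also have an nvcc option, --fmad=false, to disable this contraction globally, if I remember correctly.)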

There is no such thing as extended 80-bit precision on GPUs.

Appendix G.2 of the CUDA Programming Guide has all the details about IEEE compliance.

Thanks for your answers. I know about FMAs, and I have already avoided them on both the GPU and the CPU, so I do not think that is the problem in my program. Let me try to simplify my program and post some code later.

If you have looked into it as far as you have, I’m sure you know that the double precision floating point is not 100% IEEE-compliant, as I believe it is required to be on the CPU. Over many iterations of CG in double precision I also get differing results, though I have not yet found an example that completely confuses the GPU numerically. I’d bet they exist. The new architectures should be much better, but I don’t believe even Fermi is 100% IEEE-compliant.

Actually, IEEE 754-2008 support on Fermi is already much better than on most CPUs, especially x86 (even with SSE).

Fermi supports FMA in single and double precision and has hardware support for subnormals. Current x86 processors don’t.

Fermi still lacks support for the IEEE flags and exceptions, however.

The architectures that I think offer “better” floating point support are AMD’s Evergreen architecture (another GPU, strangely enough!), which supports unfused multiply-add and lots of other fused or unfused operations in addition to FMA, as well as the IEEE flags, and IBM’s POWER6/POWER7, which support decimal floating point and lots of exotic rounding modes.

I think that few, if any, CPUs are fully IEEE compliant either. Poking through compiler man pages usually turns up an option which states something like “Enforces full IEEE compliance on all operations. Will make code glacially slow.” AIUI, getting pretty good IEEE compliance in hardware is (relatively) easy, but sorting out all the nooks and crannies of the standard is quite difficult and costly. Since these nooks and crannies don’t affect most code, hardware designers don’t bother, leaving it to the compiler writers to sort out in software.

Operations like addition and multiplication, which are associative in exact arithmetic, are not associative in floating point arithmetic on computers.
Because precision is finite and rounding happens on every operation, the order in which operations are performed changes the result.

Parallel decomposition changes the order of operations and hence the results. Even normal reductions performed on a large data set can give widely varying answers.
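A tiny host-side example (just a sketch; it compiles as plain C/C++ or as a .cu file) showing that regrouping the same three operands changes the rounded result:

#include <cstdio>

int main()
{
    double a = 1.0e16, b = -1.0e16, c = 1.0;

    double left  = (a + b) + c;   // 0.0 + 1.0 -> 1.0
    double right = a + (b + c);   // b + c rounds back to -1.0e16 -> 0.0

    printf("(a+b)+c = %g,  a+(b+c) = %g\n", left, right);
    return 0;
}

A parallel reduction is effectively a different grouping of the same sum, so small differences from a sequential CPU loop are expected rather than a sign of a bug.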

Does anyone know of a CUDA library for decimal floating-point computations?