compiler options for REAL arithmetic

Mr.Smith · October 29, 2010, 8:55pm

I have to rewrite a Fortran program so that the new version uses CUDA and its results match the old one. There are a lot of calculations in it, and all of them use REAL variables without kinds set; they are actually REAL*4. The calculations include addition (+), subtraction (-), multiplication (*), division (/), exponentiation (**), and the functions sqrt() and abs(). The modified code compiles OK, with the same warnings about ‘Predefined intrinsic loses intrinsic property’ in the subroutine that uses MOD(Integer,Integer). Some of the calculations are now done on the CPU and some on the GPU.

However, the results don’t match. After reading the FAQ about execution precision, I suspected an issue with extended precision on Intel processors, so I tried setting -Kieee and ‘-r4 -pc 32’. The results of the new and old programs still don’t match; small errors remain.

At this point, how do I get the results to match? Does the GPU perform REAL variables arithmetic in single or double precision? Do the options ‘-r4’ or ‘-pc xx’ affect GPU calculations at all?

I also have some questions on Fortran programming.

In dividing, do B = A / 3, and B = A / 3.0 work the same? If B is a REAL * 4, what will happen if I assign B = A / 3.0D0? Will it be different?

MatColgrove · October 30, 2010, 12:38am

Hi Mr.Smith,

NVIDIA GPUs are not IEEE 754 compliant, though they are getting better. From NVIDIA’s Fermi Tuning Guide:

IEEE 754-2008 Compliance
Devices of compute capability 2.0 have far fewer deviations from the IEEE 754-2008 floating point standard than devices of compute capability 1.x, particularly in single precision (Section G.2). This can cause slight changes in numeric results between devices of compute capability 1.x and devices of compute capability 2.0.
In particular, addition and multiplication are often combined into an FMAD instruction on devices of compute capability 1.x, which is not IEEE compliant, whereas they would be combined into an FFMA instruction on devices of compute capability 2.0, which is IEEE compliant.

At this point, how do I get the results to match?

First, try disabling FMA via the “-Mcuda=nofma” flag, especially if you are using a 1.x device. You may lose some precision, but should get more accuracy. Note that if you don’t know your device’s compute capability, run the PGI utility “pgaccelinfo” and look for the “Device Revision Number”.
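
To illustrate why fusing a multiply and an add can change a result, here is a minimal host-side sketch. It does not run on the GPU and does not use any real FMA instruction: the "single rounding" path is only emulated by doing the arithmetic in double precision and rounding once at the end. The program name and the chosen values are hypothetical, picked so that the product a*b lands exactly halfway between two single-precision numbers.

```fortran
program fma_rounding_sketch
  implicit none
  real :: a, b, c
  real :: prod, separate, fused_like

  ! Values chosen for illustration; all three are exactly representable in REAL*4.
  a = 1.0 + 2.0**(-12)
  b = a
  c = -(1.0 + 2.0**(-11))

  ! Exact product is 1 + 2**(-11) + 2**(-24), which needs 25 significant bits,
  ! so it has to be rounded when stored as a single-precision value.
  prod     = a * b          ! first rounding (multiply)
  separate = prod + c       ! second rounding (add)

  ! Emulated fused operation (an assumption, not a hardware FMA):
  ! the double-precision product of two singles is exact, so only the
  ! final conversion back to REAL*4 rounds.
  fused_like = real(dble(a) * dble(b) + dble(c))

  print *, 'multiply, then add (two roundings):', separate
  print *, 'emulated fused multiply-add       :', fused_like
  print *, 'results identical?                 ', separate == fused_like
end program fma_rounding_sketch
```

Be aware that an optimizing host compiler may itself contract a*b + c into a fused instruction or fold the constants, so build this without optimization if you want to see the two paths separately.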

Does the GPU perform REAL variables arithmetic in single or double precision?

By default a REAL is single. REAL will be double only if the “-r8” flag is used.
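
If you want to confirm what a plain REAL is in a given build, a small sketch like the one below prints the kind and precision of the default REAL next to DOUBLE PRECISION. The program name is hypothetical; the intrinsics kind(), digits() and precision() are standard Fortran 90.

```fortran
program real_kind_check
  implicit none
  real             :: x   ! default REAL, no kind specified
  double precision :: d

  x = 0.0
  d = 0.0d0

  ! digits() is the number of binary significand digits,
  ! precision() the approximate number of decimal digits.
  print *, 'default REAL    : kind =', kind(x), &
           ' binary digits =', digits(x), ' decimal digits =', precision(x)
  print *, 'DOUBLE PRECISION: kind =', kind(d), &
           ' binary digits =', digits(d), ' decimal digits =', precision(d)
end program real_kind_check
```

Compiling the same file with and without “-r8” and comparing the first line of output is a quick way to see whether that flag is in effect.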

Do the options ‘-r4’ or ‘-pc xx’ affect GPU calculations at all?

“-r4” is the default, so it has no effect. “-pc xx” only affects the x87 FPU and has no effect on the GPU.

In dividing, do B = A / 3, and B = A / 3.0 work the same?

Assuming A and B are REAL, 3 will be cast from an integer to a real, and then A / 3 will be executed. The cast shouldn’t affect the results.

If B is a REAL * 4, what will happen if I assign B = A / 3.0D0? Will it be different?

A will be cast to a double, A / 3.0D0 will be performed in double precision, and the result will then be cast back to single. This could affect the end result.
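
Rather than reasoning about it, the three forms of the divide can be compared directly on the host. The sketch below is only an illustration (the program name, loop range and the extra multiply by 0.1 are my own assumptions): it counts, for a range of inputs, how often the results differ bit for bit. The chained expression is included because a double-precision constant keeps the whole intermediate result in double, and 0.1 and 0.1D0 are not the same value.

```fortran
program divide_kinds_sketch
  implicit none
  real    :: a, b_int, b_real, b_dble, c_real, c_dble
  integer :: i, n_int_vs_real, n_real_vs_dble, n_chain

  n_int_vs_real  = 0
  n_real_vs_dble = 0
  n_chain        = 0

  do i = 1, 1000
     a = real(i)

     b_int  = a / 3        ! integer 3 converted to REAL, single-precision divide
     b_real = a / 3.0      ! single-precision divide
     b_dble = a / 3.0d0    ! a converted to double, divide in double,
                           ! result rounded to single on assignment

     ! Chained expressions: the intermediate stays in double in the second form.
     c_real = a / 3.0   * 0.1
     c_dble = a / 3.0d0 * 0.1d0

     ! Exact (bitwise) comparison is intentional here.
     if (b_int  /= b_real) n_int_vs_real  = n_int_vs_real  + 1
     if (b_real /= b_dble) n_real_vs_dble = n_real_vs_dble + 1
     if (c_real /= c_dble) n_chain        = n_chain        + 1
  end do

  print *, 'A/3   vs A/3.0          differ in', n_int_vs_real,  'of 1000 cases'
  print *, 'A/3.0 vs A/3.0D0        differ in', n_real_vs_dble, 'of 1000 cases'
  print *, 'chained single vs double differ in', n_chain,        'of 1000 cases'
end program divide_kinds_sketch
```

The counts depend on the compiler flags and the floating-point unit in use (for example x87 versus SSE), which is the reason to measure them on your own build instead of assuming.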

Note that the GPU has two different divides. The first is fast but can lose precision. The second takes 4 times as long but is more accurate. In earlier versions of the compiler (<= 10.4), we used the fast version by default. In later versions, we switched to using the more accurate version by default, with the fast version being used when the “-Mcuda=fastmath” flag is added. If you are using an earlier compiler version, you should consider updating to the most current release.

Hope this helps,
Mat
