compiler options for REAL arithmetic

I have to rewrite a Fortran program so that the new version uses CUDA and its results match the old one. It contains a lot of calculations, all of them on REAL variables with no KIND set, so they are actually REAL*4. The calculations include adding (+), subtracting (-), multiplying (*), dividing (/), exponentiation (**) and the functions sqrt() and abs(). The modified code compiles OK, with the same warnings about ‘Predefined intrinsic loses intrinsic property’ in the subroutine that uses MOD(Integer,Integer). Some of the calculations are now done on the CPU and some on the GPU.

However, the results don’t match. After reading the FAQ about execution precision, I suspected the extended precision on Intel processors might be the issue, so I tried building with -Kieee and ‘-r4 -pc 32’. The results of the new and old programs still don’t match; quite small errors remain.

At this point, how do I get the results to match? Does the GPU perform REAL variables arithmetic in single or double precision? Do the options ‘-r4’ or ‘-pc xx’ affect GPU calculations at all?

I also have some questions on Fortran programming.

In dividing, do B = A / 3, and B = A / 3.0 work the same? If B is a REAL * 4, what will happen if I assign B = A / 3.0D0? Will it be different?

Hi Mr.Smith,

NVIDIA GPUs are not fully IEEE 754 compliant, though they are getting better. From NVIDIA’s Fermi Tuning Guide:

IEEE 754-2008 Compliance
Devices of compute capability 2.0 have far fewer deviations from the IEEE 754-2008 floating point standard than devices of compute capability 1.x, particularly in single precision (Section G.2). This can cause slight changes in numeric results between devices of compute capability 1.x and devices of compute capability 2.0.
In particular, addition and multiplication are often combined into an FMAD instruction on devices of compute capability 1.x, which is not IEEE compliant, whereas they would be combined into an FFMA instruction on devices of compute capability 2.0, which is IEEE compliant.



At this point, how do I get the results to match?

First, try disabling FMA via the “-Mcuda=nofma” flag, especially if you are using a 1.x device. You may lose some precision, but should get more accuracy. If you don’t know your device’s compute capability, run the PGI utility “pgaccelinfo” and look for the “Device Revision Number”.
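To see why combining a multiply and an add can change the last bits of a result, here is a small host-side sketch. It is only an illustration, not what the GPU does internally: the "fused" result is emulated by carrying the product in double precision, and the program name, variable names and test values are all made up for this example. The VOLATILE attribute is there to discourage the compiler from fusing the operations itself or keeping the product in a wider register.

```fortran
program fma_rounding_sketch
  implicit none
  real :: a, b, c, fused
  real, volatile :: p          ! forces the product to be stored as a single
  double precision :: wide

  ! Hypothetical test values, chosen so that a*b needs one more bit
  ! than single precision can hold.
  a = 1.0 + 2.0**(-12)
  b = a
  c = -(1.0 + 2.0**(-11))

  ! Separate multiply and add: the product is rounded to single
  ! before c is added.
  p = a * b
  print *, 'multiply, round, then add :', p + c

  ! Emulated single-rounding multiply-add: the product is exact in
  ! double precision, so only the final conversion to single rounds.
  wide  = dble(a) * dble(b) + dble(c)
  fused = real(wide)
  print *, 'product kept unrounded    :', fused
end program fma_rounding_sketch
```

The exact value of a*b + c here is 2**(-24), roughly 5.96E-08. Rounding the product to single first discards exactly that term, so the two printed values can differ even though both are legitimate single-precision answers. Differences of this size between CPU and GPU runs are the kind of thing “-Mcuda=nofma” is meant to reduce.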

Does the GPU perform REAL variables arithmetic in single or double precision?

By default a REAL is single. REAL will be double only if the “-r8” flag is used.
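If you want to confirm what a default REAL is in your build, a quick check along these lines should do it. This is a minimal sketch using only standard inquiry intrinsics; the program and variable names are placeholders.

```fortran
program real_kind_check
  implicit none
  real :: x
  double precision :: d

  x = 0.0
  d = 0.0d0

  ! digits() reports the number of mantissa bits, precision() the
  ! number of decimal digits the type can represent.
  print *, 'default REAL     : kind =', kind(x), &
           ' mantissa bits =', digits(x), ' decimal digits =', precision(x)
  print *, 'DOUBLE PRECISION : kind =', kind(d), &
           ' mantissa bits =', digits(d), ' decimal digits =', precision(d)
end program real_kind_check
```

Building it once without any flags and once with “-r8” should show the default REAL line change from single to double.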

Do the options ‘-r4’ or ‘-pc xx’ affect GPU calculations at all?

“-r4” is the default so has no effect. “-pc xx” only affects the x87 FPU and has no effect on the GPU.

In dividing, do B = A / 3, and B = A / 3.0 work the same?

Assuming A and B are REAL, the 3 will be converted from an integer to a real and then A / 3.0 will be executed. The conversion shouldn’t affect the results.
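A short sketch to check this on your own system (names and the value of A are arbitrary). The last two lines are an extra I added: they show the related pitfall where both operands are integers, so the division is done in integer arithmetic before any conversion happens.

```fortran
program divide_by_integer_literal
  implicit none
  real :: a, b_int, b_real

  a = 10.0
  b_int  = a / 3      ! mixed mode: the integer 3 is converted to REAL first
  b_real = a / 3.0    ! 3.0 is already a default REAL constant

  print *, 'a / 3   =', b_int
  print *, 'a / 3.0 =', b_real
  print *, 'identical results? ', b_int == b_real

  ! Pitfall: 1 / 3 is an integer division and yields 0, so the
  ! whole expression becomes 0.0.
  print *, '(1 / 3) * a   =', (1 / 3) * a
  print *, '(1.0 / 3) * a =', (1.0 / 3) * a
end program divide_by_integer_literal
```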

If B is a REAL * 4, what will happen if I assign B = A / 3.0D0? Will it be different?

A will be converted to double, A / 3.0D0 will be computed in double precision, and the result will then be converted back to single. This could affect the end result.
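If you want to see how often this matters in practice, a sketch like the following counts the cases where the two forms disagree. The test values (i / 7.0) and the second divisor (0.1) are arbitrary choices of mine. I would expect differences mainly when the constant is not exactly representable in single precision, since 0.1 and 0.1D0 are then two different numbers, but treat that as something to verify on your own system rather than a guarantee.

```fortran
program divide_by_double_literal
  implicit none
  integer :: i, n_diff_exact, n_diff_inexact
  real :: a, b_single, b_via_double

  n_diff_exact   = 0
  n_diff_inexact = 0

  do i = 1, 1000
     a = real(i) / 7.0          ! arbitrary single-precision test values

     ! Divisor 3 is exactly representable in single and in double.
     b_single     = a / 3.0
     b_via_double = a / 3.0d0   ! computed in double, rounded on assignment
     if (b_single /= b_via_double) n_diff_exact = n_diff_exact + 1

     ! Divisor 0.1 is not exactly representable in either precision.
     b_single     = a / 0.1
     b_via_double = a / 0.1d0
     if (b_single /= b_via_double) n_diff_inexact = n_diff_inexact + 1
  end do

  print *, 'mismatches for 3.0 vs 3.0d0 :', n_diff_exact
  print *, 'mismatches for 0.1 vs 0.1d0 :', n_diff_inexact
end program divide_by_double_literal
```

The same effect can appear when the double-precision quotient feeds a longer expression instead of being assigned straight to a REAL*4 variable, so it is worth keeping constants in the same precision on both the CPU and GPU versions of the code.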

Note that the GPU has two different divides. The first is fast but can lose precision. The second takes 4 times as long but is more accurate. In earlier versions of the compiler (<= 10.4), we used the fast version by default. In later versions, we switched to using the more accurate version by default, with the fast version used only when the “-Mcuda=fastmath” flag is added. If you are using an earlier compiler version, you should consider updating to the most current release.

Hope this helps,
Mat