Kieee and 10.5

Mat (et al),

I just noticed that if I use -Mcuda and -Kieee together with 10.5, it works…or at least doesn’t complain like it did with 10.4 and below:

> pgfortran -V10.4 -Mcuda=keepgpu,keepbin,keepptx,fastmath,nofma -Kieee -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -DFLXY -DDEG2 -c src/
error: expected a ")"
... cut a lot of these errors...
error: expected a ")"

25 errors detected in the compilation of "/tmp/pgnvdddXgjVTRQ3JW.nv0".
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code       0 (src/ 1251)
PGF90/x86-64 Linux 10.4-0: compilation aborted

> pgfortran -V10.5 -Mcuda=keepgpu,keepbin,keepptx,fastmath,nofma -Kieee -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -DFLXY -DDEG2 -c src/

Does this mean that the -Kieee flag affects the CUDA code, or does it mean that host-code math contained within a .CUF file can now be IEEE?

Or, should one even do this? Mix -Kieee with -Mcuda?


ETA: I just realized I have fastmath and Kieee and nofma. I think I’d need a flowchart to figure out what the different combinations of all those do…and if any of them are recommended or warned against!

I’ve done a bit of experimenting and it does look like -Kieee affects the GPU code as well. I’d love to know what it changes in the GPU code, since my guess is (and some testing shows) that -Kieee >> normal >> fastmath (where A >> B means A is more precise than B).

And, of course, adding nofma seems to help everything (though it can’t overcome fastmath, I’m guessing).
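
To make the nofma point concrete, here is a minimal sketch of how a fused multiply-add can give a different answer than a separate multiply and add. It is plain Python in double precision on the host, not PGI or GPU output, and the function names are made up for illustration. The fused version is emulated with exact rational arithmetic and a single final rounding, which is an assumption about how an FMA behaves rather than a call into any hardware instruction.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c


def fused_multiply_add(a, b, c):
    # Emulated FMA (illustrative only): compute a*b + c exactly with
    # rationals, then round once when converting back to a double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so the exact product 1 - 2**-54 is not representable.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = unfused_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)

print("unfused:", unfused)
print("fused:  ", fused)
```

Here the separately rounded product becomes exactly 1.0, so the unfused result loses the small term entirely, while the single-rounding version keeps it. Neither is "wrong", but they do not match bit for bit, which is one plausible reason toggling nofma changes how closely GPU and CPU results agree.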

Hi Matt,

I’d need to confirm, but my assumption is that “-Kieee” has, for the most part, the same effect that it has on regular Fortran. Transformations such as reordering operations and applying algebraic identities would be restricted to those that conform to IEEE 754. These types of transformations occur before the compiler gets to the GPU code generation phase.
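
As a rough illustration of why reordering matters, the sketch below shows that floating-point addition is not associative. It is plain Python in double precision on the host, not compiler output, and the variable names are hypothetical. It only demonstrates the kind of difference a reordering optimization can introduce, not what any particular flag actually does.

```python
# Three values where the grouping of the additions changes the answer.
a, b, c = 1.0e16, -1.0e16, 1.0

# Source order: a and b cancel exactly, then c is added.
left_to_right = (a + b) + c

# Reassociated: b + c is rounded back to -1.0e16 (the spacing between
# doubles near 1e16 is 2), so c is lost before a is added.
reassociated = a + (b + c)

print("left to right:", left_to_right)
print("reassociated: ", reassociated)
```

A compiler that is free to regroup the sum could legally produce either value, so a strict IEEE mode would have to keep the source order.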

For the transcendental functions, -Kieee will use more accurate versions on the CPU but not on the GPU. For fast versions on the GPU, use the “-Mcuda=fastmath” option.

  • Mat