Getting different results with pgfortran vs. gfortran for the exact same problem

In a computational fluid dynamics solver I have written, I am getting different results depending on the compiler I use. When I compile the code with gfortran using the flags -ffree-line-length-512 -Ofast -march=native, I get the “correct” results. However, when I run the same code compiled with pgfortran using the flags -fast -acc -Minfo=accel -ta=tesla:managed, the boundaries of my domain become incorrect.

To test whether this could be a compiler issue, I commented out all OpenACC directives and compiled the code with pgfortran and the flags above. I still got the same incorrect results. I then removed all flags and compiled with pgfortran alone, and the results were still incorrect.

I am not sure what the issue is, or even whether this really is a compiler issue. With exactly the same code, just different compilers, I am getting considerably different results at my domain boundaries, and I am not sure what to do.

I appreciate any advice, and if there is any other information I can provide, I would be happy to do so.

There may be many reasons. I would check:

  1. uninitialized variables / out-of-bounds access
  2. print the min/max of your arrays after each routine to see where the difference first shows up (see the sketch after this list)
  3. try lowering the optimization level (start at -O0, then increase to -O1, -O2, and -O3)
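For point 2, a minimal sketch of the kind of check I mean (the module, array, and routine names are just placeholders, not anything from your code):

```fortran
module debug_minmax
  implicit none
contains
  ! Print the min/max of an array with a label so you can see, routine by
  ! routine, where the gfortran and pgfortran builds first diverge.
  subroutine dump_minmax(label, a)
    character(len=*), intent(in) :: label
    real(8), intent(in) :: a(:,:)
    print '(a,2es24.16)', label//' min/max:', minval(a), maxval(a)
  end subroutine dump_minmax
end module debug_minmax
```

Then call it after each stage, e.g. `call dump_minmax('rho after apply_bc', rho)`, with whatever your arrays are actually called.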

I would try to get correct results on the CPU first (and there is no need to remove OpenACC: if you don’t compile with -acc, the directives are simply ignored).

To add to Mass’ answer.

If the difference is large, it’s most likely an uninitialized variable or an out-of-bounds access. nvfortran doesn’t implicitly initialize variables to zero as some other compilers do. You can try adding “-Msave”, which has the compiler apply the SAVE attribute to all variables and also has the side effect of initializing global (module) variables to zero. Note that I find the Valgrind (www.valgrind.org) utility very helpful for finding uninitialized variables and out-of-bounds access errors.
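As a purely hypothetical illustration of the kind of bug this catches (none of these names come from the original code), a local variable that is never assigned may happen to start at zero under one compiler and contain garbage under another:

```fortran
! Hypothetical example of an uninitialized-variable bug: "accum" is read
! before it is ever assigned, so its starting value depends on the compiler
! and on whatever happens to be in memory. Some compilers zero it; nvfortran
! does not. -Msave or Valgrind can help expose cases like this.
program uninit_demo
  implicit none
  real(8) :: accum
  integer :: i
  do i = 1, 10
     accum = accum + 1.0d0   ! bug: accum was never initialized to 0
  end do
  print *, 'accum =', accum  ! only 10.0 if accum happened to start at 0
end program uninit_demo
```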

If the results are close but not exact, look to instruction choice and optimization level.

We use FMA (fused multiply-add) on systems which support the instruction. While it is more precise than the equivalent multiply followed by an add, since there is less rounding error, it can cause numerical differences. Try adding “-Mnofma” to disable it.
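As a rough sketch of where this matters (not taken from the original code), a multiply feeding straight into an add is a candidate for contraction into a single FMA:

```fortran
! Sketch of an expression where FMA contraction can change the last bits:
! with FMA, a*b + c is rounded once; without it, a*b is rounded first and
! the add is rounded again. Both results are legitimate, but they can
! differ, which is what -Mnofma (or gfortran's -ffp-contract=off) controls.
program fma_demo
  implicit none
  real(8) :: a, b, c, d
  a = 1.0d0 + 2.0d0**(-30)
  b = 1.0d0 - 2.0d0**(-30)
  c = -1.0d0
  d = a*b + c               ! candidate for contraction into one FMA
  print '(es24.16)', d      ! with FMA: about -8.7e-19; without: exactly 0.0
end program fma_demo
```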

There’s also the “-Kieee” flag, which has the compiler adhere strictly to IEEE 754. That said, with the exception of the “-Mfprelaxed” flag (not enabled by default), even without -Kieee we’re usually only 1-2 ULPs off, so quite accurate.

Also keep in mind that applying parallelism will affect accuracy as well, since the order of operations, and therefore the rounding error, changes. Some small variance is to be expected between compilers, optimization levels, and architectures. It’s best to allow for some tolerance rather than seek bit-for-bit exactness.
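In practice that means comparing fields with a tolerance, something along these lines (a sketch only; the relative tolerance you pass in depends on the solver and the test case):

```fortran
! Sketch of a tolerance-based comparison between two solutions instead of
! expecting bit-for-bit identical output.
module compare_fields
  implicit none
contains
  ! True if the largest pointwise difference, relative to the largest
  ! magnitude present in either field, is within rel_tol.
  logical function fields_match(a, b, rel_tol)
    real(8), intent(in) :: a(:,:), b(:,:)
    real(8), intent(in) :: rel_tol
    real(8) :: denom
    denom = max(maxval(abs(a)), maxval(abs(b)), tiny(1.0d0))
    fields_match = maxval(abs(a - b))/denom <= rel_tol
  end function fields_match
end module compare_fields
```

For example, `fields_match(rho_cpu, rho_gpu, 1.0d-12)` with placeholder array names for the two runs.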

Thanks everyone for your comments and suggestions, I really appreciate them all.

I figured out what the issue was. To be clear, when I was comparing compiler vs. compiler I was not using any flags for either, to make sure it was not an optimization issue or something else.

Essentially, the error came from my boundary conditions, for which I use ghost cells. In the isothermal wall boundary condition, I was computing the temperature, setting a zero gradient condition for the pressure, and then computing the density from temperature and pressure.

These boundary conditions were being set as:

T_g = 2*T_b - T_i

p_g = p_i

rho_g = gamma*p_g/T_g

where the subscripts g, b, i denote the ghost, boundary, and interior cells, respectively. Executed sequentially, this produces the intended result, since p_g and T_g are set before rho_g and can therefore be used to compute it. Somehow this was not happening in that order with pgfortran, so p_g and T_g were not set before the density was computed. I have no idea how that makes sense, since without the -acc flag the code should just run sequentially. However, modifying the density boundary condition to:

rho_g = gamma*p_i/T_i

fixed the issue.
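For anyone who runs into something similar, here is a sketch of what the corrected ghost-cell update looks like along one boundary (all names are placeholders, not the actual variables from my solver):

```fortran
! Sketch of the corrected isothermal-wall ghost-cell update. Index 0 is the
! ghost cell and 1 the first interior cell; T_wall is the prescribed wall
! temperature and gam the ratio of specific heats. All names are
! placeholders for the actual variables in the solver.
subroutine isothermal_wall_bc(T, p, rho, ni, T_wall, gam)
  implicit none
  integer, intent(in) :: ni
  real(8), intent(inout) :: T(0:ni+1), p(0:ni+1), rho(0:ni+1)
  real(8), intent(in) :: T_wall, gam

  T(0)   = 2.0d0*T_wall - T(1)   ! T_g = 2*T_b - T_i
  p(0)   = p(1)                  ! zero-gradient pressure: p_g = p_i
  rho(0) = gam*p(1)/T(1)         ! the fix: rho_g from the interior p_i and
                                 ! T_i, not from the ghost values set above
end subroutine isothermal_wall_bc
```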