The machine on which I’m trying to get a good pgfortran build is a CentOS machine with two Intel Xeon 5500 series processors (and three Tesla C1060 cards, which we’ll use later once the program runs well).
Currently the program runs from a g77 or gfortran build on 32-bit openSUSE and CentOS desktops and servers with Intel Pentium family processors.
It also runs on this new machine from a g77 or gfortran build, producing results identical to those on the other machines, when built with these flags:
-ff2c -g -Wall -Wno-unused-variable -Wno-unused-labels -march=pentium4 -mfpmath=sse -malign-double -m32 -ffixed-line-length-132 -static -O3
(I was building in 32-bit mode because I don’t yet have a good build of the 64-bit netCDF libraries for these compilers, and the 32-bit libraries are built to the f2c conventions. I also initially wanted to use the same makefile for all the machines that run the program.)
===
After my last post I discovered that simply removing the -fastsse flag “fixed” the problem of the widely different results.
So
-Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fastsse -O3
gave the apparently wrong results, whereas
-Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -O3
gives results very close to those I see from 32-bit g77- and gfortran-built programs.
===
After reading your post, Mat, I ran the program with several different combinations of switches, as follows:
“-O2 -tp piii”: Build fails
“-O2 -tp piii -pc 64”: Build fails
“-O0 -Kieee”: “Good” results (These are listed as “pgi 3” in the results below)
“-fast -Kieee”: “Very good” results (These are listed as “pgi 5” below)
“-fastsse -Kieee”: Listed below as “pgi 4”; results identical to those obtained with -fast instead of -fastsse.
(In the first two cases the build fails with the error message:
“PGC-F-0155-built-in __m128, __m128d, __m128i data types require compilation for 64-bit architectures or 32-bit architectures that support SSE1 and SSE2 instructions.”
That is fine, because we won’t need a 32-bit version on this machine anyway.)
===
The results for 26 key variables are shown below, both as numerical values and as percentage differences from the gfortran results (gfortran was chosen arbitrarily as the reference). Some of these values are derived from multistep calculations performed hourly for about a hundred years, so some accumulation of error is expected.
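As a rough illustration of why some drift is expected, the Python sketch below is not part of the model. It uses hypothetical per-step values to sum about a hundred years of hourly contributions (365 × 24 × 100 = 876,000 steps) in two different orders. The two totals are mathematically equal, but in IEEE double precision each addition is rounded, so they can disagree in their trailing digits; `math.fsum` supplies a correctly rounded reference.

```python
import math

# Hypothetical stand-in for an hourly model: about 100 years of hourly steps.
STEPS = 365 * 24 * 100  # 876,000 steps

# Hypothetical per-step contributions; 1/(k+1) is chosen only because most
# of these values are not exactly representable in binary floating point.
increments = [1.0 / (k + 1) for k in range(STEPS)]

def running_total(values):
    """Accumulate left to right, rounding to double after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

forward = running_total(increments)             # largest terms first
backward = running_total(reversed(increments))  # smallest terms first
reference = math.fsum(increments)               # correctly rounded sum

print(f"forward   = {forward!r}")
print(f"backward  = {backward!r}")
print(f"reference = {reference!r}")
print(f"forward - backward = {forward - backward:.3e}")
```

In a simple sum like this the differences stay in the last few digits. In a long chain of coupled, nonlinear steps, small rounding differences of this kind can grow, although the sketch says nothing about how large the differences in the real model should be.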
The column headings refer to the following sets of compiler flags:
gfortran = -ff2c -g -Wall -Wno-unused-variable -Wno-unused-labels -march=pentium4 -mfpmath=sse -malign-double -m32 -ffixed-line-length-132 -static -O3
g77 = -ff2c -g -Wall -Wno-unused-variable -Wno-unused-labels -march=pentium4 -mfpmath=sse -malign-double -m32 -ffixed-line-length-132 -static -O3
pgi 1 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend
pgi 2 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -O3
pgi 3 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -O0 -Kieee
pgi 4 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fastsse -Kieee
pgi 5 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fast -Kieee
pgi 6 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fastsse
pgi 7 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fastsse -O3
Numerical results for the 26 key variables:
gfortran g77 pgi 1 pgi 2 pgi 3 pgi 4 & 5 pgi 6 & 7
267365.3901 267262.2062 267664.7928 267588.175 267747.227 267230.7771 138874.5163
960726.722 960702.466 960951.003 960860.212 960992.998 960737.081 846207.571
1833429.116 1833432.56 1833732.699 1833604.846 1833753.523 1833489.6 1615360.837
8555.657 8557.483 8559.163 8558.46 8556.856 8556.383 5944.643
1322040.796 1322406.536 1321578.572 1321947.493 1321123.411 1322285.269 998562.271
521652.51 521839.781 521527.755 521533.469 521474.133 521951.733 684769.25
3387607.073 3387741.123 3388039.897 3387771.44 3387895.981 3387741.382 4096289.429
769880.265 769967.972 769761.725 769759.905 769704.059 770041.655 820051.104
7705.6621 7707.02628 7704.30263 7705.4813 7702.58011 7706.5819 6333.14295
3267.52306 3269.05545 3263.87805 3263.77097 3263.80128 3267.31957 4925.84576
34584.72 34584.692 34584.839 34584.773 34584.85 34584.691 38973.571
6.178 6.178 6.178 6.178 6.178 6.178 5.695
2.325 2.325 2.327 2.326 2.327 2.326 2.149
1233.74 1233.74 1233.74 1233.74 1233.74 1233.74 1233.74
229.059 229.069 229.082 229.071 229.077 229.076 216.205
107.817 107.822 107.821 107.814 107.819 107.814 101.169
1107.247 1107.244 1107.253 1107.256 1107.254 1107.251 15.689
79.503 79.503 79.501 79.504 79.502 79.5 10.741
1027.744 1027.741 1027.751 1027.752 1027.752 1027.751 4.948
4.874 4.873 4.874 4.875 4.875 4.874 -21.402
1824.79 1824.79 1824.791 1824.791 1824.791 1824.79 836.677
228.517 228.517 228.517 228.517 228.517 228.517 151.078
0.186 0.186 0.186 0.186 0.186 0.186 0.175
0.897 0.897 0.897 0.897 0.897 0.897 0.013
0.471 0.471 0.471 0.471 0.471 0.471 0.468
0.072 0.072 0.072 0.072 0.072 0.072 0.685
Percentage discrepancy (from 32-bit gfortran results) for the 26 key variables:
gfortran g77 pgi 1 pgi 2 pgi 3 pgi 4 & 5 pgi 6 & 7
0.00 -0.00 0.11 0.08 0.14 -0.05 -48.06
0.00 0.00 0.02 0.01 0.03 0.00 -11.92
0.00 0.00 0.02 0.01 0.02 0.00 -11.89
0.00 0.02 0.04 0.03 0.01 0.01 -30.52
0.00 0.03 -0.03 -0.00 -0.07 0.02 -24.47
0.00 0.04 -0.02 -0.00 -0.03 0.06 31.27
0.00 0.00 0.01 0.00 0.01 0.00 20.92
0.00 0.01 -0.02 -0.00 -0.02 0.02 6.52
0.00 0.02 -0.02 0.00 -0.04 0.01 -17.81
0.00 0.05 -0.11 -0.10 -0.11 -0.01 50.75
0.00 0.00 0.00 0.00 0.00 0.00 12.69
0.00 0.00 0.00 0.00 0.00 0.00 -7.82
0.00 0.00 0.09 0.04 0.09 0.04 -7.57
0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.01 0.01 0.01 0.01 -5.61
0.00 0.00 0.00 0.00 0.00 0.00 -6.17
0.00 0.00 0.00 0.00 0.00 0.00 -98.58
0.00 0.00 0.00 0.00 0.00 0.00 -86.49
0.00 0.00 0.00 0.00 0.00 0.00 -99.52
0.00 -0.00 0.00 0.02 0.02 0.00 -539.11
0.00 0.00 0.00 0.00 0.00 0.00 -54.15
0.00 0.00 0.00 0.00 0.00 0.00 -33.89
0.00 0.00 0.00 0.00 0.00 0.00 -5.91
0.00 0.00 0.00 0.00 0.00 0.00 -98.55
0.00 0.00 0.00 0.00 0.00 0.00 -0.64
0.00 0.00 0.00 0.00 0.00 0.00 851.39
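The percentage table can be reproduced from the numerical one as 100 × (value − reference) / reference, with the gfortran column as the reference. The Python sketch below does this for two of the rows (the fourth and the last). The row labels are placeholders, since the tables do not name the variables. The 1% tolerance used to flag builds is a hypothetical cutoff, not something used in the original runs.

```python
# Percentage discrepancy of each build from the reference (gfortran) build:
#     100 * (value - reference) / reference
# Values are copied from the results table; row labels are placeholders and
# TOLERANCE_PCT is a hypothetical cutoff for a "suspicious" build.

BUILDS = ["gfortran", "g77", "pgi 1", "pgi 2", "pgi 3", "pgi 4 & 5", "pgi 6 & 7"]

ROWS = {
    "row 4": [8555.657, 8557.483, 8559.163, 8558.46, 8556.856, 8556.383, 5944.643],
    "row 26": [0.072, 0.072, 0.072, 0.072, 0.072, 0.072, 0.685],
}

TOLERANCE_PCT = 1.0  # hypothetical threshold

def pct_discrepancy(value: float, reference: float) -> float:
    """Signed percentage difference of value from reference."""
    return 100.0 * (value - reference) / reference

def compare(row: list[float]) -> tuple[list[float], list[str]]:
    """Return rounded discrepancies for every build and the builds beyond tolerance."""
    reference = row[0]  # the gfortran column is the (arbitrary) reference
    pcts = [round(pct_discrepancy(v, reference), 2) for v in row]
    flagged = [b for b, p in zip(BUILDS, pcts) if abs(p) > TOLERANCE_PCT]
    return pcts, flagged

for name, row in ROWS.items():
    pcts, flagged = compare(row)
    print(f"{name}: {pcts}  beyond {TOLERANCE_PCT}%: {flagged}")
```

For these two rows the sketch gives -30.52 and 851.39 for the pgi 6 & 7 column, matching the percentage table. Only that build falls outside the 1% cutoff.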
===
I’m somewhat puzzled by the huge discrepancy in the pgi 6 & 7 results. Does it represent a bug, or could a difference this large legitimately result from different floating-point implementations?
Thanks.