Fails to converge when binary is compiled with the latest release

Hi,

While compiling molecular simulation code on my SGI Itanium box, I encountered an error when running a simple test case with the executable. The OS on the SGI IA64 machine is Gentoo, with glibc version 3.3.4. We’ve tried a couple of combinations as well as various versions of the PGI compiler, and found that if the basis set includes the f channel, the integration seems to diverge somehow. Reverting to an older compiler version does not solve the problem either. The compiler options used on IA64 are: -O2 -Mextend

With optimization turned off, the program works as usual. It fails every time optimization is switched on (strictly speaking, the job terminates normally, but the resulting quantities are several times larger in magnitude than those obtained on the IA32 architecture):


With optimization switched off on IA64, or switched on but compiled on IA32:

Dipole Moment (Debye)
X 2.0329 Y -0.5028 Z 0.0000
Tot 2.0942
Quadrupole Moments (Debye-Ang)
XX -10.8505 XY -0.9292 YY -11.6864
XZ 0.0000 YZ 0.0000 ZZ -10.7749
Octapole Moments (Debye-Ang^2)
XXX -0.6157 XXY -1.5426 XYY -0.9108

With optimization switched on on IA64:

Dipole Moment (Debye)
X -23.5496 Y 2.8812 Z 0.0000
Tot 23.7252
Quadrupole Moments (Debye-Ang)
XX -66.0876 XY 12.3932 YY -42.7407
XZ 0.0000 YZ 0.0000 ZZ -33.2484
Octapole Moments (Debye-Ang^2)
XXX -116.5504 XXY 28.4596 XYY -31.6246
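One way to quantify the discrepancy is a small comparison script. The sketch below is a hypothetical helper, not part of the simulation package; it assumes the "X … Y … Z …" / "Tot …" layout shown above and uses the dipole values copied from the two outputs. It checks that each run is internally consistent (Tot is the norm of the components) and then reports how far apart the runs are.

```python
# Hypothetical helper: compare the dipole moments printed by two runs.
# Assumes the "X <val> Y <val> Z <val>" and "Tot <val>" layout of the output.
import math


def parse_dipole(xyz_line: str, tot_line: str):
    """Return ((x, y, z), tot) from the two dipole lines of the output."""
    f = xyz_line.split()
    xyz = (float(f[1]), float(f[3]), float(f[5]))
    tot = float(tot_line.split()[1])
    return xyz, tot


def rel_diff(ref: float, other: float) -> float:
    """Relative difference of `other` with respect to the reference value."""
    return abs(other - ref) / abs(ref)


# Values copied from the two outputs above.
ref_xyz, ref_tot = parse_dipole("X 2.0329 Y -0.5028 Z 0.0000", "Tot 2.0942")
opt_xyz, opt_tot = parse_dipole("X -23.5496 Y 2.8812 Z 0.0000", "Tot 23.7252")

# Each run is self-consistent: Tot equals |(X, Y, Z)| to print precision.
for xyz, tot in ((ref_xyz, ref_tot), (opt_xyz, opt_tot)):
    assert abs(math.sqrt(sum(c * c for c in xyz)) - tot) < 1e-3

# The runs themselves disagree far beyond any rounding-level tolerance.
print(f"relative difference in Tot: {rel_diff(ref_tot, opt_tot):.2f}")
```

A tolerance-based check like this separates rounding noise (relative differences around 1e-6 or smaller) from a genuinely wrong result like the one above.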



Comparison of the makefiles:

[jason@localhost AMD-fail]$ diff md.make.linux.opteron md.make.linux.opteron-fortran-null
181,182c181,182
< FOPTIMIZE = -O2
< #FOPTIMIZE =
---
> #FOPTIMIZE = -O2
> FOPTIMIZE =


Any ideas? Thanks in advance.

BR,
J

Hi J,


We don’t support IA64 (Itanium), so I’m a bit surprised you were able to get anything to run on the SGI machine. Actually, I didn’t think IA64 allowed you to run IA32 binaries.

I’m wondering if you really mean AMD64 (Opteron), since you diffed two files with opteron in their names.

  • Mat

Mat, sorry for my mistake. It’s AMD Opteron instead. The IA32 output was compiled and run on a different machine. :-)

BR,
J

No problem. Let’s see if we can figure out why you’re getting different answers. My best guess is that it’s the difference between how x87 and SSE calculate floating-point values. x87 uses 80 bits of precision, while SSE uses 64 bits. Although all double-precision values are stored as 64 bits, at -O2 pgf90 will accumulate values in the x87 registers. So the more calculations are done, the more the extra bits matter.
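The accumulation effect can be reproduced in a toy model. Python has no portable 80-bit type, so this sketch assumes a scaled-down analogy: binary64 plays the role of the wider x87 register and binary32 (emulated with the standard `struct` module) plays the role of the narrower stored format. Rounding the running sum after every addition versus rounding once at the end gives visibly different totals.

```python
# Toy model of wide-register vs. narrow-storage accumulation.
# Assumption: float64 stands in for the 80-bit x87 register and float32
# stands in for the 64-bit stored format (both widths shifted down one step).
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_narrow(term: float, n: int) -> float:
    """Round the running sum to float32 after every addition."""
    acc = 0.0
    for _ in range(n):
        acc = to_f32(acc + term)
    return acc


def sum_wide(term: float, n: int) -> float:
    """Keep the running sum in float64 and round to float32 only at the end."""
    acc = 0.0
    for _ in range(n):
        acc += term
    return to_f32(acc)


term = to_f32(0.1)  # the same stored operand feeds both accumulators
n = 100_000
narrow = sum_narrow(term, n)
wide = sum_wide(term, n)
print(f"narrow accumulator: {narrow!r}")
print(f"wide accumulator:   {wide!r}")
print(f"difference:         {abs(narrow - wide)!r}")
```

The exact numbers depend on the widths chosen, but the pattern is the point: the same source code gives different answers depending on where intermediate results are rounded.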

To test this theory, try compiling and running with the following flags on your IA32 machine (note: I’m assuming your IA32 is a Pentium 4 or equivalent): “-O2 -pc 64” and “-O2 -Mscalarsse”. Does the output now match the AMD64 machine?

On AMD64 the compiler uses SSE for floating point, so its results should match those of either flag set listed above. “-O2 -pc 64” uses the x87 registers but forces the compiler to store the values to memory with each iteration. “-Mscalarsse” tells the compiler to use SSE instead of x87.

Of course, the flaw in this theory is that your answers are very different. You’d expect the difference to be small, since going from 64- to 80-bit precision only affects very small values. However, I’ve seen programs where such values are used as divisors, which can cause much greater deviation in the end results.
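Here is a minimal illustration of how a tiny precision difference can be amplified when it ends up in a divisor. As an assumption for illustration, float64 and emulated float32 stand in for the two precisions. The difference of two nearly equal numbers is computed at each precision and then inverted.

```python
# Toy model: a small difference of nearly equal numbers used as a divisor.
# Assumption: float64 vs. emulated float32 stand in for 80-bit vs. 64-bit.
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


a, b = 1.0 + 1e-7, 1.0

# Wide: difference and reciprocal computed entirely in float64.
inv_wide = 1.0 / (a - b)

# Narrow: operands and every intermediate result rounded to float32.
diff_narrow = to_f32(to_f32(a) - to_f32(b))
inv_narrow = to_f32(1.0 / diff_narrow)

rel = abs(inv_narrow - inv_wide) / abs(inv_wide)
print(f"float64: {inv_wide:.6g}  float32: {inv_narrow:.6g}  relative diff: {rel:.1%}")
```

In float32, 1.0 + 1e-7 rounds to the next representable value above 1.0, so the difference becomes one unit in the last place and its reciprocal is off by well over ten percent, even though each individual rounding error was tiny.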

Let me know how this works!

  • Mat

Hi Mat,

Sorry for the late reply; it took a couple of days to finish compiling all the sources as well as the basis sets on another P4 machine. However, all test cases give the same results with both compiler options you suggested (-O2 -pc 64 and -O2 -Mscalarsse), rather than the unconverged results seen on the SGI Opteron. Any further comments?

BR,
J

The next thing to examine is whether your program needs to be ported to 64 bits. Another user gave a good explanation of some of the things you’ll need to do when porting from 32 to 64 bits. See:

Try running your P4 executable on the SGI. (If you haven’t already, I believe Gentoo requires you to install a 32-bit emulation package before you can run 32-bit executables.) You can also try compiling with “-tp k8-32” to create a 32-bit executable; however, this might not work, since I’m not sure whether Gentoo has 32-bit library support.
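Before running, it can help to confirm which kind of executable a build actually produced. The sketch below is a hypothetical helper that reads the EI_CLASS byte of the ELF header (byte 4 of e_ident: 1 means 32-bit, 2 means 64-bit), which is the same information the `file` command reports. The script name in the usage line is illustrative only.

```python
# Hypothetical helper: report whether an ELF executable is 32- or 64-bit
# by reading the EI_CLASS byte of the ELF identification header.
import sys

ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # offset of the class byte within e_ident
ELF_CLASSES = {1: "32-bit (ELFCLASS32)", 2: "64-bit (ELFCLASS64)"}


def elf_class(header: bytes) -> str:
    """Classify an ELF file from (at least) the first 5 bytes of its header."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return ELF_CLASSES.get(header[EI_CLASS], "unknown ELF class")


def elf_class_of_file(path: str) -> str:
    with open(path, "rb") as f:
        return elf_class(f.read(16))


if __name__ == "__main__":
    # Usage (hypothetical script name): python elfclass.py ./a.out
    for path in sys.argv[1:]:
        print(f"{path}: {elf_class_of_file(path)}")
```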

If the 32-bit executable runs correctly, there is a good chance that your errors are due to a porting problem.
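As an illustration of what a porting problem can look like, here is one common 32-to-64-bit pitfall. The thread does not identify the actual bug, so this is only an assumed example: legacy Fortran codes sometimes keep memory addresses (for instance, the result of the LOC() extension) in a default 4-byte INTEGER. In a 64-bit process an address can exceed 32 bits and is silently truncated. The sketch mimics that store with `ctypes.c_int32`, which performs no overflow checking; the address value is hypothetical.

```python
# Illustration of an assumed porting pitfall: storing a 64-bit address in a
# 4-byte signed integer silently drops the high bits.
import ctypes


def store_in_int32(value: int) -> int:
    """Mimic assigning a value to a 4-byte signed integer.
    ctypes does no overflow checking, so high bits are discarded."""
    return ctypes.c_int32(value).value


# Hypothetical address above the 4 GiB boundary, as can occur in a 64-bit process.
address = 0x2AAAAB000010
truncated = store_in_int32(address)

print(f"pointer size on this Python build: {ctypes.sizeof(ctypes.c_void_p)} bytes")
print(f"original address:  {address:#x}")
print(f"after int32 store: {truncated} (low 32 bits: {truncated & 0xFFFFFFFF:#x})")
```

Code like this works on a 32-bit build, where every address fits, and breaks only when the same source is compiled as a 64-bit executable, which is consistent with testing a 32-bit build first.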

  • Mat