Fails to converge when binary is compiled with the latest release

jasonshih · October 18, 2004, 5:44pm

Hi,

I'm compiling a molecular simulation code on my SGI Itanium box, and I get an error when running a simple test case with the executable. The OS on the SGI IA64 machine is Gentoo, with glibc version 3.3.4. We've tried a couple of combinations, as well as various versions of the PGI compiler, and found that if the basis set includes the f channel, the integration somehow seems to diverge. Reverting to an older version doesn't solve the problem either. The compiler options used on IA64 are: -O2 -Mextend

With optimization turned off, the program works as usual. It fails every time optimization is switched on (strictly speaking, the job terminates normally, but the computed quantities are several times larger than those obtained on the IA32 architecture):

With optimization switched off on IA64, or switched on but compiled on IA32:

Dipole Moment (Debye)
X 2.0329 Y -0.5028 Z 0.0000
Tot 2.0942
Quadrupole Moments (Debye-Ang)
XX -10.8505 XY -0.9292 YY -11.6864
XZ 0.0000 YZ 0.0000 ZZ -10.7749
Octapole Moments (Debye-Ang^2)
XXX -0.6157 XXY -1.5426 XYY -0.9108
…

With optimization switched on, on IA64:

Dipole Moment (Debye)
X -23.5496 Y 2.8812 Z 0.0000
Tot 23.7252
Quadrupole Moments (Debye-Ang)
XX -66.0876 XY 12.3932 YY -42.7407
XZ 0.0000 YZ 0.0000 ZZ -33.2484
Octapole Moments (Debye-Ang^2)
XXX -116.5504 XXY 28.4596 XYY -31.6246
…

Comparison of the makefiles:

[jason@localhost AMD-fail]$ diff md.make.linux.opteron md.make.linux.opteron-fortran-null
181,182c181,182
< FOPTIMIZE = -O2
< #FOPTIMIZE =
---
> #FOPTIMIZE = -O2
> FOPTIMIZE =

Any ideas? Thanks in advance.

BR,
J

MatColgrove · October 18, 2004, 6:29pm

Hi J,

We don’t support IA64 (Itanium), so I’m a bit surprised you were able to get anything to run on the SGI machine. Actually, I didn’t think IA64 allowed you to run IA32 binaries.

I’m wondering if you really mean AMD64 (Opteron), since you diffed two files with opteron in their names.

Mat

jasonshih · October 19, 2004, 12:20am

Mat, sorry for my mistake. It’s actually an AMD Opteron. The IA32 output was compiled and run on a different machine. :-)

BR,
J

MatColgrove · October 19, 2004, 3:10am

No problem. Let’s see if we can figure out why you’re getting different answers. My best guess is that it’s the difference between how x87 and SSE calculate floating-point values. x87 uses 80 bits of precision, while SSE uses 64 bits. Although all double-precision values are stored as 64 bits, at -O2 pgf90 will accumulate values in the x87 registers. So the more calculations are done, the more the extra bits matter.
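Python floats are IEEE 64-bit doubles, so Python cannot reproduce 80-bit x87 registers directly. As a loose analogy only (an assumption, not a model of what pgf90 actually generates), the sketch below compares a loop that rounds the running sum to double after every addition with `math.fsum`, which tracks exact partial sums and rounds only once at the end.

```python
import math

values = [0.1] * 10

# Running sum rounded to a 64-bit double after every addition.
naive = 0.0
for v in values:
    naive += v

# math.fsum keeps exact partial sums and rounds once at the end,
# standing in here for "accumulating with extra bits".
extended = math.fsum(values)

print("rounded every step:", repr(naive))
print("rounded once:      ", repr(extended))
```

The two results differ in the last bit after only ten additions; over a long integration, differences of this kind can compound.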

To test this theory, try compiling and running on your IA32 machine with each of the following sets of flags: “-O2 -pc 64” and “-O2 -Mscalarsse” (note: I’m assuming your IA32 machine is a Pentium 4 or equivalent). Does the output now match the AMD64 machine?

On AMD64 the compiler uses SSE, so the AMD64 results should match those from the two flag sets listed. “-O2 -pc 64” uses the x87 registers but forces the compiler to store the values to memory with each iteration. “-Mscalarsse” tells the compiler to use SSE instead of x87.
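Checking whether the outputs from different builds match can be done mechanically rather than by eye. The Python sketch below is only an illustration: it assumes the program prints moments as whitespace-separated "label value" pairs, like the excerpts in this thread, and the tolerances are arbitrary choices, not anything defined by PGI or by the simulation code. The helper names `parse_moments` and `mismatches` are hypothetical.

```python
import math

# Output excerpts copied from the thread (the "…" continuation is omitted).
REFERENCE = """\
Dipole Moment (Debye)
X 2.0329 Y -0.5028 Z 0.0000
Tot 2.0942
Quadrupole Moments (Debye-Ang)
XX -10.8505 XY -0.9292 YY -11.6864
XZ 0.0000 YZ 0.0000 ZZ -10.7749
Octapole Moments (Debye-Ang^2)
XXX -0.6157 XXY -1.5426 XYY -0.9108
"""

OPTIMIZED = """\
Dipole Moment (Debye)
X -23.5496 Y 2.8812 Z 0.0000
Tot 23.7252
Quadrupole Moments (Debye-Ang)
XX -66.0876 XY 12.3932 YY -42.7407
XZ 0.0000 YZ 0.0000 ZZ -33.2484
Octapole Moments (Debye-Ang^2)
XXX -116.5504 XXY 28.4596 XYY -31.6246
"""


def parse_moments(text):
    """Collect 'LABEL value LABEL value ...' lines into a {label: value} dict."""
    values = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or len(tokens) % 2:
            continue  # section headers have an odd number of tokens
        try:
            pairs = {tokens[i]: float(tokens[i + 1]) for i in range(0, len(tokens), 2)}
        except ValueError:
            continue  # not a label/value line
        values.update(pairs)
    return values


def mismatches(ref, test, rel_tol=1e-3, abs_tol=1e-4):
    """Return the labels whose values differ beyond the given tolerances."""
    return sorted(
        label
        for label, value in ref.items()
        if label in test
        and not math.isclose(value, test[label], rel_tol=rel_tol, abs_tol=abs_tol)
    )


ref = parse_moments(REFERENCE)
opt = parse_moments(OPTIMIZED)
print("mismatched labels:", mismatches(ref, opt))
print(f"total dipole ratio: {opt['Tot'] / ref['Tot']:.2f}")
```

Applied to the two excerpts from the first post, every nonzero component is flagged, and the total dipole moment differs by roughly a factor of 11.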

Of course, the flaw in this theory is that your answers are very different. You’d expect the difference to be small, since 64- vs 80-bit precision only affects very small values. However, I’ve seen programs where such values are used as divisors, which can cause much larger deviations in the end results.
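To illustrate the point about divisors, the following Python sketch computes a nearly cancelling denominator twice from the same double inputs: once with rounding after each operation, and once exactly with `fractions.Fraction`. This is an analogy using exact rational arithmetic, not x87 extended precision. The two denominators differ by a factor of two, and so do their reciprocals.

```python
from fractions import Fraction

a, b, c = 0.1, 0.2, 0.3

# Each operation rounded to a 64-bit double.
rounded = (a + b) - c
# Same double inputs, but the intermediate sum is not rounded.
exact = Fraction(a) + Fraction(b) - Fraction(c)

print("denominator (rounded):", rounded)
print("denominator (exact):  ", float(exact))
print("1/denominator (rounded):", 1 / rounded)
print("1/denominator (exact):  ", 1 / float(exact))
```

A difference confined to the last bits of an intermediate result becomes a factor-of-two difference once that result is used as a divisor.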

Let me know how this works!

Mat

jasonshih · October 24, 2004, 4:19pm

Hi Mat,

Sorry for the late reply; it took a couple of days to finish compiling all the sources, as well as the basis, on another P4 machine. However, all test cases give the same result with both sets of compiler arguments you suggested (-O2 -pc 64 and -O2 -Mscalarsse), and none of them show the convergence problem seen on the SGI Opteron. Any further comments?

BR,
J

MatColgrove · October 25, 2004, 2:54pm

The next thing to examine is whether your program needs to be ported to 64 bits. Another user gave a good explanation of some of the things you’ll need to do when porting from 32 to 64 bits. See:

Try running your P4 executable on the SGI. (If you haven’t already, I believe Gentoo requires you to install a 32-bit emulation package before you can run 32-bit executables.) You can also try compiling with “-tp k8-32” to create a 32-bit executable. However, this might not work, since I’m not sure whether Gentoo has 32-bit library support.
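Before comparing runs, it can help to confirm whether a given executable really is 32-bit or 64-bit (the standard `file` utility reports this too). A minimal Python sketch follows; it relies only on the ELF header layout, where the magic bytes `\x7fELF` are followed by the EI_CLASS byte (1 for 32-bit, 2 for 64-bit). The function names are hypothetical.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS_NAMES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Classify an ELF file from its first bytes using the EI_CLASS field."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return EI_CLASS_NAMES.get(header[4], "unknown")


def elf_class_of(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

For example, `elf_class_of("path/to/executable")` (a placeholder path) should report "32-bit" for a binary built with “-tp k8-32” and "64-bit" for a native AMD64 build.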

If the 32-bit executable runs correctly, then there is a good chance that your errors are due to a porting problem.
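The other user's explanation is not reproduced here, so the following is only an assumption about what might apply: one commonly cited 32-to-64-bit pitfall is storing an address or other 64-bit value in a 4-byte integer, which silently drops the high bits. The hypothetical helper `store_in_int32` below emulates that truncation in Python.

```python
def store_in_int32(value: int) -> int:
    """Emulate assigning an integer to a 32-bit two's-complement variable."""
    low = value & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the top bit as a sign bit, as a 4-byte signed integer would.
    return low - (1 << 32) if low >= (1 << 31) else low


# Hypothetical address just above the 4 GiB boundary.
address = 0x0000_0001_0000_0010
print(hex(address), "->", store_in_int32(address))
```

On a 32-bit build every address fits, so a bug like this stays hidden; on a 64-bit build it can corrupt data without any error message.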

Mat
