Different answers with "-fast" and "-fastsse"

member106 · July 16, 2004, 8:02pm

I’ve compiled and run my code on a P4 running SuSE9.0 using both “-fast” and “-fastsse” but noticed that I get slightly different answers.

Example:

      PROGRAM p
      REAL*8 res
      INTEGER i
      res = 1.0
      DO i = 1, 200
        res = res * 0.314
      END DO
      WRITE(*,*) "Results: ", res, "\n"
      END

pgf90 -fast a.f → Results: 2.418261068169633E-101
pgf90 -fastsse a.f → Results: 2.418261068169615E-101

Why the different values?

MatColgrove · July 22, 2004, 11:42pm

There are actually several differences between “-fast” and “-fastsse” that can result in different answers when running the same code. First off, both -fast and -fastsse are really a set of optimizations which generally give the best performance. -fast is “-O2 -Munroll=c:1 -Mnoframe -Mlre” and -fastsse is -fast plus “-Mscalarsse -Mvect=sse -Mcache_align -Mflushz”.

The biggest difference is the “-Mscalarsse -Mvect=sse” flags, which tell the compiler to generate SSE code, while -fast on its own generates x87 code. SSE is generally faster since it can perform multiple floating point calculations per clock cycle. By contrast, it’s harder to generate optimized code for x87, and x87 performs only one 80-bit calculation per cycle.

One reason why you’re seeing precision differences is that, for double precision floating point values, SSE computes in 64 bits while x87 uses an 80-bit register. Although values are truncated to 64 bits when stored to memory, a good compiler will try to keep values in the x87 registers. The more calculations are done, the more impact the extra bits have. Also, SSE code will use different algorithms, which can give slightly different results.

In the FAQ section there is a more detailed guide on precision issues on x86 systems that you might want to read (see /support/execute.htm#precision).

Mat
