Build/test ATLAS and LAPACK with PGI

MarkD99 · March 21, 2011, 3:35pm

Hi,

I’m trying to build ATLAS 3.8.3 + netlib LAPACK 3.3.0 using the PGI 10.0 compilers.

The recipe on the PGI website does a good job at describing how to perform a build and run the ATLAS tests, but it doesn’t cover running the LAPACK tests.

Does anyone have a good set of PGI compiler flags to improve numerical accuracy, please? I’ve tried:

-O2
-O1
-O2 -Kieee

Each time, my ATLAS/LAPACK build fails a rather large number of LAPACK tests.

Cheers,

Mark

MatColgrove · March 22, 2011, 12:20am

Hi Mark,

The “-Kieee” flag instructs the compiler to adhere to IEEE 754 so I’m not sure why you are seeing failures. I’ll try and recreate the errors to better understand the issue.

Thanks,
Mat

MarkD99 · March 23, 2011, 4:26pm

Hi Mat,

Hang on: I’ve been talking to the LAPACK developers. Seems that I’m not interpreting the test output files properly.

Will recheck and get back if there’s still a problem.

Thanks,

Mark

MarkD99 · April 8, 2011, 9:52am

Hi,

OK, I think someone does need to look at PGI Fortran and the LAPACK 3.3.0 test suite after all. At present, I can’t say for certain that a full ATLAS build validates successfully.

After talking to the LAPACK support team, it seems that the LAPACK test suite is more of an implementation torture test than a validation that you have built LAPACK successfully. Some number of failures is therefore expected. Basically, I am told that, if you grep the output files of the test suite for “out of”, a successful build should expect between 200 and 500 failed tests. If there is a serious problem, there will be >1000 failed tests.
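For anyone who wants to automate that rule of thumb, here is a minimal Python sketch. It assumes each failure summary line in the test output contains the text “<failed> out of <total>”, which is the phrase being grepped for; the function names and the labels returned by `classify` are my own, not anything from LAPACK, and the thresholds are just the ones quoted above.

```python
import re
from pathlib import Path

# Assumption: a failure summary line contains "<failed> out of <total>".
OUT_OF = re.compile(r"(\d+)\s+out\s+of\s+(\d+)")


def count_failures(text):
    """Sum the <failed> numbers from every "N out of M" match in text."""
    return sum(int(m.group(1)) for m in OUT_OF.finditer(text))


def count_failures_in_dir(directory, pattern="*.out"):
    """Sum failures over all matching files in directory.

    The "*.out" pattern is an assumption about how the output files are named.
    """
    total = 0
    for path in sorted(Path(directory).glob(pattern)):
        total += count_failures(path.read_text(errors="replace"))
    return total


def classify(total_failed):
    """Apply the rule of thumb: 200-500 is expected, more than 1000 is serious."""
    if total_failed > 1000:
        return "serious"
    if 200 <= total_failed <= 500:
        return "expected"
    return "unclear"
```

Calling `classify(count_failures_in_dir("TESTING"))` would then give a one-word verdict; the directory name there is only a placeholder for wherever the output files end up.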

When I run the tests against the 32-bit PGI Linux compiler (10.0 and 11.3), with everything compiled with “-O0 -Kieee”, I get 267 numerical failures. This is good and is around what I get with both the GNU and Intel compilers.

When I do the same, but instead against the 64-bit PGI Linux compiler (10.0 and 11.3), I get 850 numerical errors. This is not so good.

I reported this to the LAPACK team, but have so far not had a response. From previous conversations with them, they do not seem to be regularly testing against PGI. The 850 errors may therefore be a problem.

To recap, I am:

On a 64-bit Linux RHEL 5.5 system with Intel Nehalem processors.
Compiling with either 64-bit PGI 10.0 or 11.3 Fortran compilers (same result with either)
Building the netlib reference BLAS with “-O0 -Kieee”
Checking that the reference BLAS passes the BLAS test suite
Building the netlib LAPACK 3.3.0 with “-O0 -Kieee”
Checking that the result passes the LAPACK 3.3.0 test suite.

I am doing this in order to validate a LAPACK library, so that I can use it to do a full BLAS/LAPACK build of ATLAS.

Have other people seen the same thing?

Thanks,

Mark

MatColgrove · April 8, 2011, 5:16pm

Hi Mark,

I’ll take a look but I’m not yet convinced that this is a compiler problem. Given that no optimization is performed by the compiler and that ATLAS self-tunes, I’m guessing that it’s a problem with ATLAS, or at least with how ATLAS is tuning itself in 64-bits with PGI.

Have you tried the same process with other libraries such as GotoBLAS, ACML, MKL, etc.? What happens when you use optimization (such as “-fast -Kieee”)?

Note that we do daily runs on the LAPACK tests using about 100 flag sets without issues. (The one exception is the sed test, which fails with auto-parallelization.) Granted, this is using the Netlib source, not ATLAS. For ATLAS we just use the tests that come with the library.

Mat

MarkD99 · April 13, 2011, 1:10pm

Hi Mat,

I feel much better hearing that you do daily builds yourself against the Netlib LAPACK.

Although I started this thread specifically about ATLAS, in my last post I reverted to using the Netlib BLAS/LAPACK in an attempt to simplify things a little when running the Netlib tests. So ATLAS at this stage is not involved. Sorry I wasn’t clear.

The issue I am raising now is that, according to the rules of thumb given to me by the LAPACK people, 64-bit PGI with the Netlib BLAS and LAPACK is failing ~600 more Netlib LAPACK tests than 32-bit PGI, GCC, or the Intel compiler.

If you can tell me that this is known about, is expected, and in reality everything is ok, then I’ll go away happy :) Is it?

Thanks,

Mark

MatColgrove · April 13, 2011, 10:12pm

Hi Mark,

I just looked at our 11.4 QA LAPACK tests but did not see any major differences between 32 and 64-bit results. There are errors in the gd tests (261 out of 20000) but I believe these are ok. The only difference between 32 and 64-bit is the ssep test, which has 3 out of 14256 tests failing in 64-bits at low optimization (-O1, -O2) and has been reported as TPR#17385.

So the 600 test difference could be a problem but unfortunately I’m not seeing it here. Can you let me know the specific tests where the failures occur?
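One way to narrow that down would be to compare per-test failure counts from the two builds side by side. The following is only a sketch: it assumes the counts have already been collected into two dictionaries keyed by test name, and the function name and the sample numbers in the usage note are made up for illustration.

```python
def compare_runs(run_a, run_b):
    """List the tests whose failure counts differ between two runs.

    run_a and run_b map a test name to its number of failed tests.
    A test missing from one run is treated as having 0 failures there.
    Returns (name, count_a, count_b, count_b - count_a) tuples,
    largest absolute difference first, ties broken by name.
    """
    names = set(run_a) | set(run_b)
    diffs = []
    for name in names:
        a = run_a.get(name, 0)
        b = run_b.get(name, 0)
        if a != b:
            diffs.append((name, a, b, b - a))
    return sorted(diffs, key=lambda row: (-abs(row[3]), row[0]))
```

For example, `compare_runs({"test1": 5, "test2": 10}, {"test1": 5, "test2": 40})` returns `[("test2", 10, 40, 30)]`, so only the tests that actually differ between the 32-bit and 64-bit builds show up.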

Thanks,
Mat
