Build/test ATLAS and LAPACK with PGI

Hi,

I’m trying to build ATLAS 3.8.3 + netlib LAPACK 3.3.0 using the PGI 10.0 compilers.

The recipe on the PGI website does a good job at describing how to perform a build and run the ATLAS tests, but it doesn’t cover running the LAPACK tests.

Does anyone have a good set of PGI compiler flags to improve numerical accuracy, please? I’ve tried:

-O2
-O1
-O2 -Kieee

Each time, my ATLAS/LAPACK build fails a rather large number of LAPACK tests.

Cheers,

Mark

Hi Mark,

The “-Kieee” flag instructs the compiler to adhere to IEEE 754, so I’m not sure why you are seeing failures. I’ll try to recreate the errors to better understand the issue.
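Here is a small illustration of why compiler flags can affect numerical test results at all. It is plain Python and nothing in it is PGI-specific. Floating-point addition is not associative, so any optimization that reorders arithmetic can change the last bits of a result. Flags such as “-Kieee” are meant to limit that kind of liberty; see the PGI documentation for exactly what the flag controls.

```python
import math

# Floating-point addition is not associative: summing the same three
# values in a different order rounds differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# The two results differ by exactly one unit in the last place (ulp) of 0.6.
print(abs(left - right) == math.ulp(0.6))  # True
```

A one-ulp difference like this is harmless on its own. It is why test suites judge results against a threshold rather than requiring bit-for-bit agreement.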

Thanks,
Mat

Hi Mat,

Hang on: I’ve been talking to the LAPACK developers, and it seems that I’m not interpreting the test output files properly.

Will recheck and get back if there’s still a problem.

Thanks,

Mark

Hi,

OK, I think someone does need to look at PGI Fortran and the LAPACK 3.3.0 test suite after all. At present, I can’t say for certain that a full ATLAS build validates successfully.

After talking to the LAPACK support team, it seems that the LAPACK test suite is more of an implementation torture test than a check that you have built LAPACK correctly, so some number of failures is expected. I am told that if you grep the test suite’s output files for “out of”, a successful build typically shows between 200 and 500 failed tests. A serious problem shows up as more than 1000 failed tests.
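A minimal Python sketch of that “grep for out of” check follows. It totals the failures across the test output files and applies the rule of thumb above. It assumes the output files end in .out and that failure lines have the form “<failed> out of <total> tests failed ...”. The 500/1000 cut-offs are only the LAPACK team’s rough guidance as relayed here, not hard limits.

```python
import re
import sys
from pathlib import Path

# Assumption: LAPACK test output files are named *.out and report failures on
# lines such as " DGE:   12 out of  1000 tests failed to pass the threshold".
OUT_OF = re.compile(r"(\d+)\s+out of\s+(\d+)")


def count_failures(text: str) -> tuple[int, int]:
    """Sum the <failed> and <total> numbers from every "out of" line."""
    failed = total = 0
    for line in text.splitlines():
        match = OUT_OF.search(line)
        if match:
            failed += int(match.group(1))
            total += int(match.group(2))
    return failed, total


def classify(failed: int) -> str:
    """Rule of thumb relayed from the LAPACK team: roughly 200-500 failures
    for a good build, more than 1000 for a serious problem."""
    if failed > 1000:
        return "serious"
    if failed > 500:
        return "suspicious"  # between the two rules of thumb
    return "ok"


def main(directory: str) -> None:
    grand_total = 0
    for path in sorted(Path(directory).glob("*.out")):
        failed, total = count_failures(path.read_text(errors="replace"))
        grand_total += failed
        if failed:
            print(f"{path.name}: {failed} out of {total} failed")
    print(f"total failed: {grand_total} ({classify(grand_total)})")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Point it at the directory holding the .out files (TESTING in a LAPACK 3.x tree, as far as I know). With the numbers in this thread, the 32-bit total of 267 would come out as “ok” and the 64-bit total of 850 as “suspicious”.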

When I run the tests against the 32-bit PGI Linux compiler (10.0 and 11.3), with everything compiled with “-O0 -Kieee”, I get 267 numerical failures. This is good and is around what I get with both the GNU and Intel compilers.

When I do the same, but instead against the 64-bit PGI Linux compiler (10.0 and 11.3), I get 850 numerical errors. This is not so good.

I reported this to the LAPACK team but have not had a response so far. From previous conversations, they do not seem to test regularly against PGI, so the 850 errors may therefore indicate a real problem.

To recap, I am:

  • On a 64-bit Linux RHEL 5.5 system with Intel Nehalem processors
  • Compiling with either the 64-bit PGI 10.0 or 11.3 Fortran compiler (same result with either)
  • Building the Netlib reference BLAS with “-O0 -Kieee”
  • Checking that the reference BLAS passes the BLAS test suite
  • Building Netlib LAPACK 3.3.0 with “-O0 -Kieee”
  • Checking that the result passes the LAPACK 3.3.0 test suite
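The recipe above could be scripted roughly as follows. This is a hypothetical Python driver, not an official procedure. It assumes that pgf90 is the compiler driver, that LAPACK 3.3.0’s make.inc uses the FORTRAN, LOADER, OPTS, DRVOPTS and NOOPT variables, and that the top-level Makefile provides blaslib, blas_testing, lapacklib, tmglib and lapack_testing targets. Check all of these against your own tree before relying on it.

```python
import shlex
import subprocess

# Hypothetical settings; the make variable and target names are assumptions
# about the LAPACK 3.3.0 make.inc / Makefile, not verified here.
FC = "pgf90"
FLAGS = "-O0 -Kieee"
TARGETS = ["blaslib", "blas_testing", "lapacklib", "tmglib", "lapack_testing"]


def make_command(target: str, fc: str = FC, flags: str = FLAGS) -> list[str]:
    """Build one make invocation with the compiler and flags overridden."""
    # Variables given on the make command line override make.inc, so every
    # compile, including files normally built with NOOPT, gets the same flags.
    return [
        "make",
        f"FORTRAN={fc}",
        f"LOADER={fc}",
        f"OPTS={flags}",
        f"DRVOPTS={flags}",
        f"NOOPT={flags}",
        target,
    ]


def run(dry_run: bool = True) -> None:
    for target in TARGETS:
        cmd = make_command(target)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops on build errors only; the test results still
            # have to be read from the output files afterwards.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run=True)  # print the commands without running them
```

Passing the flags on the make command line instead of editing make.inc keeps the 32-bit and 64-bit builds identical apart from the compiler. With GNU make, command-line variable overrides are also passed down to the sub-makes in each directory.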

I am doing this in order to validate a LAPACK library, so that I can use it to do a full BLAS/LAPACK build of ATLAS.

Have other people seen the same thing?

Thanks,

Mark

Hi Mark,

I’ll take a look, but I’m not yet convinced that this is a compiler problem. Given that the compiler performs no optimization and that ATLAS self-tunes, I’m guessing that it’s a problem with ATLAS, or at least with how ATLAS tunes itself in 64-bit mode with PGI.

Have you tried the same process with other libraries such as GotoBLAS, ACML, or MKL? What happens when you use optimization (such as “-fast -Kieee”)?

Note that we run the LAPACK tests daily with about 100 flag sets without issues. (The one exception is the sed test, which fails with auto-parallelization.) Granted, this uses the Netlib source, not ATLAS. For ATLAS we just use the tests that come with the library.

Mat

Hi Mat,

I feel much better hearing that you do daily builds yourself against the Netlib LAPACK.

Although I started this thread specifically about ATLAS, in my last post I switched to the Netlib BLAS/LAPACK to simplify things a little when running the Netlib tests. So ATLAS is not involved at this stage; sorry I wasn’t clear.

The issue I am raising now is that, according to the rules of thumb given to me by the LAPACK people, 64-bit PGI with the Netlib BLAS and LAPACK fails ~600 more Netlib LAPACK tests than 32-bit PGI, GCC, or the Intel compiler.

If you can tell me that this is known about, is expected, and in reality everything is ok, then I’ll go away happy :) Is it?

Thanks,

Mark

Hi Mark,

I just looked at our 11.4 QA LAPACK tests but did not see any major differences between the 32-bit and 64-bit results. There are errors in the gd tests (261 out of 20000), but I believe these are OK. The only difference between 32-bit and 64-bit is the ssep test, which has 3 out of 14256 tests failing in 64-bit at low optimization (-O1, -O2); this has been reported as TPR#17385.

So the 600-test difference could be a problem, but unfortunately I’m not seeing it here. Can you let me know the specific tests where the failures occur?
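One way to narrow down which tests account for the difference is to run the suite in separate 32-bit and 64-bit trees and compare failure counts file by file. The following is a rough Python sketch. It assumes the output files end in .out and report failures as “<failed> out of <total> tests failed ...”. The script name and directory arguments in the usage comment are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumption: each LAPACK test output file (*.out) reports failures on lines
# of the form "<failed> out of <total> tests failed ...".
OUT_OF = re.compile(r"(\d+)[ \t]+out of[ \t]+(\d+)")


def failures_per_file(directory: Path) -> dict[str, int]:
    """Map each *.out file name to its total number of failed tests."""
    counts: dict[str, int] = {}
    for path in sorted(directory.glob("*.out")):
        text = path.read_text(errors="replace")
        counts[path.name] = sum(int(m.group(1)) for m in OUT_OF.finditer(text))
    return counts


def compare(run_a: dict[str, int], run_b: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (file, failures in A, failures in B) for every file whose
    counts differ, largest difference first."""
    rows = []
    for name in set(run_a) | set(run_b):
        a, b = run_a.get(name, 0), run_b.get(name, 0)
        if a != b:
            rows.append((name, a, b))
    return sorted(rows, key=lambda row: (-abs(row[2] - row[1]), row[0]))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): compare_runs.py <32-bit output dir> <64-bit output dir>
    results_32 = failures_per_file(Path(sys.argv[1]))
    results_64 = failures_per_file(Path(sys.argv[2]))
    for name, n32, n64 in compare(results_32, results_64):
        print(f"{name:16s} 32-bit {n32:6d}  64-bit {n64:6d}  diff {n64 - n32:+d}")
```

Sorting by the size of the gap puts the handful of test files responsible for most of a ~600-failure difference at the top of the list, which is the kind of detail that helps with reproducing the problem elsewhere.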

Thanks,
Mat