Poor performance with libblas.a

Hi!

We have just received our CDK 6.1 and tried to run HPLinpack.
On an 8-processor system (4 dual-core Xeons @ 2.8 GHz) we get:

  • 26 GFlops with GotoBLAS
  • 11 GFlops with Debian’s Sarge BLAS (64-bit release)
  • and less than 1 GFlop (?!) with PGI’s libblas.a

Am I missing something? Aren’t optimized BLAS libraries shipped with the CDK?


Ludovic Drolez.

Hi Ludovic,

Try using ACML (libacml) instead. libblas is simply a precompiled version of the reference BLAS library from netlib.org and is included for legacy reasons. ACML is AMD’s library of optimized BLAS and LAPACK routines and should be much faster.

- Mat

Could you please post the results with ACML? Thank you!

OK, with ACML and PGI’s MPICH, I got only 13 GFlops.
And with ACML and lam4, I got 18 GFlops.
So GotoBLAS wins, which is not surprising since libacml seems to be optimized for AMD processors, not Intel ones…