Hi!
We have just received our CDK 6.1 and tried to run the HPLinpack benchmark.
On an 8-core system (4 dual-core Xeons @ 2.8 GHz) we get:
- 26 GFlops with GotoBLAS
- 11 GFlops with Debian Sarge’s BLAS (64-bit release)
- and less than 1 GFlop (?!) with PGI’s libblas.a
Am I missing something? Are optimized BLAS libraries not shipped with the CDK?
Regards,
Ludovic Drolez.
Hi Ludovic,
Try using ACML instead (libacml). libblas is simply a precompiled version of the reference BLAS library found on netlib.org and is included for legacy reasons. ACML is AMD’s library of optimized BLAS and LAPACK routines and should be much faster.
Could you please post the results with ACML? Thank you!
OK, with ACML and PGI’s mpich, I got only 13 GFlops.
And with ACML and lam4, I got 18 GFlops.
So GotoBLAS wins, which is not surprising since libacml seems to be optimized for AMD processors, not Intel ones…
Regards,
Ludovic.
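
For a quick side-by-side, the figures reported in this thread can be tabulated with a few lines of Python. This is only a rough sketch: the labels are shorthand, the `speedups` helper is a hypothetical name, and the PGI libblas.a run is left out because only "less than 1 GFlop" was reported. The thread does not say which MPI was used for the GotoBLAS and Debian BLAS runs, so the ratios are indicative rather than a strict like-for-like comparison.

```python
# GFlops figures as reported in this thread (8-core Xeon system).
# Labels are shorthand. The PGI libblas.a run is omitted because
# only "less than 1 GFlop" was given, not an exact value.
results = {
    "GotoBLAS": 26.0,
    "ACML + lam4": 18.0,
    "ACML + PGI mpich": 13.0,
    "Debian Sarge BLAS": 11.0,
}


def speedups(results):
    """Return (name, gflops, best/gflops) tuples, fastest first."""
    best = max(results.values())
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, gflops, best / gflops) for name, gflops in ranked]


if __name__ == "__main__":
    for name, gflops, ratio in speedups(results):
        print(f"{name:<20} {gflops:5.1f} GFlops  (best is {ratio:.2f}x this)")
```

The ratio column shows how many times faster the best result is than each configuration.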