Linpack installing problem: Problem installing Linpack with CUBLAS support

When I start installing Linpack, I have these parameters:

but when I start the installation, I get these errors:

What am I doing wrong?

You are trying to use CUBLAS as a drop-in replacement for the standard Fortran BLAS. It isn’t one. You won’t be able to just build LINPACK against CUBLAS. There is a manual for CUBLAS supplied with the toolkit. You probably ought to read it. You will see several things, amongst them:

  1. CUBLAS function names don’t follow the same naming conventions as BLAS

  2. CUBLAS functions require additional support function calls to manage memory on the GPU and copy data to and from the GPU (which any code expecting a standard BLAS will not contain)

  3. CUBLAS is a relatively limited subset of a complete BLAS and many functions are not implemented, although this is rumoured to be considerably improved in the current CUDA 3.0 beta.

Thanks for replying.

I’ve read the manual for CUBLAS. The functions really are different. So I must write my own wrapper around the CUBLAS functions for Linpack, am I right?

Because you have read the manual for CUBLAS, you will be aware that there is already a form of wrapper interface available (referred to as the thunking interface). You will also be aware that NVIDIA don’t recommend using it, or that sort of direct wrapping approach, because it is very slow and severely hamstrung by the bandwidth and latency of the PCI Express bus.

I’ve reread the manual for CUBLAS and made some changes in my makefile:

I’ve also defined CUBLAS_USE_THUNKING.

But the compiler tells me:

The compiler version is 4.1.2.

As the error message says, you are trying to compile C code with a fortran compiler…

As an aside, how many gpus do you have? You do realize that unless you have at least 4 you are wasting your time, because HPL requires a minimum of 4 MPI processes, and each requires its own GPU?

I tried gcc and it gives the same message.

I have 4 GPUs. So the idea of installing linpack is very realistic for me :)

I wonder, is there any manual for installing Linpack with CUBLAS? I have read a lot of papers about the performance of NVIDIA video accelerators, but no one has written up how to install Linpack step by step, as is done on the AMD site for FireStream and amlg.

No there isn’t. HPL requires fairly significant modifications to be used with CUBLAS. You can’t just “compile and run”, if that is what you were hoping for.

What significant modifications must I make to get Linpack working with CUBLAS?

Right now that is an exercise left to the reader. One approach is to build HPL with modified versions of the supplied C BLAS functions (HPL_sgemm etc) which call CUBLAS functions and/or host BLAS functions depending on size metrics.
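To make that approach concrete, here is a rough sketch of what such a modified routine could look like for DGEMM. It is not taken from HPL or from any NVIDIA code. The names `offload_dgemm`, `use_gpu`, `host_dgemm`, the `USE_CUBLAS` build flag and the `GPU_MIN_FLOPS` cutoff are all made up for illustration, and the cutoff value is a placeholder you would have to tune. The GPU path uses the legacy `cublas.h` calls and assumes `cublasInit()` was already called once at startup; without `-DUSE_CUBLAS` only the host path is compiled. It handles the no-transpose, column-major case only.

```c
#include <stddef.h>
#ifdef USE_CUBLAS            /* hypothetical build flag, not part of HPL */
#include <cublas.h>          /* legacy CUBLAS API */
#endif

/* Hypothetical cutoff: below this much work, stay on the host. */
#define GPU_MIN_FLOPS 1.0e8

/* Size metric: a DGEMM performs about 2*m*n*k floating point operations. */
int use_gpu(int m, int n, int k)
{
    return 2.0 * m * n * k >= GPU_MIN_FLOPS;
}

/* Naive column-major C = alpha*A*B + beta*C. Stands in for the host BLAS. */
void host_dgemm(int m, int n, int k, double alpha,
                const double *A, int lda, const double *B, int ldb,
                double beta, double *C, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = alpha * acc + beta * C[i + (size_t)j * ldc];
        }
}

#ifdef USE_CUBLAS
/* Allocate on the GPU, copy in, multiply, copy back. Returns 0 on success. */
static int gpu_dgemm(int m, int n, int k, double alpha,
                     const double *A, int lda, const double *B, int ldb,
                     double beta, double *C, int ldc)
{
    double *dA = NULL, *dB = NULL, *dC = NULL;
    int ok = cublasAlloc(m * k, sizeof(double), (void **)&dA) == CUBLAS_STATUS_SUCCESS
          && cublasAlloc(k * n, sizeof(double), (void **)&dB) == CUBLAS_STATUS_SUCCESS
          && cublasAlloc(m * n, sizeof(double), (void **)&dC) == CUBLAS_STATUS_SUCCESS;
    if (ok)
        ok = cublasSetMatrix(m, k, sizeof(double), A, lda, dA, m) == CUBLAS_STATUS_SUCCESS
          && cublasSetMatrix(k, n, sizeof(double), B, ldb, dB, k) == CUBLAS_STATUS_SUCCESS
          && cublasSetMatrix(m, n, sizeof(double), C, ldc, dC, m) == CUBLAS_STATUS_SUCCESS;
    if (ok) {
        cublasDgemm('N', 'N', m, n, k, alpha, dA, m, dB, k, beta, dC, m);
        ok = cublasGetError() == CUBLAS_STATUS_SUCCESS
          && cublasGetMatrix(m, n, sizeof(double), dC, m, C, ldc) == CUBLAS_STATUS_SUCCESS;
    }
    if (dA) cublasFree(dA);
    if (dB) cublasFree(dB);
    if (dC) cublasFree(dC);
    return ok ? 0 : -1;
}
#endif

/* Hypothetical replacement body for an HPL-style DGEMM wrapper. */
void offload_dgemm(int m, int n, int k, double alpha,
                   const double *A, int lda, const double *B, int ldb,
                   double beta, double *C, int ldc)
{
#ifdef USE_CUBLAS
    /* C is only written by the final copy back, so a failed GPU attempt
       leaves it untouched and the host path below can still run. */
    if (use_gpu(m, n, k) &&
        gpu_dgemm(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc) == 0)
        return;
#endif
    host_dgemm(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}
```

Allocating and freeing device memory on every call, as this sketch does, is itself expensive. A real port would probably keep device buffers alive between calls.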

That’s a good idea, modifying the C BLAS functions. HPL has a file called hpl_blas.h, where the prototypes of the BLAS functions are defined. I’ve tried substituting CUBLAS functions for the CBLAS ones. HPL then compiled without warnings or errors, but at run time errors appeared and the application crashed.

Today I’ll compare the signatures of the CBLAS and CUBLAS functions, then modify them in HPL. I hope that will work.

P.S. I wonder why there are no related topics on the forum.

How are you managing GPU memory transfers? You really only need to worry about DGEMM to start with, that is where the largest performance improvements can be had.

There are a couple, but it seems that for most people LINPACK isn’t all that interesting (plus requiring 4 CUDA devices is a large barrier to entry).

You can run Linpack on a single GPU, no need for 4 devices.
You need to offload DGEMM calls that are big enough to keep the GPU busy and to amortize the data transfer.
Replacing all the DGEMM calls with CUBLAS calls is not a good idea.
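A back-of-the-envelope model shows why "big enough" matters. The sketch below compares an estimated host time against an estimated GPU time that includes the PCI Express copies. Everything in it is an assumption for illustration: the struct `rates_t`, the function names, and especially the rates you plug in, which you would have to measure on your own machine. It also assumes A, B and C are copied to the GPU and C is copied back on every call.

```c
/* Hypothetical machine description; fill in measured values. */
typedef struct {
    double host_gflops;     /* sustained host DGEMM rate, GFLOP/s */
    double gpu_gflops;      /* sustained GPU DGEMM rate, GFLOP/s */
    double pcie_gbytes;     /* sustained host<->device bandwidth, GB/s */
    double copy_latency_s;  /* fixed cost per copy, seconds */
} rates_t;

/* A DGEMM with an m x k and a k x n operand does about 2*m*n*k flops. */
double dgemm_flops(int m, int n, int k)
{
    return 2.0 * m * n * k;
}

/* Bytes moved over the bus: A and B in, C in and out, 8 bytes per double. */
double dgemm_bytes(int m, int n, int k)
{
    return 8.0 * ((double)m * k + (double)k * n + 2.0 * m * n);
}

double host_seconds(const rates_t *r, int m, int n, int k)
{
    return dgemm_flops(m, n, k) / (r->host_gflops * 1e9);
}

/* GPU compute time plus transfer time plus a fixed cost for 4 copies. */
double gpu_seconds(const rates_t *r, int m, int n, int k)
{
    return dgemm_flops(m, n, k) / (r->gpu_gflops * 1e9)
         + dgemm_bytes(m, n, k) / (r->pcie_gbytes * 1e9)
         + 4.0 * r->copy_latency_s;
}

/* Nonzero when the model predicts the GPU wins despite the copies. */
int worth_offloading(const rates_t *r, int m, int n, int k)
{
    return gpu_seconds(r, m, n, k) < host_seconds(r, m, n, k);
}
```

With placeholder rates such as `{10.0, 60.0, 5.0, 1e-5}`, the model favours the host for a 32x32x32 product and the GPU for a 4096x4096x128 update. The work grows as m*n*k while the traffic grows only as the matrix sizes, so the copies are amortized only for large calls.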

Forgive my skepticism, but how can you run HPL with only one GPU?

You can run HPL with a single MPI process, just set P=Q=1 in the HPL.dat file.
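For reference, the relevant part of a stock HPL.dat would look roughly like the fragment below. Only the three process-grid lines are shown; the rest of the file (problem sizes, block sizes and so on) is left out and stays as you already have it.

```
1            # of process grids (P x Q)
1            Ps
1            Qs
```

With that grid, HPL expects exactly one MPI process, so the launch would be something like `mpirun -np 1 ./xhpl`, assuming your MPI installation uses `mpirun` and the binary has the default name `xhpl`.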

That’s right, I used this to test the performance of AMD GPUs.

As for Linpack and CUDA: is there any installation guide that describes what I must change in Linpack to use CUBLAS?

So you can! Maybe I got that notion into my head from a vendor-supplied version that wouldn’t run with fewer than 4 MPI processes or something… I must admit I only ever ran it on many more nodes than that.