LinPack HPL to benchmark NVIDIA GPUs

Hello everybody,

I’m trying to install HPL to benchmark an NVIDIA GPU…

I managed to install the regular hpl-2.0 with the following dependencies:




Now, I’m wondering how to make HPL benchmark my GPU. I found that I need to download the modified HPL version from NVIDIA.

Besides, I found that many of those who managed to test their GPUs had installed:


Intel MKL

Intel compiler

My questions are:

Do I need that modified HPL version from NVIDIA?

Do I need to install Intel MKL and the Intel compiler? Will ATLAS work fine for testing my GPU?

Your help is very much appreciated!!

Thank you…

Yes, you need a modified version of HPL if you want to use CUDA-capable GPUs with it. Neither ATLAS nor MKL has anything to do with using the GPU in HPL; that requires an additional GPU BLAS such as CUBLAS or MagmaBLAS. But even then, the HPL source requires considerable modification to use the GPU.


Thank you very much for your reply.

Could you please tell me where I can get the modified version from? Is it available for anyone to download?

I spent a lot of time trying to find it, but to no avail!

To the best of my knowledge, there is no such version currently available for public download.

So, I understand that NVIDIA must be contacted!!!

In the PDF of M. Fatica and E. Phillips’ “Cuda Accelerated Linpack On GPU” from GTC 2010 (held September 21st, 2010), the last slide says “Code is available from NVIDIA”…

But I never succeeded in getting it.
HPL_cuda2057_GTC2010.pdf (2.09 MB)

You might find some code at git:// which could be of some use.


I’ll give it a try and will see how things will go…

Thank you so much for your help.

Did you get anywhere on this? Also looking for a CUDA (or OpenCL) linpack benchmark implementation.

Hi all, this is my first post.

I have an NVIDIA GeForce GTX 460 GPU
and I want to benchmark it using HPL,

but I cannot find HPL_CUDA anywhere.

If anybody has any idea where I can get HPL for CUDA,
then please let me know.

Thanks in advance.


There is a link to a git repository containing a simplified version of my HPL port here git://

Thanks, avidday, for your reply,
but I am not able to open that URL
because it uses the git protocol, which Firefox doesn’t know.

So could you please tell me if there is some http link for

Otherwise, please tell me how to get hpl_cuda from this repository.

Regards, ?

Thank you very much Tera and avidday.

I will execute it and check the performance of my GPU.


It isn’t a URL you open in a browser; it is a git repository address. Pull the tree using git and you have the codebase. Build it and you have a working hpl-cuda implementation.
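
A minimal sketch of pulling a tree with git, assuming git is installed. The repository address in this thread is truncated, so the helper below takes it as an argument; fetch_tree and the destination directory name are hypothetical, not part of the hpl-cuda code.

```shell
#!/bin/sh
# Sketch: fetch a source tree from a git repository address.
# fetch_tree is a hypothetical helper; the real repository address is not
# known here (it is truncated in the thread), so the caller must supply it.

fetch_tree() {
    repo_url="$1"   # repository address, e.g. one starting with git://
    dest="$2"       # local directory that git will create
    git clone "$repo_url" "$dest"
}
```

Called as fetch_tree followed by the repository address and a directory name such as hpl-cuda, this leaves the source tree in that directory, ready to build. No browser is involved at any point; git itself speaks the git:// protocol.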

Hello, is this an older version of the actual code that NVIDIA plans to release eventually for hybrid CPU/GPU HPL? Or is this just your personal effort?

Is the above code robust enough to be a good starting point, or should people start from scratch when introducing GPUs to HPL?

thanks --Michael

That code was not developed by NVIDIA; it is an independent effort. The code is a robust adaptation of the standard HPL 2.0 code base which should be fine for use on platforms where 1 MPI process per GPU makes sense. It will require tuning for whatever host CPU/GPU hardware and host BLAS combination is used.

Thanks … I have downloaded it and I’ll give it a look.



I’ve been trying to benchmark a system with an NVIDIA Tesla K40c (driver version 390.30, CUDA version 9.1).

Both the NVIDIA driver and CUDA are working fine.

/usr/local/cuda-9.1/bin/nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85


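Thu Mar 8 12:36:43 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:C4:00.0 Off |                    0 |
| 23%   43C    P0    66W / 235W |      0MiB / 11441MiB |     64%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+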

I’ve installed:

  1. OpenBLAS
  2. mpich
  3. hpl-2.0_FERMI_v15

I already have compiled binaries for mpich and OpenBLAS, as well as the binaries and libraries for CUDA.

When I try to compile HPL with the following Make.CUDA file, the build fails because mpi.h cannot be found:

make[1]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
make -f build_src arch=CUDA
make[1]: Entering directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
( cd src/auxil/CUDA; make TOPdir=/root/hpl/hpl-2.0_FERMI_v15_latest )
make[2]: Entering directory `/root/hpl/hpl-2.0_FERMI_v15_latest/src/auxil/CUDA'
/root/hpl/mpich/bin/mpicc -o HPL_dlacpy.o -c -DAdd__ -DF77_INTEGER=int -DStringSunStyle -DCUDA -I/root/hpl/hpl-2.0_FERMI_v15_latest/include -I/root/hpl/hpl-2.0_FERMI_v15_latest/include/CUDA -I-I/root/hpl/mpich/include64 -I/usr/local/cuda-9.1/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp ../HPL_dlacpy.c
In file included from /root/hpl/hpl-2.0_FERMI_v15_latest/include/hpl.h:80:0,
                 from ../HPL_dlacpy.c:50:
/root/hpl/hpl-2.0_FERMI_v15_latest/include/hpl_pmisc.h:54:17: fatal error: mpi.h: No such file or directory
 #include "mpi.h"
 ^
compilation terminated.
make[2]: *** [HPL_dlacpy.o] Error 1
make[2]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest/src/auxil/CUDA'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
make: *** [build] Error 2

Make.CUDA file

# - shell --------------------------------------------------------------

SHELL = /bin/sh

CD = cd
CP = cp
LN_S = ln -fs
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch

# - Platform identifier ------------------------------------------------

# Set TOPdir to the location of where this is being built

ifndef TOPdir
TOPdir = /root/hpl/hpl-2.0_FERMI_v15_latest
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)

HPLlib = $(LIBdir)/libhpl.a

# - Message Passing library (MPI) --------------------------------------

# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.

MPdir = /root/hpl/mpich
MPinc = -I$(MPdir)/include64
#MPlib = $(MPdir)/lib64/libmpi.a
MPlib = $(MPdir)/lib/

# - Linear Algebra library (BLAS) -----------------------------

# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.

#LAdir = $(TOPdir)/…/…/lib/em64t
LAdir = /root/hpl/openblas
LAinc =

#LAlib = -L /home/cuda/Fortran_Cuda_Blas -ldgemm -L/usr/local/cuda/lib -lcublas -L$(LAdir) -lmkl -lguide -lpthread
LAlib = -L $(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda-9.1/lib64 -lcuda -lcudart -lcublas -L$(LAdir)/

# - F77 / C interface --------------------------------------------------

F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle

# - HPL includes / libraries / specifics -------------------------------

HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -I/usr/local/cuda-9.1/include
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)

# - Compile time options -----------------------------------------------

# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_DETAILED_TIMING  enable detailed timers;
# -DASYOUGO              enable timing information as you go (nonintrusive)
# -DASYOUGO2             slightly intrusive timing information
# -DASYOUGO2_DISPLAY     display detailed DGEMM information
# -DENDEARLY             end the problem early
# -DFASTSWAP             insert to use DLASWP instead of HPL code

# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.

# - Compilers / linkers - Optimization flags ---------------------------

# next two lines for GNU Compilers:
CC = /root/hpl/mpich/bin/mpicc
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp

# next two lines for Intel Compilers:
# CC = mpicc
# CCFLAGS = $(HPL_DEFS) -O3 -axS -w -fomit-frame-pointer -funroll-loops -openmp

# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.

#LINKER = mpif77
LINKER = /root/hpl/mpich/bin/mpif77

RANLIB = echo

MAKE = make TOPdir=$(TOPdir)

Some suggestions I got were:

  1. Install the mpich-devel package. (Done: I can see /usr/include/mpi.h on my local file system, but the error persists. I also tried copying the entire include folder into the compiled mpich folder under the hpl folder, but the issue remains.)

Can someone help me with this? I’ve been trying to debug this for a long time.
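
A minimal shell sketch for narrowing down an error like this, assuming a POSIX shell. It checks whether mpi.h exists in a given directory, and whether a compile command contains a doubled include flag such as the -I-I/root/hpl/mpich/include64 that appears in the compile line of the log above. The compiler reads that flag as a directory literally named "-I/root/hpl/mpich/include64", so the intended directory is never searched. Whether this is the whole cause here is an assumption. The function names check_mpi_include and find_doubled_include_flags are hypothetical helpers, not part of HPL or MPICH.

```shell
#!/bin/sh
# Sketch: sanity checks for the MPI include settings used by Make.CUDA.
# Both functions are hypothetical helpers written for this example.

# Report whether mpi.h exists in the given directory.
# Returns 0 if found, 1 otherwise.
check_mpi_include() {
    dir="$1"
    if [ -f "$dir/mpi.h" ]; then
        echo "found: $dir/mpi.h"
    else
        echo "missing: $dir/mpi.h"
        return 1
    fi
}

# Print every word of a compile command that starts with "-I-I".
# Returns non-zero when no such word is present.
find_doubled_include_flags() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -e '^-I-I'
}
```

Running check_mpi_include on /root/hpl/mpich/include64 and on /root/hpl/mpich/include shows which of the two directories actually holds the header, and MPinc can then be pointed at that one. With MPICH, mpicc -show prints the underlying compiler command without running it, which helps confirm which include paths the wrapper adds by itself.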