LinPack HPL to benchmark NVIDIA GPUs

hsmak · May 6, 2011, 3:07pm

Hello everybody,

I’m trying to install HPL to benchmark an NVIDIA GPU…

I managed to install the regular hpl-2.0 with the following dependencies:

1. atlas3.8.3.tar.gz

2. openmpi-1.4.3.tar.bz2

Now, I’m wondering how to make HPL benchmark my GPU. I found that I need to download the modified HPL version from NVIDIA.

I also found that many of those who managed to test their GPUs had installed:

1. Intel MKL

2. Intel compiler

My questions are:

Do I need that modified HPL version from NVIDIA?

Do I need to install Intel MKL and the Intel compiler? Will ATLAS work fine for testing my GPU?

Your help is very much appreciated!!

Thank you…

avidday · May 6, 2011, 5:04pm

Yes, you need a modified version of HPL if you want to use CUDA-capable GPUs with it. Neither ATLAS nor MKL has anything to do with using the GPU in HPL; that requires an additional GPU BLAS such as CUBLAS or MagmaBLAS. But even then, the HPL source requires considerable modification to use the GPU.

hsmak · May 6, 2011, 5:16pm

@avidday

Thank you very much for your reply.

Could you please tell me where I can get the modified version? Is it available for anyone to download?

I spent a lot of time trying to find it, but to no avail!

avidday · May 6, 2011, 5:48pm

To the best of my knowledge, there is no such version currently available for public download.

hsmak · May 6, 2011, 5:52pm

So, I understand that NVIDIA must be contacted!!!

fcs · May 9, 2011, 7:11am

In the PDF of M. Fatica and E. Phillips, “Cuda Accelerated Linpack On GPU”, from GTC 2010 (held September 21, 2010), the last slide says “Code is available from NVIDIA”…

But I never succeeded in getting it.
HPL_cuda2057_GTC2010.pdf (2.09 MB)

avidday · May 10, 2011, 3:42am

You might find some code at git://github.com/avidday/hpl-cuda.git which could be of some use.

hsmak · May 10, 2011, 4:16am

@avidday

I’ll give it a try and see how it goes…

Thank you so much for your help.

dmyablonski · June 21, 2011, 6:55pm

Did you get anywhere on this? I’m also looking for a CUDA (or OpenCL) Linpack benchmark implementation.

Raman · June 29, 2011, 5:13am

Hi all, this is my first post.

I have an NVIDIA GeForce GTX 460 GPU
and I want to benchmark it using HPL,

but I can’t find HPL_CUDA anywhere.

If anybody has any idea where I can get HPL for CUDA,
please let me know.

Thanks in advance.

Raman…

avidday · June 29, 2011, 7:22am

There is a git repository containing a simplified version of my HPL port here: git://github.com/avidday/hpl-cuda.git

Raman · June 29, 2011, 7:41am

Thanks, avidday, for your reply,
but I am not able to open that URL
because it uses the git protocol, which Firefox doesn’t know.

So could you please tell me if there is an HTTP link for
hpl_cuda?

Otherwise, please tell me how to get hpl_cuda from this repository.

Regards,

tera · June 29, 2011, 7:52am

GitHub - avidday/hpl-cuda: simple port of hpl-2.0 to use NVIDIA GPU accelation with CUBLAS ?

Raman · June 29, 2011, 8:18am

Thank you very much Tera and avidday.

I will try to run it and check the performance of my GPU.

Regards

avidday · June 29, 2011, 8:30am

It isn’t a URL you open in a browser; it is a git repository address. Pull the tree using git and you have the codebase. Build it and you have a working hpl-cuda implementation.
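The step described above can be sketched in shell. This is a minimal sketch, not something posted in the thread: `to_https` and the `DO_CLONE` switch are hypothetical names, and the `https://` address is an assumption based on GitHub serving the same repository path over HTTPS. GitHub has since stopped serving the unauthenticated `git://` protocol, so the original address may no longer respond.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a git:// address to its https:// form.
to_https() {
    printf '%s\n' "$1" | sed 's#^git://#https://#'
}

REPO_URL=$(to_https "git://github.com/avidday/hpl-cuda.git")
echo "clone address: $REPO_URL"

# Cloning needs network access, so it only runs when DO_CLONE=1 is set
# (DO_CLONE is an illustrative switch, not a git option).
if [ "${DO_CLONE:-0}" = "1" ]; then
    git clone "$REPO_URL" hpl-cuda
fi
```

With `DO_CLONE=1` set, the source tree ends up in `hpl-cuda/`. How to build it from there depends on the repository's own instructions.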

DrMikeT · July 14, 2011, 6:09pm

Hello, is this an older version of the actual code that NVIDIA plans to release eventually for hybrid CPU/GPU HPL? Or is this just your personal effort?

Is the above code robust enough to be a good starting point, or should people start from scratch when introducing GPUs to HPL?

thanks --Michael

avidday · July 14, 2011, 6:20pm

That code was not developed by NVIDIA; it is an independent effort. The code is a robust adaptation of the standard HPL 2.0 code base, which should be fine for use on platforms where one MPI process per GPU makes sense. It will require tuning for whatever host CPU/GPU hardware and host BLAS combination is used.

DrMikeT · July 18, 2011, 5:46pm

Thanks … I have downloaded it and I’ll give it a look.

michael

karanchhabra2013 · March 8, 2018, 6:17pm

Hi,

I’ve been trying to benchmark a system with an NVIDIA Tesla K40c (driver version 390.30, CUDA version 9.1).

Both the NVIDIA driver and CUDA are working fine.

# /usr/local/cuda-9.1/bin/nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

# nvidia-smi

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I’ve installed:

OpenBLAS
mpich
hpl-2.0_FERMI_v15

I already have compiled binaries for mpich and OpenBLAS, as well as the binaries and libraries for CUDA.

When I try to compile HPL with the following Make.CUDA file, I get an error saying that mpi.h cannot be found:

make[1]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
make -f Make.top build_src arch=CUDA
make[1]: Entering directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
( cd src/auxil/CUDA; make TOPdir=/root/hpl/hpl-2.0_FERMI_v15_latest )
make[2]: Entering directory `/root/hpl/hpl-2.0_FERMI_v15_latest/src/auxil/CUDA'
/root/hpl/mpich/bin/mpicc -o HPL_dlacpy.o -c -DAdd__ -DF77_INTEGER=int -DStringSunStyle -DCUDA -I/root/hpl/hpl-2.0_FERMI_v15_latest/include -I/root/hpl/hpl-2.0_FERMI_v15_latest/include/CUDA -I-I/root/hpl/mpich/include64 -I/usr/local/cuda-9.1/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp ../HPL_dlacpy.c
In file included from /root/hpl/hpl-2.0_FERMI_v15_latest/include/hpl.h:80:0,
                 from ../HPL_dlacpy.c:50:
/root/hpl/hpl-2.0_FERMI_v15_latest/include/hpl_pmisc.h:54:17: fatal error: mpi.h: No such file or directory
 #include "mpi.h"
                 ^
compilation terminated.
make[2]: *** [HPL_dlacpy.o] Error 1
make[2]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest/src/auxil/CUDA'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
make: *** [build] Error 2

Make.CUDA file

# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------

SHELL = /bin/sh

CD = cd
CP = cp
LN_S = ln -fs
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch

# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------

ARCH = CUDA

# Set TOPdir to the location of where this is being built

ifndef TOPdir
TOPdir =/root/hpl/hpl-2.0_FERMI_v15_latest
endif
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)

HPLlib = $(LIBdir)/libhpl.a

# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.

MPdir = /root/hpl/mpich
MPinc = -I$(MPdir)/include64
#MPlib = $(MPdir)/lib64/libmpi.a
MPlib = $(MPdir)/lib/libmpich.so

# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.

#LAdir = $(TOPdir)/../../lib/em64t
LAdir = /root/hpl/openblas
LAinc =

# CUDA

#LAlib = -L /home/cuda/Fortran_Cuda_Blas -ldgemm -L/usr/local/cuda/lib -lcublas -L$(LAdir) -lmkl -lguide -lpthread
LAlib = -L $(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda-9.1/lib64 -lcuda -lcudart -lcublas -L$(LAdir)/libopenblas.so

# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------

F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle

# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------

HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -I/usr/local/cuda-9.1/include
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)

# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_DETAILED_TIMING enable detailed timers;
# -DASYOUGO enable timing information as you go (nonintrusive)
# -DASYOUGO2 slightly intrusive timing information
# -DASYOUGO2_DISPLAY display detailed DGEMM information
# -DENDEARLY end the problem early
# -DFASTSWAP insert to use DLASWP instead of HPL code
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the BLAS Fortran 77 interface,
# *) not display detailed timing information.

HPL_OPTS = -DCUDA

# ----------------------------------------------------------------------

HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)

# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
# next two lines for GNU Compilers:
CC = /root/hpl/mpich/bin/mpicc
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp

# next two lines for Intel Compilers:
# CC = mpicc
# CCFLAGS = $(HPL_DEFS) -O3 -axS -w -fomit-frame-pointer -funroll-loops -openmp

CCNOOPT = $(HPL_DEFS) -O0 -w

# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.

#LINKER = mpif77
LINKER = /root/hpl/mpich/bin/mpif77
#LINKFLAGS = $(CCFLAGS) -static_mpi
#LINKFLAGS = $(CCFLAGS)

ARCHIVER = ar
ARFLAGS = r
RANLIB = echo

# ----------------------------------------------------------------------

MAKE = make TOPdir=$(TOPdir)
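
Two flags stand out in the material above. The compile line in the log contains `-I-I/root/hpl/mpich/include64`, which gcc reads as one include directory literally named `-I/root/hpl/mpich/include64`, so the MPICH header directory is never searched. The Make.CUDA as posted would not produce that doubled prefix on its own, so the log may come from a different revision of the file; that is an assumption. Separately, `LAlib` expands to `-L/root/hpl/openblas/libopenblas.so`, handing a library file to `-L`, which expects a directory. The sketch below flags both patterns; `suspicious_flags` is a hypothetical helper, not part of HPL, MPICH, or gcc.

```shell
#!/bin/sh
# Hypothetical helper: print every flag in a space-separated flag string
# that has a doubled -I prefix or gives -L a library file instead of a
# directory. Relies on plain word splitting, so paths must not contain
# spaces or glob characters.
suspicious_flags() {
    for tok in $1; do
        case "$tok" in
            -I-I*)        printf '%s\n' "doubled -I: $tok" ;;
            -L*.so|-L*.a) printf '%s\n' "file passed to -L: $tok" ;;
        esac
    done
}

# Flags taken from the log and from the expanded LAlib line in the post.
suspicious_flags "-I/root/hpl/hpl-2.0_FERMI_v15_latest/include -I-I/root/hpl/mpich/include64 -L/root/hpl/openblas/libopenblas.so"
```

The conventional form is `-L<dir> -l<name>`, for example `-L$(LAdir) -lopenblas` if `libopenblas.so` sits directly in `LAdir`.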

One suggestion I received was:

Install the mpich-devel package. I did this and can see /usr/include/mpi.h on my local file system, but the error persists. I also tried copying the entire include folder into the compiled mpich folder under the hpl folder, but the issue remains.

Can someone help me with this? I’ve been trying to debug this for a long time.

Thanks,
Karan
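
One way to narrow this down is to check whether `mpi.h` actually exists in the directory handed to `-I`. The sketch below is a diagnostic under stated assumptions, not a confirmed fix: `has_mpi_header` is a hypothetical helper, and `/root/hpl/mpich/include` is only a guess at where an MPICH install tree keeps its headers (the post itself names `include64` and `/usr/include`).

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory contains mpi.h.
has_mpi_header() {
    if [ -f "$1/mpi.h" ]; then
        echo "found:   $1/mpi.h"
    else
        echo "missing: $1/mpi.h"
        return 1
    fi
}

# include64 and /usr/include come from the post; include is a guess.
for dir in /root/hpl/mpich/include64 /root/hpl/mpich/include /usr/include; do
    has_mpi_header "$dir" || true
done
```

If `mpi.h` turns up in a different directory than the one `MPinc` names, pointing `MPinc` at that directory is the natural next step. With MPICH, `mpicc -show` prints the underlying compiler command without running it, which shows the include paths the wrapper adds on its own.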
