Compiling HPL for CUDA

I’m trying to compile the High Performance Linpack benchmark (from NVIDIA for CUDA), but I’m having problems. I could get the regular version of HPL to install, but the CUDA version gives me the following error:
make[1]: Entering directory '/net/user/erasmussen/hpl-2.0_FERMI_v13'
( cd src/auxil/CUDA; make TOPdir=/net/user/erasmussen/hpl-2.0_FERMI_v13 )
make[2]: Entering directory '/net/user/erasmussen/hpl-2.0_FERMI_v13/src/auxil/CUDA'
make[2]: *** No rule to make target '-I/net/user/erasmussen/hpl-2.0_FERMI_v13/include/hpl_misc.h', needed by 'HPL_dlacpy.o'. Stop.
make[2]: Leaving directory '/net/user/erasmussen/hpl-2.0_FERMI_v13/src/auxil/CUDA'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory '/net/user/erasmussen/hpl-2.0_FERMI_v13'
make: *** [build] Error 2

Does anyone have much experience with HPL?
Here is my Make.CUDA file:

SHELL = /bin/sh

CD = cd
CP = cp
LN_S = ln -fs
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch

# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------

ARCH = CUDA

# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
# Set TOPdir to the location of where this is being built

ifndef TOPdir
TOPdir = /net/user/erasmussen/hpl-2.0_FERMI_v13
endif
INCdir = -I$(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)

HPLlib = $(LIBdir)/libhpl.a

# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.

MPdir = /usr/mpi/gcc/openmpi-1.4.3
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib64/libmpi.so

#MPlib = $(MPdir)/lib64/libmpich.a

# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
#LAdir = $(TOPdir)/../../lib/em64t
LAdir = /usr/local/cuda
LAinc = -I$(LAdir)/include
# CUDA

#LAlib = -L /home/cuda/Fortran_Cuda_Blas -ldgemm -L/usr/local/cuda/lib -lcublas -L$(LAdir) -lmkl -lguide -lpthread
LAlib = -L $(LAdir)/lib64 -libcublas.so.4.0.17

# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -I/usr/local/cuda/include
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)

# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_DETAILED_TIMING  enable detailed timers;
# -DASYOUGO              enable timing information as you go (nonintrusive)
# -DASYOUGO2             slightly intrusive timing information
# -DASYOUGO2_DISPLAY     display detailed DGEMM information
# -DENDEARLY             end the problem early
# -DFASTSWAP             insert to use DLASWP instead of HPL code
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#

HPL_OPTS = -DCUDA

# ----------------------------------------------------------------------
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)

# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
# next two lines for GNU Compilers:

CC = mpicc
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp

# next two lines for Intel Compilers:
# CC = mpicc

#CCFLAGS = $(HPL_DEFS) -O3 -axS -w -fomit-frame-pointer -funroll-loops -openmp

#CCNOOPT = $(HPL_DEFS) -O0 -w

# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
LINKER = $(CC)
#LINKFLAGS = $(CCFLAGS) -static_mpi
LINKFLAGS = $(CCFLAGS)

ARCHIVER = ar
ARFLAGS = r
RANLIB = echo

# ----------------------------------------------------------------------

MAKE = make TOPdir=$(TOPdir)

Excuse me, could you please send me a copy of the High Performance Linpack benchmark (from NVIDIA for CUDA)? I don’t know how to get it.
Thank you very much.

Hello erasmus,

First of all, check that TOPdir is correct; it is often the root of "No rule to make target" problems.
Second, you need to link a host BLAS library (OpenBLAS, ATLAS, MKL, ACML) as well; CUBLAS alone is not enough :/
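
One more thing worth checking, going by the error text itself: the missing target starts with -I, which suggests that INCdir is used as a plain path in the prerequisites (something like $(INCdir)/hpl_misc.h) and that the -I in your INCdir definition is leaking into the file name. Below is a rough sketch of how those lines might look instead. It assumes the sub-Makefiles follow the stock HPL 2.0 layout, which I have not verified against the NVIDIA package. BLASdir and the /opt/openblas path are hypothetical names used only for illustration, and the exact set of CUDA libraries you need may differ.

```makefile
# Sketch only: a possible correction to the Make.CUDA posted above.
# INCdir appears to be used as a plain path in prerequisites such as
# $(INCdir)/hpl_misc.h, so it should not carry the -I flag itself.
INCdir       = $(TOPdir)/include

# The -I is added here, where the compiler flags are assembled.
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)

# Assumption: BLASdir is a hypothetical variable pointing at a host BLAS
# install (OpenBLAS in this example); adjust path and library name.
BLASdir      = /opt/openblas

# Libraries are named with -l<name> (without the "lib" prefix or suffix).
LAlib        = -L$(LAdir)/lib64 -lcublas -lcudart -L$(BLASdir)/lib -lopenblas
```

If the build then gets past src/auxil/CUDA, the "No rule to make target" part was most likely down to INCdir; any link errors after that would point at the LAlib line.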

Hope this works for you.

Hi erasmus,

Were you by any chance able to fix the issue?

Karan