Installation of Linpack for Fermi

I plan to install Linpack for Fermi on CentOS Linux.

I followed the installation guide (a quick sanity check of these components is sketched below):

1) Install MPI 1.4.2
2) Install Intel MKL 10.x
3) Install CUDA 3.0
4) Set up the Tesla Fermi cards
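Before editing the Makefile, it is worth confirming that each prerequisite is actually visible from the shell. A minimal sanity-check sketch, assuming default install locations (adjust the paths to your system):

# MPI compiler wrapper present and pointing at the right compiler?
which mpicc
mpicc -showme            # Open MPI; use "mpicc -show" for MPICH

# MKL libraries where the Makefile expects them?
ls /opt/intel/mkl/10.2.5.035/lib/em64t | head

# CUDA toolkit and driver alive?
nvcc --version
nvidia-smi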

Then I took the Make.CUDA_pinned file and edited it:

# This is just a sample Make.
#
# The user may need to edit:
#   1.) TOPdir
#   2.) MPI variables (MPdir, MPinc, MPlib)
#   3.) MKL BLAS variables (LAdir, LAinc, LAlib)
#   4.) The compiler and compiler/linker options (CC, CCFLAGS)

# -- High Performance Computing Linpack Benchmark (HPL)
#    HPL - 1.0a - January 20, 2004
#    Antoine P. Petitet
#    University of Tennessee, Knoxville
#    Innovative Computing Laboratories
#    (C) Copyright 2000-2004 All Rights Reserved

# -- Copyright notice and Licensing terms:
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement:
# This product includes software developed at the University of
# Tennessee, Knoxville, Innovative Computing Laboratories.
#
# 4. The name of the University, the name of the Laboratory, or the
# names of its contributors may not be used to endorse or promote
# products derived from this software without specific written
# permission.
#
# -- Disclaimer:
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

##########

# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------

SHELL = /bin/sh

CD = cd
CP = cp
LN_S = ln -fs
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch

# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------

ARCH = CUDA_pinned

# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
# Set TOPdir to the location where this is being built.

ifndef TOPdir
#TOPdir = pwd
TOPdir = /home/hpl-2.0_FERMI_v04
endif
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)

HPLlib = $(LIBdir)/libhpl.a

# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
#
# MPinc tells the C compiler where to find the Message Passing library
# header files; MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.

MPdir = /usr/local/openmpi
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libvt.mpi.a
#MPlib = $(MPdir)/lib64/libmpich.a
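# Note: libvt.mpi.a is the VampirTrace-instrumented MPI library and
# depends on the OTF tracing libraries; linking it without OTF produces
# OTF_* undefined references. For a plain HPL build, Open MPI's ordinary
# library is the usual choice (a sketch; the exact file name depends on
# the Open MPI install):
#MPlib = $(MPdir)/lib/libmpi.so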

# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS) --------------------------------------
# ----------------------------------------------------------------------
#
# LAinc tells the C compiler where to find the Linear Algebra library
# header files; LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.

#LAdir = $(TOPdir)/../../lib/em64t
LAdir = /opt/intel/mkl/10.2.5.035/lib/em64t
#LAdir = /share/apps/intel/mkl/10.1.0.99/lib/em64t
#LAdir = /share/apps/intel/mkl/10.0.4.023/lib/em64t
#LAdir = /share/apps/intel/mkl/10.2.4.032/libem64t
LAinc = -I /opt/intel/mkl/10.2.5.035/include

# CUDA

#LAlib = -L /home/cuda/Fortran_Cuda_Blas -ldgemm -L/usr/local/cuda/lib -lcublas -L$(LAdir) -lmkl -lguide -lpthread
LAlib = -L /opt/intel/mkl/10.2.5.035/lib/em64t -lmkl
#LAlib = -L$(LAdir) -lmkl -liomp5
#LAlib = -L$(LAdir) -lmkl $(LAdir)/libguide.a -lpthread
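# Note: "-lmkl" alone links the threaded MKL layer, whose OpenMP symbols
# (__kmpc_*, omp_*) live in Intel's libiomp5, and the pinned build also
# calls cudaMallocHost/cudaFreeHost from the CUDA runtime. Something
# along these lines is typically needed (a sketch, assuming default
# CUDA and MKL paths -- adjust to your install):
#LAlib = -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lmkl -liomp5 -lpthread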

# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
#
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. One and only one option should be chosen in each of the
# 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_       : all lower case and a suffixed underscore (Suns,
#                Intel, ...), [default]
# -DNoChange   : all lower case (IBM RS6000),
# -DUpCase     : all upper case (Cray),
# -DAdd__      : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,   [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle  : The string address is passed at the string
#                     location on the stack, and the string length is
#                     then passed as an F77_INTEGER after all explicit
#                     stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
#                     Fortran 77 string, and the structure is of the
#                     form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
#                     77 string, and the structure is of the form:
#                     struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
#                     Cray fcd (fortran character descriptor) for
#                     interoperation.

F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
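# Illustration (a sketch, not part of the original file): with -DAdd_,
# a C call to the Fortran 77 BLAS routine DGEMM is declared lower-case
# with one trailing underscore, e.g.
#   void dgemm_(const char *transa, const char *transb, ...);
# -DUpCase would declare DGEMM(...), -DNoChange dgemm(...), and -DAdd__
# (used above) follows f2c-style naming with trailing underscores.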

# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------

HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -I/usr/local/cuda/include
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)

# ----------------------------------------------------------------------
# - Compile time options -----------------------------------------------
# ----------------------------------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_DETAILED_TIMING  enable detailed timers;
# -DASYOUGO              enable timing information as you go (nonintrusive)
# -DASYOUGO2             slightly intrusive timing information
# -DASYOUGO2_DISPLAY     display detailed DGEMM information
# -DENDEARLY             end the problem early
# -DFASTSWAP             insert to use DLASWP instead of HPL code
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.

HPL_OPTS = -DCUDA_PINNED
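# Example (a sketch, not part of the original file): the -D options
# listed above can be combined, e.g. pinned memory plus detailed timers:
#HPL_OPTS = -DCUDA_PINNED -DHPL_DETAILED_TIMING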

# ----------------------------------------------------------------------

HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)

# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
# Next two lines for GNU compilers:

CC = mpicc
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall

# Next two lines for Intel compilers:
#CC = mpicc
#CCFLAGS = $(HPL_DEFS) -O3 -axS -w -fomit-frame-pointer -funroll-loops

CCNOOPT = $(HPL_DEFS) -O0 -w

# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.

LINKER = $(CC)
#LINKFLAGS = $(CCFLAGS) -static_mpi
LINKFLAGS = $(CCFLAGS)

ARCHIVER = ar
ARFLAGS = r
RANLIB = echo

# ----------------------------------------------------------------------

MAKE = make TOPdir=$(TOPdir)


Then I compiled with the command "make arch=CUDA_pinned", but something seems to be wrong:

HPL_pdtest.o: In function `HPL_pdtest':
HPL_pdtest.c:(.text+0x117): undefined reference to `assignDeviceToProcess'
HPL_pdtest.c:(.text+0x148): undefined reference to `cudaMallocHost'
HPL_pdtest.c:(.text+0x770): undefined reference to `cudaFreeHost'
/home/hpl-2.0_FERMI_v04/lib/CUDA_pinned/libhpl.a(HPL_pdpanel_init.o): In function `HPL_pdpanel_init':
HPL_pdpanel_init.c:(.text+0x2b9): undefined reference to `cudaMallocHost'
HPL_pdpanel_init.c:(.text+0x38b): undefined reference to `cudaMallocHost'
HPL_pdpanel_init.c:(.text+0x480): undefined reference to `cudaMallocHost'
/home/hpl-2.0_FERMI_v04/lib/CUDA_pinned/libhpl.a(HPL_pdpanel_free.o): In function `HPL_pdpanel_free':
HPL_pdpanel_free.c:(.text+0x28): undefined reference to `cudaFreeHost'
HPL_pdpanel_free.c:(.text+0x36): undefined reference to `cudaFreeHost'
/usr/local/openmpi/lib/libvt.mpi.a(libvt_mpi_a-vt_otf_gen.o): In function `VTGen_flush':
vt_otf_gen.c:(.text+0x49c): undefined reference to `OTF_WStream_writeDefProcess'
vt_otf_gen.c:(.text+0x4f3): undefined reference to `OTF_WStream_writeDefProcessGroup'
vt_otf_gen.c:(.text+0x51e): undefined reference to `OTF_WStream_writeDefinitionComment'
vt_otf_gen.c:(.text+0x556): undefined reference to `OTF_WStream_writeEventComment'
vt_otf_gen.c:(.text+0x591): undefined reference to `OTF_WStream_writeCounter'
vt_otf_gen.c:(.text+0x5e1): undefined reference to `OTF_WStream_writeFileOperation'
vt_otf_gen.c:(.text+0x624): undefined reference to `OTF_WStream_writeCounter'
vt_otf_gen.c:(.text+0x659): undefined reference to `OTF_WStream_writeLeave'
vt_otf_gen.c:(.text+0x68b): undefined reference to `OTF_WStream_writeEnter'
vt_otf_gen.c:(.text+0x6d1): undefined reference to `OTF_WStream_writeCounter'
vt_otf_gen.c:(.text+0x6f6): undefined reference to `OTF_WStream_writeDefProcessGroup'
vt_otf_gen.c:(.text+0x729): undefined reference to `OTF_WStream_writeDefCounter'
vt_otf_gen.c:(.text+0x751): undefined reference to `OTF_WStream_writeDefCounterGroup'
vt_otf_gen.c:(.text+0x798): undefined reference to `OTF_WStream_writeFunctionSummary'
vt_otf_gen.c:(.text+0x7ee): undefined reference to `OTF_WStream_writeCollectiveOperation'
vt_otf_gen.c:(.text+0x824): undefined reference to `OTF_WStream_writeRecvMsg'
vt_otf_gen.c:(.text+0x85a): undefined reference to `OTF_WStream_writeSendMsg'
vt_otf_gen.c:(.text+0x8bc): undefined reference to `OTF_WStream_writeFileOperationSummary'
vt_otf_gen.c:(.text+0x915): undefined reference to `OTF_WStream_writeMessageSummary'
vt_otf_gen.c:(.text+0x92e): undefined reference to `OTF_WStream_writeDefCollectiveOperation'
vt_otf_gen.c:(.text+0x954): undefined reference to `OTF_WStream_writeDefFunction'
vt_otf_gen.c:(.text+0x973): undefined reference to `OTF_WStream_writeDefFunctionGroup'
vt_otf_gen.c:(.text+0x995): undefined reference to `OTF_WStream_writeDefFile'
vt_otf_gen.c:(.text+0x9b4): undefined reference to `OTF_WStream_writeDefFileGroup'
vt_otf_gen.c:(.text+0x9d5): undefined reference to `OTF_WStream_writeDefScl'
vt_otf_gen.c:(.text+0x9eb): undefined reference to `OTF_WStream_writeDefSclFile'
vt_otf_gen.c:(.text+0xa3a): undefined reference to `OTF_WStream_writeOtfVersion'
vt_otf_gen.c:(.text+0xa47): undefined reference to `OTF_WStream_writeDefCreator'
vt_otf_gen.c:(.text+0xa54): undefined reference to `OTF_WStream_writeDefTimerResolution'
/usr/local/openmpi/lib/libvt.mpi.a(libvt_mpi_a-vt_otf_gen.o): In function `VTGen_close':
vt_otf_gen.c:(.text+0x2e00): undefined reference to `OTF_WStream_close'
/usr/local/openmpi/lib/libvt.mpi.a(libvt_mpi_a-vt_otf_gen.o): In function `VTGen_open':
vt_otf_gen.c:(.text+0x2ec8): undefined reference to `OTF_FileManager_open'
vt_otf_gen.c:(.text+0x2ed9): undefined reference to `OTF_WStream_open'
vt_otf_gen.c:(.text+0x303d): undefined reference to `OTF_WStream_setCompression'
/usr/local/openmpi/lib/libvt.mpi.a(libvt_mpi_a-vt_otf_gen.o): In function `VTGen_delete':
vt_otf_gen.c:(.text+0x30e9): undefined reference to `OTF_getFilename'
vt_otf_gen.c:(.text+0x310d): undefined reference to `OTF_getFilename'
vt_otf_gen.c:(.text+0x3132): undefined reference to `OTF_getFilename'
vt_otf_gen.c:(.text+0x31c1): undefined reference to `OTF_FileManager_close'
/usr/local/openmpi/lib/libvt.mpi.a(libvt_mpi_a-vt_otf_gen.o): In function `VTGen_get_statname':
vt_otf_gen.c:(.text+0x269): undefined reference to `OTF_getFilename'
/usr/local/openmpi/lib/libvt.mpi.a(libvt_mpi_a-vt_otf_gen.o): In function `VTGen_get_eventname':
vt_otf_gen.c:(.text+0x289): undefined reference to `OTF_getFilename'
/usr/local/openmpi/lib/libvt.mpi.a(libvt_mpi_a-vt_otf_gen.o): In function `VTGen_get_defname':
vt_otf_gen.c:(.text+0x2a9): undefined reference to `OTF_getFilename'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_ok_to_fork'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_end_single'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_ordered'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_for_static_init_8'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `omp_get_thread_num'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_barrier'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `omp_get_num_threads'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `omp_get_num_procs'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_next_4'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_end_reduce_nowait'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_critical'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_fini_8'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_serialized_parallel'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_end_critical'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_init_8'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `ompc_set_nested'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `omp_get_nested'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_fini_4'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `omp_in_parallel'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_push_num_threads'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_reduce_nowait'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `omp_get_max_threads'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_for_static_init_4'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_end_serialized_parallel'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_flush'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_single'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_next_8'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_init_4'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_global_thread_num'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_end_ordered'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_fork_call'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_fixed8_add'
/opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.so: undefined reference to `__kmpc_for_static_fini'
collect2: ld returned 1 exit status
make[2]: *** [dexe.grd] Error 1
make[2]: Leaving directory `/home/hpl-2.0_FERMI_v04/testing/ptest/CUDA_pinned'
make[1]: *** [build_tst] Error 2
make[1]: Leaving directory `/home/hpl-2.0_FERMI_v04'
make: *** [build] Error 2

Could you tell me what's wrong with it?
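For anyone hitting the same wall: the undefined references fall into three groups, and all three point at the link line rather than the source. `cudaMallocHost'/`cudaFreeHost' come from the CUDA runtime (libcudart); the OTF_* symbols are dragged in by the VampirTrace-instrumented libvt.mpi.a (the plain Open MPI library avoids them); and the __kmpc_*/omp_* symbols belong to Intel's OpenMP runtime (libiomp5), which the threaded MKL needs. A sketch of the usual remedy, not a verified configuration: add -lcudart (and -lcuda) to LAlib, point MPlib at libmpi.so instead of libvt.mpi.a, and append -liomp5 -lpthread after -lmkl. A quick way to confirm which shared library exports a missing symbol is nm, e.g. (path assumed, adjust to your install):

nm -D /usr/local/cuda/lib64/libcudart.so | grep cudaMallocHost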

Hi, I documented the procedure I used to get NVIDIA's HPL working on both the S1070 and the S2050, and I am getting efficiencies similar to what NVIDIA published at their GTC 2010 conference for single-node runs (73% for the C1060 and 63% for the M2050). Here is the link to the HOWTO; I hope you find it useful:

HOWTO - HPL on GPU

Thank you

Mohamad Sindi

Saudi Aramco

EXPEC Computer Center

High Performance Computing Group

@Mohamad Sindi

Hi,

I've been trying to benchmark a system with an NVIDIA Tesla K40c (driver version 390.30, CUDA version 9.1).

Both the NVIDIA driver and CUDA are working fine.

/usr/local/cuda-9.1/bin/nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

nvidia-smi

Thu Mar 8 12:36:43 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:C4:00.0 Off |                    0 |
| 23%   43C    P0    66W / 235W |      0MiB / 11441MiB |     64%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I’ve installed:

  1. OpenBLAS
  2. mpich
  3. hpl-2.0_FERMI_v15

I already have compiled binaries for MPICH and OpenBLAS, and the CUDA binaries and libraries as well.
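Since an MPICH compiler wrapper already knows its own include paths, one quick check (a sketch using the wrapper path from this post) is to ask it what it passes to the underlying compiler, and to preprocess a one-liner that includes mpi.h:

/root/hpl/mpich/bin/mpicc -show
echo '#include "mpi.h"' | /root/hpl/mpich/bin/mpicc -E -x c - > /dev/null && echo "mpi.h found"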

When I try to compile HPL with the following Make.CUDA file, I get an error that the mpi.h file cannot be found:

make[1]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
make -f Make.top build_src arch=CUDA
make[1]: Entering directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
( cd src/auxil/CUDA; make TOPdir=/root/hpl/hpl-2.0_FERMI_v15_latest )
make[2]: Entering directory `/root/hpl/hpl-2.0_FERMI_v15_latest/src/auxil/CUDA'
/root/hpl/mpich/bin/mpicc -o HPL_dlacpy.o -c -DAdd__ -DF77_INTEGER=int -DStringSunStyle -DCUDA -I/root/hpl/hpl-2.0_FERMI_v15_latest/include -I/root/hpl/hpl-2.0_FERMI_v15_latest/include/CUDA -I-I/root/hpl/mpich/include64 -I/usr/local/cuda-9.1/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp ../HPL_dlacpy.c
In file included from /root/hpl/hpl-2.0_FERMI_v15_latest/include/hpl.h:80:0,
                 from ../HPL_dlacpy.c:50:
/root/hpl/hpl-2.0_FERMI_v15_latest/include/hpl_pmisc.h:54:17: fatal error: mpi.h: No such file or directory
 #include "mpi.h"
                 ^
compilation terminated.
make[2]: *** [HPL_dlacpy.o] Error 1
make[2]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest/src/auxil/CUDA'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory `/root/hpl/hpl-2.0_FERMI_v15_latest'
make: *** [build] Error 2

Here is my Make.CUDA file:

# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------

SHELL = /bin/sh

CD = cd
CP = cp
LN_S = ln -fs
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch

# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------

ARCH = CUDA

# Set TOPdir to the location where this is being built.

ifndef TOPdir
TOPdir =/root/hpl/hpl-2.0_FERMI_v15_latest
endif
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)

HPLlib = $(LIBdir)/libhpl.a

# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
#
# MPinc tells the C compiler where to find the Message Passing library
# header files; MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.

MPdir = /root/hpl/mpich
MPinc = -I$(MPdir)/include64
#MPlib = $(MPdir)/lib64/libmpi.a
MPlib = $(MPdir)/lib/libmpich.so
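# Note: the failing compile line above shows "-I-I/root/hpl/mpich/include64"
# (a doubled -I), so the compiler never receives a usable include path
# for mpi.h; a stock MPICH install also puts headers in include, not
# include64. A sketch of a likely fix (check the real location first
# with "ls /root/hpl/mpich"):
#MPinc = -I$(MPdir)/include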

# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS) --------------------------------------
# ----------------------------------------------------------------------
#
# LAinc tells the C compiler where to find the Linear Algebra library
# header files; LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.

#LAdir = $(TOPdir)/../../lib/em64t
LAdir = /root/hpl/openblas
LAinc =

# CUDA

#LAlib = -L /home/cuda/Fortran_Cuda_Blas -ldgemm -L/usr/local/cuda/lib -lcublas -L$(LAdir) -lmkl -lguide -lpthread
LAlib = -L $(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda-9.1/lib64 -lcuda -lcudart -lcublas -L$(LAdir)/libopenblas.so
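# Note: -L expects a directory, not a file, so "-L$(LAdir)/libopenblas.so"
# does not actually link OpenBLAS. The usual form would be (a sketch,
# assuming libopenblas.so lives in $(LAdir)):
#LAlib = -L$(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda-9.1/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lopenblas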

# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------

F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle

# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------

HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -I/usr/local/cuda-9.1/include
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)

# ----------------------------------------------------------------------
# - Compile time options -----------------------------------------------
# ----------------------------------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_DETAILED_TIMING  enable detailed timers;
# -DASYOUGO              enable timing information as you go (nonintrusive)
# -DASYOUGO2             slightly intrusive timing information
# -DASYOUGO2_DISPLAY     display detailed DGEMM information
# -DENDEARLY             end the problem early
# -DFASTSWAP             insert to use DLASWP instead of HPL code
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.

HPL_OPTS = -DCUDA

# ----------------------------------------------------------------------

HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)

# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
# Next two lines for GNU compilers:

CC = /root/hpl/mpich/bin/mpicc
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp

# Next two lines for Intel compilers:
#CC = mpicc
#CCFLAGS = $(HPL_DEFS) -O3 -axS -w -fomit-frame-pointer -funroll-loops -openmp

CCNOOPT = $(HPL_DEFS) -O0 -w

# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.

#LINKER = mpif77
LINKER = /root/hpl/mpich/bin/mpif77
#LINKFLAGS = $(CCFLAGS) -static_mpi
#LINKFLAGS = $(CCFLAGS)

ARCHIVER = ar
ARFLAGS = r
RANLIB = echo

# ----------------------------------------------------------------------

MAKE = make TOPdir=$(TOPdir)

Some suggestions I got were:

  1. Install the mpich-devel package (done, and I can see /usr/include/mpi.h on my local file system, but the error persists; I also tried copying the entire include folder into the compiled mpich folder under the hpl folder, but the issue remains)

Can someone help me with this? I've been trying to debug it for a long time.

Thanks,
Karan