When I start installing LINPACK (HPL), I have these parameters in my makefile:
----------------------------------------------------------------------
- shell --------------------------------------------------------------
----------------------------------------------------------------------
SHELL = /bin/sh
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
----------------------------------------------------------------------
- Platform identifier ------------------------------------------------
----------------------------------------------------------------------
ARCH = Linux_ATHLON_FBLAS
----------------------------------------------------------------------
- HPL Directory Structure / HPL library ------------------------------
----------------------------------------------------------------------
TOPdir = /opt/hpl/hpl
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
HPLlib = $(LIBdir)/libhpl.a
----------------------------------------------------------------------
- Message Passing library (MPI) --------------------------------------
----------------------------------------------------------------------
MPdir = /home/hpcuser/openmpi-install
MPinc = -I$(MPdir)/include
MPlib =
----------------------------------------------------------------------
- Linear Algebra library (BLAS or VSIPL) -----------------------------
----------------------------------------------------------------------
LAdir = /usr/local/cuda
LAinc = -I$(LAdir)/include -I$(LAdir)/src/
LAlib = -L$(LAdir)/lib64/ -lcublas -lcudart
----------------------------------------------------------------------
- F77 / C interface --------------------------------------------------
----------------------------------------------------------------------
F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#F2CDEFS = -DNoChange -DF77_INTEGER=int -DStringSunStyle
----------------------------------------------------------------------
- HPL includes / libraries / specifics -------------------------------
----------------------------------------------------------------------
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
- Compile time options -----------------------------------------------
HPL_OPTS =
----------------------------------------------------------------------
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
----------------------------------------------------------------------
- Compilers / linkers - Optimization flags ---------------------------
----------------------------------------------------------------------
CC = mpicc
#CC = /usr/local/cuda/bin/nvcc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#CCFLAGS = $(HPL_DEFS) -fno-f2c -fno-second-underscore -O3
LINKER = mpif90
LINKFLAGS = $(CCFLAGS)
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
----------------------------------------------------------------------
but when I start the installation, I get these errors:
HPL_dscal.c:(.text+0x28): undefined reference to `dscal_'
/opt/hpl/hpl/lib/Linux_ATHLON_FBLAS/libhpl.a(HPL_idamax.o): In function `HPL_idamax':
HPL_idamax.c:(.text+0x1d): undefined reference to `idamax_'
/opt/hpl/hpl/lib/Linux_ATHLON_FBLAS/libhpl.a(HPL_dtrsv.o): In function `HPL_dtrsv':
HPL_dtrsv.c:(.text+0xb0): undefined reference to `dtrsv_'
/opt/hpl/hpl/lib/Linux_ATHLON_FBLAS/libhpl.a(HPL_dger.o): In function `HPL_dger':
HPL_dger.c:(.text+0x67): undefined reference to `dger_'
HPL_dger.c:(.text+0xa3): undefined reference to `dger_'
/opt/hpl/hpl/lib/Linux_ATHLON_FBLAS/libhpl.a(HPL_dtrsm.o): In function `HPL_dtrsm':
HPL_dtrsm.c:(.text+0x117): undefined reference to `dtrsm_'
HPL_dtrsm.c:(.text+0x1be): undefined reference to `dtrsm_'
What am I doing wrong?
You are trying to use CUBLAS as a drop-in replacement for the standard Fortran BLAS. It isn't one, so you won't be able to just build LINPACK against CUBLAS. There is a manual for CUBLAS supplied with the toolkit; you probably ought to read it. You will see several things, amongst them:
1. CUBLAS function names don't follow the same naming conventions as BLAS.
2. CUBLAS functions require additional support calls to manage memory on the GPU and to copy data to and from the GPU (which any code expecting a standard BLAS will not contain).
3. CUBLAS is a relatively limited subset of a complete BLAS and many functions are not implemented, although this is rumoured to be considerably improved in the current CUDA 3.0 beta.
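To make points 1 and 2 concrete, here is a sketch of what a Fortran-callable wrapper for one of your missing symbols (dscal_) would have to do with the legacy CUBLAS v1 API. Error checking is omitted, and the two transfers around one tiny BLAS-1 call are exactly the overhead that makes this approach slow:

```c
/* Sketch only: a Fortran-callable "thunking" wrapper for dscal_ using the
 * legacy CUBLAS v1 API. Error checking omitted for brevity. */
#include <cublas.h>

void dscal_(const int *n, const double *alpha, double *x, const int *incx)
{
    int len = 1 + (*n - 1) * (*incx);   /* elements spanned by the vector */
    double *dx;

    cublasAlloc(len, sizeof(double), (void **)&dx);           /* GPU buffer  */
    cublasSetVector(*n, sizeof(double), x, *incx, dx, *incx); /* host -> GPU */
    cublasDscal(*n, *alpha, dx, *incx);                       /* scale on GPU */
    cublasGetVector(*n, sizeof(double), dx, *incx, x, *incx); /* GPU -> host */
    cublasFree(dx);
}
```

Note the name difference (dscal_ vs. cublasDscal), the pointer arguments forced by the Fortran calling convention, and the explicit allocate/copy/free lifecycle that a standard BLAS caller knows nothing about.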
Thanks for replying.
I've read the CUBLAS manual. The functions really are different. So I must write my own wrapper around the CUBLAS functions for LINPACK, am I right?
Because you have read the CUBLAS manual, you will be aware that there is already a form of wrapper interface available (referred to as the thunking interface). You will also be aware that NVIDIA don't recommend using it, or that sort of direct wrapping approach, because it is very slow and severely hamstrung by the bandwidth and latency of the PCI Express bus.
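For what it's worth, the thunking wrappers ship as source with the toolkit; a sketch of how one might build them, assuming the CUDA 2.x/3.x-era layout where fortran.c lives under /usr/local/cuda/src (check your installation, the paths are assumptions):

```shell
# Build the CUBLAS "thunking" Fortran wrappers (paths are assumptions)
gcc -DCUBLAS_USE_THUNKING -I/usr/local/cuda/include \
    -c /usr/local/cuda/src/fortran.c -o fortran_thunking.o
# Then link the object into HPL, e.g. by appending it to HPL_LIBS
```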
I've reread the CUBLAS manual and made some changes to my makefile:
----------------------------------------------------------------------
- shell --------------------------------------------------------------
----------------------------------------------------------------------
SHELL = /bin/sh
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
----------------------------------------------------------------------
- Platform identifier ------------------------------------------------
----------------------------------------------------------------------
ARCH = Linux_ATHLON_FBLAS
----------------------------------------------------------------------
- HPL Directory Structure / HPL library ------------------------------
----------------------------------------------------------------------
TOPdir = /opt/hpl/hpl
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
HPLlib = $(LIBdir)/libhpl.a
----------------------------------------------------------------------
- Message Passing library (MPI) --------------------------------------
----------------------------------------------------------------------
MPdir = /home/hpcuser/openmpi-install
MPinc = -I$(MPdir)/include
MPlib =
----------------------------------------------------------------------
- Linear Algebra library (BLAS or VSIPL) -----------------------------
----------------------------------------------------------------------
LAdir = /usr/local/cuda
LAinc = -I$(LAdir)/include
LAlib = -L$(LAdir)/lib64/ -lcublas -lcudart
----------------------------------------------------------------------
- F77 / C interface --------------------------------------------------
----------------------------------------------------------------------
F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
----------------------------------------------------------------------
- HPL includes / libraries / specifics -------------------------------
----------------------------------------------------------------------
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
- Compile time options -----------------------------------------------
HPL_OPTS =
----------------------------------------------------------------------
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
----------------------------------------------------------------------
- Compilers / linkers - Optimization flags ---------------------------
----------------------------------------------------------------------
CC = g77 -E -x f77-cpp-input
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fno-f2c -fno-second-underscore -O3
LINKER = mpif90
LINKFLAGS = $(CCFLAGS)
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
----------------------------------------------------------------------
I've also defined CUBLAS_USE_THUNKING.
But the compiler gives me:
g77 -E -x f77-cpp-input -o HPL_dlacpy.o -c -DAdd__ -DF77_INTEGER=int -DStringSunStyle -I/opt/hpl/hpl/include -I/opt/hpl/hpl/include/Linux_ATHLON_FBLAS -I/usr/local/cuda/include -I/home/hpcuser/openmpi-install/include -fno-f2c -fno-second-underscore -O3 ../HPL_dlacpy.c
In file included from /usr/include/features.h:329,
                 from /usr/include/stdio.h:28,
                 from /opt/hpl/hpl/include/hpl_misc.h:57,
                 from /opt/hpl/hpl/include/hpl.h:75,
                 from ../HPL_dlacpy.c:50:
/usr/include/sys/cdefs.h:32: #error "You need a ISO C conforming compiler to use the glibc headers"
In file included from /opt/hpl/hpl/include/hpl_misc.h:65,
                 from /opt/hpl/hpl/include/hpl.h:75,
                 from ../HPL_dlacpy.c:50:
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/include/varargs.h:4: #error "GCC no longer implements <varargs.h>."
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/include/varargs.h:5: #error "Revise your code to use <stdarg.h>."
make[2]: *** [HPL_dlacpy.o] Error 1
make[2]: Leaving directory `/opt/hpl/hpl/src/auxil/Linux_ATHLON_FBLAS'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory `/opt/hpl/hpl'
make: *** [build] Error 2
The compiler version is 4.1.2
As the error messages say, you are trying to compile C code with a Fortran compiler…
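A plausible correction for the compiler settings, assuming OpenMPI's compiler wrappers are on the PATH (the HPL sources are C, so CC must be a C compiler; keeping mpif90 as the linker is a common choice when the BLAS needs the Fortran runtime):

```makefile
CC      = mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
LINKER  = mpif90
```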
As an aside, how many gpus do you have? You do realize that unless you have at least 4 you are wasting your time, because HPL requires a minimum of 4 MPI processes, and each requires its own GPU?
I tried to use gcc and it gives the same message, too.
I have 4 GPUs, so the idea of installing LINPACK is very realistic for me :)
I wonder, is there any manual for installing LINPACK with CUBLAS? I have read a lot of papers about the performance of NVIDIA accelerators, but nobody has written how to install LINPACK step by step, the way it is done on the AMD site with FireStream and ACML.
avidday
December 3, 2009, 10:15am
8
No there isn’t. HPL requires fairly significant modifications to be used with CUBLAS. You can’t just “compile and run”, if that is what you were hoping for.
What fairly significant modifications must I make to get LINPACK to work with CUBLAS?
avidday
December 4, 2009, 3:09pm
10
Right now that is an exercise left to the reader. One approach is to build HPL with modified versions of the supplied C BLAS wrappers (HPL_dgemm etc.) which call CUBLAS functions and/or host BLAS functions depending on size metrics.
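One way to sketch that approach. The wrapper name, the threshold, and the no-transpose assumption below are mine for illustration, not HPL's; it uses the legacy CUBLAS v1 API and omits error checking:

```c
/* Sketch: a hybrid DGEMM that offloads large multiplies to CUBLAS and
 * keeps small ones on the host BLAS. Legacy CUBLAS v1 API, column-major,
 * no-transpose case only; error checking omitted. */
#include <cublas.h>
#include <cblas.h>

#define GPU_THRESHOLD 512   /* illustrative crossover dimension */

void dgemm_hybrid(int M, int N, int K, double alpha,
                  const double *A, int lda,
                  const double *B, int ldb,
                  double beta, double *C, int ldc)
{
   if (M < GPU_THRESHOLD || N < GPU_THRESHOLD || K < GPU_THRESHOLD) {
      /* Small problem: PCIe transfers would dominate, stay on the host */
      cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                  M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);
      return;
   }
   double *dA, *dB, *dC;
   cublasAlloc(lda * K, sizeof(double), (void **)&dA);
   cublasAlloc(ldb * N, sizeof(double), (void **)&dB);
   cublasAlloc(ldc * N, sizeof(double), (void **)&dC);
   cublasSetMatrix(M, K, sizeof(double), A, lda, dA, lda);
   cublasSetMatrix(K, N, sizeof(double), B, ldb, dB, ldb);
   cublasSetMatrix(M, N, sizeof(double), C, ldc, dC, ldc);  /* needed if beta != 0 */
   cublasDgemm('N', 'N', M, N, K, alpha, dA, lda, dB, ldb, beta, dC, ldc);
   cublasGetMatrix(M, N, sizeof(double), dC, ldc, C, ldc);
   cublasFree(dA); cublasFree(dB); cublasFree(dC);
}
```

A real implementation would also cache device buffers across calls rather than allocating and freeing per call, which is itself expensive.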
That's a good idea, to modify the versions of the CBLAS functions. There is a file in HPL called hpl_blas.h where the prototypes of the BLAS functions are defined. I've tried substituting the CUBLAS functions for the CBLAS functions. HPL then compiled without warnings or errors, but runtime errors appeared and the application crashed.
Today I'll compare the signatures of the CBLAS and CUBLAS functions, then modify them in HPL. I hope that it will work.
P.S. I wonder why there are no related topics on the forum.
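For comparison, the two DGEMM prototypes differ in more than just the name; roughly, from the headers of that era (sketched from memory, so treat as approximate):

```c
/* Host CBLAS: explicit storage-order flag, enum transpose flags,
 * operates on host pointers. */
void cblas_dgemm(const enum CBLAS_ORDER Order,
                 const enum CBLAS_TRANSPOSE TransA,
                 const enum CBLAS_TRANSPOSE TransB,
                 const int M, const int N, const int K,
                 const double alpha, const double *A, const int lda,
                 const double *B, const int ldb,
                 const double beta, double *C, const int ldc);

/* Legacy CUBLAS v1: column-major only, char transpose flags, and
 * A, B, C must already be in GPU memory. */
void cublasDgemm(char transa, char transb, int m, int n, int k,
                 double alpha, const double *A, int lda,
                 const double *B, int ldb,
                 double beta, double *C, int ldc);
```

Passing host pointers straight into the CUBLAS version is a likely cause of the crashes you saw: the functions expect device pointers obtained via cublasAlloc.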
avidday
December 7, 2009, 3:12pm
12
> That's a good idea, to modify the versions of the CBLAS functions. There is a file in HPL called hpl_blas.h where the prototypes of the BLAS functions are defined. I've tried substituting the CUBLAS functions for the CBLAS functions. HPL then compiled without warnings or errors, but runtime errors appeared and the application crashed.
How are you managing GPU memory transfers? You really only need to worry about DGEMM to start with, that is where the largest performance improvements can be had.
> Today I'll compare the signatures of the CBLAS and CUBLAS functions, then modify them in HPL. I hope that it will work.
> P.S. I wonder why there are no related topics on the forum.
There are a couple, but it seems that for most people LINPACK isn’t all that interesting (plus requiring 4 CUDA devices is a large barrier to entry).
mfatica
December 7, 2009, 3:34pm
13
You can run Linpack on a single GPU, no need for 4 devices.
You need to offload DGEMM calls that are big enough to keep the GPU busy and to amortize the data transfer.
Replacing all the DGEMM calls with CUBLAS calls is not a good idea.
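A back-of-envelope way to see why only large DGEMMs are worth offloading. All the numbers below are illustrative 2009-era assumptions, not measurements (~80 GFLOP/s GPU DGEMM, ~10 GFLOP/s host DGEMM, ~5 GB/s effective PCIe), and latency is ignored, which pushes the real crossover even higher:

```python
GPU_GFLOPS = 80.0    # assumed GPU DGEMM rate
HOST_GFLOPS = 10.0   # assumed host DGEMM rate
PCIE_GBS = 5.0       # assumed effective PCIe bandwidth

def dgemm_times(n):
    """Estimated host vs. GPU-with-transfers time for an n x n DGEMM."""
    flops = 2.0 * n ** 3              # DGEMM flop count
    bytes_moved = 4.0 * n * n * 8     # A, B, C down; C back up (doubles)
    t_host = flops / (HOST_GFLOPS * 1e9)
    t_gpu = flops / (GPU_GFLOPS * 1e9) + bytes_moved / (PCIE_GBS * 1e9)
    return t_host, t_gpu

for n in (16, 512, 4096):
    t_host, t_gpu = dgemm_times(n)
    print(n, "host wins" if t_host < t_gpu else "gpu wins")
```

Because compute grows as n^3 while transfers grow as n^2, the GPU eventually wins; the point of the estimate is that below some size the transfer term dominates and the call should stay on the host.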
avidday
December 7, 2009, 3:42pm
14
Forgive my skepticism, but how can you run HPL with only one GPU?
mfatica
December 7, 2009, 3:47pm
15
You can run HPL with a single MPI process, just set P=Q=1 in the HPL.dat file.
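For reference, the relevant lines of HPL.dat look like this (an excerpt; the surrounding lines of the file are omitted):

```
1            # of process grids (P x Q)
1            Ps
1            Qs
```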
That's right, I used this to test AMD GPU performance.
As for LINPACK and CUDA: is there any installation guide that describes what I must change in LINPACK to use CUBLAS?
avidday
December 7, 2009, 4:05pm
17
So you can! Maybe I got that notion into my head with a vendor supplied version that wouldn’t run with less than 4 MPI process or something… I must admit I only ever ran it on many more nodes than that.