I seem to be missing something when attempting to use NVBLAS with the Intel Fortran compilers.
I appear to be linking correctly and picking up nvblas.conf, since I see feedback from the NVBLAS initialization at runtime. However, NVBLAS does not seem to intercept the calls to DGEMM: only the CPU implementation is executed. This is despite using:
in nvblas.conf (or removing it entirely).
If I disable access to the CPU BLAS implementation by removing:
the program crashes at runtime, as I would expect.
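For context, an nvblas.conf of the kind I am describing looks roughly like the sketch below. The keywords are the ones documented for NVBLAS. The values (the log file name, libmkl_rt.so as the CPU BLAS library, the tile size) are illustrative assumptions, not a copy of my actual file.

```text
# Hypothetical nvblas.conf sketch; values are assumptions for illustration.

# Where NVBLAS writes its log messages
NVBLAS_LOGFILE nvblas.log

# CPU BLAS library used as fallback (assumed here to be the MKL runtime library)
NVBLAS_CPU_BLAS_LIB libmkl_rt.so

# Use all compute-capable GPUs detected on the system
NVBLAS_GPU_LIST ALL

# Tile dimension used when splitting matrices across GPUs
NVBLAS_TILE_DIM 2048

# Pin host memory automatically to speed up transfers
NVBLAS_AUTOPIN_MEM_ENABLED
```

NVBLAS reads this file from the current directory by default, or from the path given in the NVBLAS_CONFIG_FILE environment variable.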
The compiler options I am currently using are shown below. I have also tried linking MKL manually, with the same results.
# Compiler options
FFLAGS=-O3 -axAVX,SSE4.2 -msse3 -align array32byte -fpe1 -fno-alias -openmp -mkl=parallel -heap-arrays 32

# Linker options
LDFLAGS= -L/ccc/home/wilkinson/EMPIRE-2064/src/dynamiclibs -lnvblas

# List of libraries used
LIBS= -L/ccc/home/wilkinson/EMPIRE-2064/src/dynamiclibs -lnvblas
An example of a call to DGEMM is as follows:
Whilst I am currently limited to using the Intel compilers, this restriction will be lifted shortly (at which point I will use CUDA Fortran to optimize data movement).
Thanks in advance,