I wrote a program that uses MKL and CUBLAS functions.
The MKL functions used are geqrf and larft.
The problem is as follows:
When I compile with icc, the geqrf call takes 4062 ms, whereas with nvcc it takes 61959 ms, roughly 15x longer …
For the larft function, it takes 3522 ms with icc and 8104 ms with nvcc.
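For reference, here is a minimal sketch of how per-call timings like the ones above could be taken, assuming a POSIX monotonic wall-clock timer around each call. The names `now_ms`, `time_call_ms` and `standin_workload` are hypothetical and not taken from qrComplet.cu; the workload is only a placeholder for the real dgeqrf or larft call.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock time in milliseconds, read from a monotonic clock. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Hypothetical stand-in for the routine being measured (NOT dgeqrf):
   returns the sum of i*j over an n-by-n index grid. */
static double standin_workload(int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            sum += (double)i * j;
    return sum;
}

/* Runs fn(n) once, stores its return value in *result,
   and returns the elapsed wall-clock time in milliseconds. */
static double time_call_ms(double (*fn)(int), int n, double *result)
{
    double t0 = now_ms();
    *result = fn(n);
    return now_ms() - t0;
}
```

Calling `time_call_ms(standin_workload, 1000, &r)` returns the elapsed milliseconds for a single call. Using the same wrapper in both the icc and the nvcc build keeps the two measurements comparable.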
I need to use this function. I know there is a CULA version of geqrf, but only for single precision.
I would like to test my code in double precision, and therefore use dgeqrf from MKL …
Maybe MKL’s functions aren’t optimized when built with nvcc … ?
Does anyone have any ideas?
Here is my Makefile:
LIBS=-lcuda -lcudart -lcula -lcublas -m64
$(CC) $(CFLAG) -DReal=float qrComplet.cu $(LIBS) -I$(INCLUDE_CULA) -L$(LIB_CULA) -I$(INCLUDE_MKL) --linker-options /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a,/opt/intel/mkl/lib/intel64/libmkl_sequential.a,/opt/intel/mkl/lib/intel64/libmkl_core.a,-lpthread -o qrComplet