nvcc + MKL functions


I wrote code that uses MKL and CUBLAS functions.
The MKL functions I use are geqrf and larft.

The problem is as follows:

When I compile with icc, the geqrf function takes 4062 ms, whereas with nvcc it takes 61959 ms, about 15 times longer.
For the larft function, it takes 3522 ms with icc and 8104 ms with nvcc.

I need to use this function. I know there is a CULA geqrf version, but only for single precision.

I would like to test my code in double precision, and so use dgeqrf from MKL.

Maybe MKL’s functions aren’t optimized when the code is compiled with nvcc?

Does anyone have any ideas?

Here is my Makefile:

LIBS=-lcuda -lcudart -lcula -lcublas -m64

build 64:
	$(CC) $(CFLAG) -DReal=float qrComplet.cu $(LIBS) -I$(INCLUDE_CULA) -L$(LIB_CULA) -I$(INCLUDE_MKL) --linker-options /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a,/opt/intel/mkl/lib/intel64/libmkl_sequential.a,/opt/intel/mkl/lib/intel64/libmkl_core.a,-lpthread -o qrComplet

Thank you.

The host C code in your .cu file is compiled by gcc by default, so it may not be optimized the way you want.
If you want it compiled with icc, you have to pass the “-ccbin=icc” option to nvcc.

If you haven’t applied the patch to Intel’s math.h, you will probably encounter compilation errors.
And if you use double complex CUBLAS functions, you will get errors because gcc-based code (CUBLAS is compiled with gcc) and icc-based code interpret 16-byte-aligned pointers differently.

Maybe the MAGMA project (http://icl.cs.utk.edu/magma/software/index.html) provides the hybrid implementations of the LAPACK functions you need.

Good luck!

Thank you for your answer.

With the -ccbin=icc option, I get this error:

/usr/local/cuda/bin/…/include/host_config.h(108): catastrophic error: #error directive: -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!
#error -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!

make: *** [build] Error 4

Is this the error you told me about?

I’ve included mkl.h in my code.


This error seems quite explicit: your version of icc is not supported (the header only accepts ICC 11.1).

Another workaround I didn’t mention is to compile everything with icc.
The nvcc compiler is only required for kernel definitions and kernel launches (the <<< >>> syntax).
If you only use the CUDA API and CUBLAS functions, you can compile without nvcc.
You will have to include “cuda_runtime.h” and “cublas.h” in your C file, specify the include dir and lib dir, and link with -lcublas -lcudart -lcuda.

If you are using CUBLAS and MKL, why are you compiling with nvcc at all? nvcc is not required to use CUBLAS.

If you have actual device code which needs to be compiled, put it in a separate .cu file containing a C/C++ wrapper function to access the code, and compile that with nvcc, then link the resulting object file with icc. People have been using MKL and CUBLAS together forever without a problem (all those TOP500 Linpack results, for example).

I have version 12.0 of icc.

I get the same error when I compile with icc.

This is my Makefile:

LIBS=-lcuda -lcudart -lcublas -m64

build 64:
	$(CC) $(CFLAG) -DReal=float qrCompletGPU.c -I$(INCLUDE) -L$(LIB_CUDA) $(LIBS) -lpthread -o qrComplet

I compile with nvcc because my code contains a CUDA kernel.

So take the kernel out of the compilation unit that contains the MKL and CUBLAS calls, compile the CUDA code separately with nvcc, then link the objects together afterwards. Problem solved.

Even if I do that, I get the same error:

/usr/local/cuda/include/host_config.h(108): catastrophic error: #error directive: -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!

#error -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!

As has been said twice already, don’t use icc with nvcc. You have an unsupported version of icc, but that doesn’t matter. Just compile the device code with nvcc+gcc, and the rest of your code with icc. Link the device code object with the icc output and with MKL and CUBLAS, and you are done.

I didn’t use icc with nvcc. I compile the code that contains the MKL and CUBLAS functions with icc only.

The error message clearly says you are trying to compile CUDA code with icc. It is being generated by a macro inside a CUDA system header. So what have you included in that code that is bringing CUDA headers into the compilation? To use CUBLAS you need to include cublas.h and nothing else.

OK, thanks, I understand now. The code works; the problem was that I had included “cuda.h”.

But I have to put a CUDA kernel in my code, and I don’t understand how to compile the “device code” and the “host code” (MKL + CUBLAS) separately.

Could you explain it again ?


The kernel I use is a transposition kernel.
I call this kernel inside loops, so I don’t understand how I could compile it separately.

Put the kernel in a .cu file together with a “wrapper” host function that launches it, something like this:

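// wrapper.cu: compiled with nvcc. A naive transpose stands in for your kernel;
// the argument names and layout are placeholders.
#include <cuda_runtime.h>

// in: rows x cols, column-major; out: cols x rows, column-major
__global__ void kernel(const double *in, double *out, int rows, int cols)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row index in `in`
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column index in `in`
    if (i < rows && j < cols)
        out[j + i * cols] = in[i + j * rows];
}

// extern "C" gives the wrapper an unmangled name that icc-compiled C code can link to
extern "C" int callkernel(const double *in, double *out, int rows, int cols)
{
    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    kernel<<<grid, block>>>(in, out, rows, cols);
    return (int)cudaGetLastError();   // 0 (cudaSuccess) if the launch raised no error
}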

In your icc-compiled code, call callkernel wherever you need to launch the kernel (inside your loops, for example). Then link the object file produced by nvcc with the objects produced by icc. That is all there is to it.

OK, thank you very much, I’m going to try it.

The code compiles and it works.
Thank you very much.