Linking against MKL slows down CUDA

Hi,

I noticed something odd. When I link my CUDA application against the Intel MKL, the code runs significantly slower (by a factor of about 2). Moreover, I can’t use the NSight debugger: the debug session terminates immediately after it starts. Has anyone encountered similar issues?

For testing and debugging it is fine for me to leave the MKL parts out, but the final version will have to link against it. I would appreciate any hints or comments.

After updating to the latest version (CUDA 4.2, Parallel NSight 2.2 RC2 with VS2008), I do not experience the speed issue anymore.

I also investigated further why I can use neither the Parallel NSight debugger nor the Visual Profiler when the project uses MKL functions. Linking against the MKL and including its header file does not bother either of the two tools. However, as soon as a single MKL function is actually called, they stop working. Surprisingly, I see the same problem if I use fftw instead of MKL DFTI.
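To narrow this down, a stripped-down test case along the following lines might help. It is only a sketch, not my actual code: the macro `USE_MKL` and the names `scale`, `run_gpu_kernel` and `run_mkl_fft` are made up for illustration. The idea is to build the same file once without and once with `-DUSE_MKL` (plus the MKL libraries) and compare how the debugger and the profiler behave, since the only difference between the two builds is whether an MKL DFTI function is actually called.

```cpp
// Minimal repro sketch. Build with nvcc; define USE_MKL (hypothetical macro)
// and link the MKL libraries to enable the DFTI call.
#include <complex>
#include <vector>

#ifdef USE_MKL
#include <mkl_dfti.h>
#endif

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Trivial kernel so that the GPU tools have something to attach to.
__global__ void scale(float* x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
#endif

// Runs the trivial kernel; returns true if it ran and produced the expected
// value. Returns false when the file is not compiled by nvcc.
bool run_gpu_kernel() {
#ifdef __CUDACC__
    const int n = 256;
    std::vector<float> host(n, 1.0f);
    float* dev = 0;
    if (cudaMalloc((void**)&dev, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(dev, &host[0], n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 127) / 128, 128>>>(dev, n, 2.0f);
    cudaError_t err = cudaMemcpy(&host[0], dev, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess && host[0] == 2.0f;
#else
    return false;
#endif
}

// Performs one in-place forward complex DFT through MKL DFTI; returns true
// only if MKL was compiled in and every call succeeded.
bool run_mkl_fft(std::vector<std::complex<double> >& data) {
#ifdef USE_MKL
    DFTI_DESCRIPTOR_HANDLE h = 0;
    MKL_LONG status = DftiCreateDescriptor(&h, DFTI_DOUBLE, DFTI_COMPLEX, 1,
                                           (MKL_LONG)data.size());
    if (status == DFTI_NO_ERROR) status = DftiCommitDescriptor(h);
    if (status == DFTI_NO_ERROR) status = DftiComputeForward(h, &data[0]);
    if (h) DftiFreeDescriptor(&h);
    return status == DFTI_NO_ERROR;
#else
    (void)data;  // MKL disabled: the transform is skipped entirely
    return false;
#endif
}
```

Calling `run_gpu_kernel()` and then `run_mkl_fft()` on a small vector from `main` gives two builds to compare. If the tools only fail in the `USE_MKL` build, that would at least confirm that the call itself, and not the linking, is the trigger.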

Any hints?