Launches fail in a DLL and only in a DLL

Setup: Visual Studio 2013, CUDA 6.5 and/or CUDA 7.0

I’m finding that I can write CUDA code and execute it from a main() in the same project. But if I take that same code and build it as a DLL, then call it from another project, all the CUDA runtime calls work (cudaMalloc etc.), but kernel launches fail with “invalid device function”.

The strange thing is I’ve done this successfully before and don’t see any difference between the setup of my old and new projects.

I’ve reduced the test case down to the old “Add with CUDA” demo. I just rename main to mainlib and add the dllexport lines. This test case fails the same way, so I’m pretty sure it isn’t a memory fault or anything like that.
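For readers who want to reproduce this, here is a minimal sketch of what such a reduced DLL source might look like. It is not the original poster's file: the macro name ADDLIB_API, the signature of mainlib, and the choice to return the CUDA error code as an int are all assumptions made for illustration.

```cuda
// Hypothetical reduced DLL source, modeled on the "add with CUDA" template.
// ADDLIB_API and the mainlib signature are illustrative, not from the thread.
#include <cuda_runtime.h>

#ifdef _WIN32
#define ADDLIB_API extern "C" __declspec(dllexport)
#else
#define ADDLIB_API extern "C"
#endif

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Adds two host arrays on the GPU. Returns a cudaError_t value cast to int
// (0 means success). size must not exceed the per-block thread limit,
// because the kernel is launched as a single block.
ADDLIB_API int mainlib(int *c, const int *a, const int *b, unsigned int size)
{
    int *dev_a = 0, *dev_b = 0, *dev_c = 0;
    size_t bytes = size * sizeof(int);

    cudaError_t err = cudaMalloc((void **)&dev_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        addKernel<<<1, size>>>(dev_c, dev_a, dev_b);
        // Launch failures such as "invalid device function" are reported here.
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so unconditional cleanup is safe.
    cudaFree(dev_c);
    cudaFree(dev_b);
    cudaFree(dev_a);
    return (int)err;
}
```

The console application would then declare mainlib with a matching dllimport prototype and link against the DLL's import library. Returning the error code lets the caller see exactly which step failed without including any CUDA headers.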

The test main routine is a standard console application.
The DLL is a CUDA 7.0 or 6.5 project.

Any hints would be appreciated.

Invalid device function usually means that the kernel code was compiled for a target that does not match the GPU architecture. So I would study any differences between the GPU architecture selections for the different projects.

What GPU are you trying to run on?

What selections have you made for compute_xx and sm_xx switches passed to nvcc?

If you’re not sure, make sure VS has verbose output set (google that if you need to find out how) and look at the actual nvcc compile commands that are being used to generate your dll.
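As a complement to reading the nvcc command lines, the mismatch can also be checked at run time. The sketch below prints the compute capability of the current device and then asks the runtime which code version it would load for a kernel. probeKernel is a placeholder name; in the DLL case the same calls would go into an exported function and query the real kernel instead.

```cuda
// Runtime check: device compute capability vs. the kernel image that was built.
// probeKernel is a stand-in for the kernel under investigation.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void probeKernel() {}

int main()
{
    int dev = 0;
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDevice(&dev);
    if (err == cudaSuccess) err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        printf("device query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);

    // binaryVersion and ptxVersion are encoded as 10 * major + minor,
    // e.g. 35 for sm_35 / compute_35.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, probeKernel);
    if (err != cudaSuccess) {
        // If no usable image exists for this GPU, this call typically fails
        // with the same error a launch would report.
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("kernel binaryVersion %d, ptxVersion %d\n",
           attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```

If the attribute query fails inside the DLL but succeeds in the standalone executable, that points at how the DLL's device code was built or embedded rather than at the kernel source itself.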

I’m running on a Tesla K40.

The default setting for new projects is compute_20, sm_20. The K40 is compute_35 and sm_35, I believe. I’ve tried to change those settings in Visual Studio (CUDA compiler settings), but they don’t seem to take. During compilation I still see the 20’s in the nvcc commands.
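One way to confirm from inside the program which setting actually took effect is to have a kernel report the __CUDA_ARCH__ value it was compiled with. This is only a sketch, and reportArch is a made-up name. It assumes the kernel can launch at all, which is the case in the standalone build described above.

```cuda
// Reports the architecture the running kernel code was generated for.
// reportArch is an illustrative name, not part of any project in this thread.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void reportArch(int *out)
{
#ifdef __CUDA_ARCH__
    // __CUDA_ARCH__ is only defined during device compilation,
    // e.g. 200 for compute_20 and 350 for compute_35.
    *out = __CUDA_ARCH__;
#endif
}

int main()
{
    int *d_arch = 0;
    int h_arch = 0;

    cudaError_t err = cudaMalloc((void **)&d_arch, sizeof(int));
    if (err == cudaSuccess) err = cudaMemset(d_arch, 0, sizeof(int));
    if (err == cudaSuccess) {
        reportArch<<<1, 1>>>(d_arch);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_arch, d_arch, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_arch);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("kernel was compiled for __CUDA_ARCH__ = %d\n", h_arch);
    return 0;
}
```

A value of 200 after changing the project settings would agree with the nvcc command lines still showing the 20’s.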

Also, remember that the GPU code compiles and runs if I put main() in the same project as the GPU code. It’s only when it’s built as a DLL that it fails.

Is the DLL getting delay-loaded, or loaded at runtime via a plug-in API maybe?

I believe it’s being loaded when the program loads. The DLL is there, because I can trace the calls into it using the debugger. The CUDA memory allocations in the DLL work fine. It’s just the kernel launches that fail.