Dynamically link the cuda.dll

Can somebody tell me whether it is possible with CUDA 2.0 (XP) to load the DLL dynamically and use its functions? Our program also needs to run when CUDA is not installed.


Yes, that’s possible.

If you’re using the Driver API, then you have to load nvcuda.dll dynamically. In practice, it is much more convenient not to load it explicitly (via LoadLibrary()) but to use what is called delayed import (delay loading).

To do so, modify your project settings under “Linker->Input” (or add /DELAYLOAD:nvcuda.dll to your linker command line). The delay-load helper lives in delayimp.lib, so link with that library as well.

Next, #include <delayimp.h>.

Now, before using any CUDA functions, run this code:

// Needs <windows.h>, <delayimp.h>, <tchar.h>, <stdio.h> and <cuda.h>
int cudaDetect()
{
	// Try to load all imports from the NVIDIA driver DLL.
	// If this is not possible, then there is no compatible driver installed.
	// (The name should match the DLL name recorded at link time, including case.)
	if( FAILED( __HrLoadAllImportsForDll( "nvcuda.dll" ) ) )
		return 0;

	// Initialize the driver
	if( CUDA_SUCCESS != cuInit( 0 ) )
	{
		_ftprintf_s( stderr, _T(" ! cuInit() failed\n") );
		return 0;
	}

	return 1;
}


Now you can detect the presence of CUDA devices at runtime without requiring the user to have nvcuda.dll (i.e. a compatible driver). A similar technique can probably be used if you’re working with the Runtime API (you would need to delay load cudart.dll instead), but I haven’t tried it. This works fine with CUDA 1.1 and should work with 2.0.
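For comparison, here is a rough sketch of the explicit LoadLibrary() route mentioned above, which needs neither the import library nor /DELAYLOAD. The function name cudaDetectDynamic and the CU_CALL macro are made up for this example. The cuInit prototype is declared by hand on the assumption that it matches cuda.h (CUresult CUDAAPI cuInit(unsigned int), with CUDA_SUCCESS equal to 0 and CUDAAPI being __stdcall on Windows), so check it against your own headers before relying on it.

```cpp
// Sketch only: probe for the CUDA driver without linking against cuda.lib.
// Assumption: cuInit has the Driver API signature
//   CUresult CUDAAPI cuInit(unsigned int Flags), with CUDA_SUCCESS == 0.
#ifdef _WIN32
#include <windows.h>
#define CU_CALL __stdcall   // assumed to match CUDAAPI in cuda.h
#else
#define CU_CALL
#endif

// Hand-written pointer type for cuInit (hypothetical name, see assumption above).
typedef int (CU_CALL *CuInitFn)(unsigned int);

// Returns 1 if nvcuda.dll loads and cuInit(0) succeeds, 0 otherwise.
int cudaDetectDynamic()
{
#ifdef _WIN32
    // Keep the handle so the driver stays loaded after a successful probe.
    static HMODULE driver = NULL;
    if (driver == NULL)
        driver = LoadLibraryA("nvcuda.dll");
    if (driver == NULL)
        return 0;                       // no NVIDIA driver DLL on this machine

    CuInitFn init = reinterpret_cast<CuInitFn>(GetProcAddress(driver, "cuInit"));
    if (init == NULL || init(0) != 0)   // 0 is CUDA_SUCCESS
        return 0;

    return 1;
#else
    return 0;                           // this sketch only covers Windows
#endif
}
```

The drawback compared with delay loading is that every other driver function you call has to be fetched through GetProcAddress() in the same way.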

Hope this helps.

Thanks for the reply. I am trying to implement it, but I get a warning:

LINK : warning LNK4199: /DELAYLOAD:nvcuda.dll ignored; no imports found from nvcuda.dll

Do I still need to link with the libs?

Thanks again in advance

You’re probably using the Runtime API; that’s why you don’t have any imports from nvcuda.dll. Try replacing nvcuda.dll with cudart.dll on the command line and in the code.

Has anyone managed to solve this issue for the CUDA Runtime API?

My application uses a DLL that I’ve built. This DLL performs some CUDA operations using the Runtime API and is set to delay import cudart.dll, just as AndreiB has said.

To test the application for the case where cudart.dll is missing, I renamed the file. Now, when running the application, even though no CUDA function call has been made yet, I get a “module not found” first-chance exception in __delayLoadHelper2. This happens while the main application is calling the _DllMainCRTStartup function for my DLL, before any other function call or instruction in the main function.

It seems that before the application begins to run, all my DLL’s dependencies are somehow checked as well, including cudart.dll. Looking into the __delayLoadHelper2 call, I could see something related to __cudaRegisterFatBinary, but I don’t know what that is.

Any idea or help would be greatly appreciated!!!
Thank you.

I ran into a similar problem using the CUDA Runtime API. I don’t believe you can delay load it in the same manner, because part of the program is generally compiled with NVIDIA’s compiler. I was able to delay load the CUDA Runtime API successfully, but I did it in a non-ideal way.

I basically separated my application so that all code related to CUDA is packaged in its own DLL. You can then delay load this DLL or use a LoadLibrary() call. If the library fails to load, you can assume that some part of it failed, most likely because the CUDA runtime is missing, and then use your alternative code in place of the CUDA code.
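A minimal sketch of the host side of that split could look like the code below. The names cuda_kernels.dll, gpuVectorAdd, cpuVectorAdd and vectorAdd are hypothetical, invented for this example. It also assumes the wrapper DLL exports gpuVectorAdd as an extern "C" function with the default calling convention, so that the undecorated name can be found by GetProcAddress().

```cpp
// Sketch only: use a separate CUDA wrapper DLL if it loads, else fall back to CPU code.
// "cuda_kernels.dll" and "gpuVectorAdd" are hypothetical names for illustration.
#ifdef _WIN32
#include <windows.h>
#endif
#include <cstddef>

typedef void (*VectorAddFn)(const float* a, const float* b, float* out, std::size_t n);

// Plain CPU implementation used when the CUDA wrapper DLL is unavailable.
static void cpuVectorAdd(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Pick the implementation: the GPU export if the DLL loads, the CPU code otherwise.
static VectorAddFn resolveVectorAdd()
{
#ifdef _WIN32
    // LoadLibrary fails if the wrapper DLL, or the cudart.dll it imports, is missing.
    HMODULE wrapper = LoadLibraryA("cuda_kernels.dll");
    if (wrapper != NULL) {
        FARPROC proc = GetProcAddress(wrapper, "gpuVectorAdd");
        if (proc != NULL)
            return reinterpret_cast<VectorAddFn>(proc);
        FreeLibrary(wrapper);           // DLL present but export missing
    }
#endif
    return cpuVectorAdd;
}

// Public entry point: resolves the backend on first use, then dispatches to it.
void vectorAdd(const float* a, const float* b, float* out, std::size_t n)
{
    static VectorAddFn impl = resolveVectorAdd();
    impl(a, b, out, n);
}
```

Note that a successful load only shows that the wrapper and the runtime DLL were found. The wrapper itself should probably still check for a usable device (for example with cudaGetDeviceCount()) before running kernels.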

I am also getting the same exception when I delay load cudart.dll.

I wrote a simple CUDA program that calls a CUDA Runtime function. The application is set to delay load cudart.dll (I removed the cuda/bin directory from the PATH environment variable before running the program). I think the delay-load helper function that comes with VC++ can be customized to handle the exception and act accordingly. We can even write our own helper function.

Please see this link: http://msdn.microsoft.com/en-us/library/151kt790.aspx

Even simpler would be to use LoadLibrary. This post might help: