Run-time CUDA dll linking


I am developing multibody simulation software. I have already implemented
various solvers that run on the CPU, and I recently developed a
solver that exploits NVIDIA GPUs. The user of my library can therefore
choose which kind of solver to use for the simulation: GPU-based or
CPU-based.

Currently, if the user has no GPU-capable hardware, my library simply
outputs a warning and falls back to a default CPU solver.
However, since I linked my library against the libs of the CUDA SDK
(I am using CUDA headers, libs, dlls, etc.), the user must have at
least cuda.dll and the other dlls on their system, even in the
abovementioned case where they have no GPU hardware and just expect to
fall back to the default CPU solver. Otherwise the Windows operating
system will not even start my software, and will show the typical error
‘cuda.dll library not found in directories … etc. etc.’.

I know, a simple workaround to this issue would be to provide TWO
releases of my libs: one that does not use CUDA headers/libs/dlls/etc.,
and another that does use CUDA, to be picked by GPU people…
However, this is not nice in my opinion, because it causes a
proliferation of releases.
So I’d prefer to use the ‘run-time dynamic linking’ method (also possible
on Linux), where I provide a single piece of software that is not linked
against the CUDA libs in advance, but uses OS functions to find and load
cuda.dll at run time. If the dll is not found, the solver falls back to
the CPU; if it is found, the CUDA functions are linked at run time. On
Windows, for example, this would require the LoadLibrary() and
GetProcAddress() functions of the Windows API.
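
To make the idea concrete, here is a minimal sketch of the run-time linking described above, written so that the same code builds on Windows (LoadLibrary/GetProcAddress) and on Linux (dlopen/dlsym). The helper names (open_library, find_symbol, close_library, pick_solver, SolverKind) are hypothetical and only for illustration. It assumes the driver API entry point cuInit, which takes a flags argument and returns 0 on success. The library name is passed in by the caller; as far as I know the driver library is usually nvcuda.dll on Windows and libcuda.so.1 on Linux, but treat those names as assumptions to check against your installation.

```cpp
#include <cassert>
#ifdef _WIN32
#  include <windows.h>
#  define CU_CALL __stdcall   // calling convention of the CUDA driver API on Windows
#else
#  include <dlfcn.h>          // on older glibc, link with -ldl
#  define CU_CALL
#endif

// Opaque handle to a shared library; nullptr means "not loaded".
using LibHandle = void*;

// Try to load a shared library by name; returns nullptr if it cannot be loaded.
LibHandle open_library(const char* name) {
    assert(name != nullptr);
#ifdef _WIN32
    return reinterpret_cast<LibHandle>(LoadLibraryA(name));
#else
    return dlopen(name, RTLD_NOW | RTLD_LOCAL);
#endif
}

// Look up an exported symbol; returns nullptr if the library does not export it.
void* find_symbol(LibHandle lib, const char* symbol) {
#ifdef _WIN32
    return reinterpret_cast<void*>(
        GetProcAddress(reinterpret_cast<HMODULE>(lib), symbol));
#else
    return dlsym(lib, symbol);
#endif
}

// Unload a library previously returned by open_library.
void close_library(LibHandle lib) {
    if (!lib) return;
#ifdef _WIN32
    FreeLibrary(reinterpret_cast<HMODULE>(lib));
#else
    dlclose(lib);
#endif
}

enum class SolverKind { Cpu, Gpu };

// Pointer type for cuInit(flags); a return value of 0 means success.
typedef int (CU_CALL *CuInitFn)(unsigned int);

// Returns Gpu only if the library loads, exports cuInit, and cuInit succeeds.
// Every failure path falls back to the CPU solver.
SolverKind pick_solver(const char* driver_lib) {
    LibHandle lib = open_library(driver_lib);
    if (!lib) return SolverKind::Cpu;

    CuInitFn cu_init = reinterpret_cast<CuInitFn>(find_symbol(lib, "cuInit"));
    if (!cu_init || cu_init(0) != 0) {
        close_library(lib);
        return SolverKind::Cpu;
    }
    // The library is intentionally left loaded so the GPU solver can use it.
    return SolverKind::Gpu;
}
```

A call such as pick_solver("nvcuda.dll") would then be made once at start-up. Since the executable no longer imports the dll statically, the OS loader has nothing to complain about on machines without CUDA.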

This would be more intuitive and flexible from the user's point of view,
but not so easy for me, the developer.
So I wonder whether someone else has already tried this with CUDA… I
suspect it is not so straightforward to implement…
Does anyone have hints, suggestions, or examples to share?


Alessandro Tasora

Well, unless CUDA does something wrong in DllMain, loading it with LoadLibrary() and then GetProcAddress()-ing everything should work, though it might be a lot of typing (for each function, declare a variable to hold its address and use GetProcAddress() to populate its value).
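
One way to cut down the typing is to list each imported function once in a macro table and let the preprocessor generate both the pointer declarations and the lookups. The sketch below is only an illustration: CUDA_IMPORTS, CudaApi and load_cuda_api are hypothetical names, the three driver API functions listed are just a sample, and their return type is written as int on the assumption that the CUresult enum is int-sized. On Linux it uses dlopen/dlsym in place of LoadLibrary/GetProcAddress.

```cpp
#include <cassert>
#ifdef _WIN32
#  include <windows.h>
#  define CU_CALL __stdcall   // calling convention of the CUDA driver API on Windows
#else
#  include <dlfcn.h>          // on older glibc, link with -ldl
#  define CU_CALL
#endif

// One line per imported function: X(return type, name, parameter list).
#define CUDA_IMPORTS(X)                         \
    X(int, cuInit,             (unsigned int))  \
    X(int, cuDriverGetVersion, (int*))          \
    X(int, cuDeviceGetCount,   (int*))

// Table of function pointers, one member per line of CUDA_IMPORTS.
struct CudaApi {
    void* lib = nullptr;
#define DECLARE_PTR(ret, name, params) ret (CU_CALL *name) params = nullptr;
    CUDA_IMPORTS(DECLARE_PTR)
#undef DECLARE_PTR
};

// Platform-specific symbol lookup; returns nullptr if the symbol is missing.
static void* raw_symbol(void* lib, const char* name) {
#ifdef _WIN32
    return reinterpret_cast<void*>(
        GetProcAddress(reinterpret_cast<HMODULE>(lib), name));
#else
    return dlsym(lib, name);
#endif
}

// Loads the library and fills every pointer in the table.
// Returns false if the library or any listed symbol is missing.
bool load_cuda_api(CudaApi& api, const char* lib_name) {
    assert(lib_name != nullptr);
#ifdef _WIN32
    api.lib = reinterpret_cast<void*>(LoadLibraryA(lib_name));
#else
    api.lib = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
#endif
    if (!api.lib) return false;

    bool ok = true;
#define RESOLVE_PTR(ret, name, params)                                   \
    api.name = reinterpret_cast<ret (CU_CALL *) params>(                 \
        raw_symbol(api.lib, #name));                                     \
    ok = ok && (api.name != nullptr);
    CUDA_IMPORTS(RESOLVE_PTR)
#undef RESOLVE_PTR
    return ok;
}
```

After a successful load_cuda_api(api, ...), calls look like api.cuInit(0) or api.cuDeviceGetCount(&n). Adding another function is then a one-line change to CUDA_IMPORTS.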

As a quick test, I would suggest using DLL delay-loading (the DLL gets loaded on the first reference to one of its imported members). It has its problems (it can be quirky with native x64 libraries on some systems, for example), but it's usually the fastest way to get the functionality you want.

Just mark the CUDA dlls as delay-loaded when linking (using the /DELAYLOAD: linker option), and then try calling LoadLibrary() in your code. If the dll loaded okay, you can proceed with calling the imported functions; if not, well, it’s time for your “fallback” code :)


Thanks for the good hint!

The only problem with DLL delay-loading is that it is not
something I can implement in a cross-platform fashion
(for example, the GNU GCC compiler for Linux does not seem to
have this delay-loading feature…). In fact, I’d like to
keep my code as platform-independent as I can.


Alessandro Tasora