How to write an application that optionally uses CUDA? without actually requiring *any* CUDA related


We have some engineering code that would benefit from CUDA acceleration. However, the same code should also execute on machines that are not equipped with CUDA hardware.

We currently use the runtime API for development. How would you create a binary where CUDA acceleration is optional, and no CUDA-specific file (driver, runtime DLL, etc.) is strictly required at run time?

I’m currently using C# to provide a front-end for some CUDA code, using CUDA.NET from GASS to provide dynamic wrapping of the binary (.cubin file). I’m aware that the CUDA.NET methods will throw exceptions left, right and center if you try calling CUDA code on a platform which is not CUDA enabled. Some simple exception handling can hide this from the user and also redirect how your application progresses, e.g. by executing the code natively on the host.

I’m sure you can do something similar in any language which supports exception handling.
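A minimal sketch of that exception-driven fallback, written in Python for brevity. All names here (AcceleratorUnavailable, saxpy_cuda, saxpy_host) are hypothetical stand-ins and not part of CUDA.NET; the "CUDA" function simply raises, the way a wrapper might on a machine without a driver.

```python
class AcceleratorUnavailable(Exception):
    """Hypothetical error raised when the CUDA path cannot be used."""


def saxpy_cuda(a, x, y):
    # Stand-in for a wrapped CUDA call. Here it always fails, the way a
    # wrapper would on a machine without a CUDA driver.
    raise AcceleratorUnavailable("no CUDA driver found")


def saxpy_host(a, x, y):
    # Plain host implementation of a*x + y, element by element.
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy(a, x, y):
    # Try the accelerated path first and redirect to the host code on failure.
    try:
        return saxpy_cuda(a, x, y)
    except AcceleratorUnavailable:
        return saxpy_host(a, x, y)
```

Callers only ever see saxpy(), so the failure of the accelerated path stays hidden from the user.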

Hope this helps!

I think I have a solution.

Linking CUDA libraries statically is out of the question. Check out this URL:

Even the driver API requires nvcuda.dll to be installed on the running machine.

What I suggest is:

When you link the CUDA libraries into your application, mark them as “delay loaded”. Windows will then NOT refuse to load the application just because the CUDA DLLs are missing (that’s my vague memory). When control reaches main(), first call some routines that check whether CUDA is installed on the machine.

For example: search the $PATH and $LD_LIBRARY_PATH variables to locate “cuda.dll” or “nvcuda.dll”, or both. You could spawn an already available executable (like “which cuda.dll” or something similar) to do this job. If not, you can write it all yourself.

Once you find that CUDA is NOT installed, your application can call the NORMAL routines and NEVER call the CUDA functions.
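The path search could look roughly like the sketch below (Python, for brevity). The helper names are hypothetical, and the libcuda.so entries are an assumption added for Linux; the post itself only names the two DLLs. A hit here is only a hint that the driver is installed, since the system loader also looks in places these variables do not list.

```python
import os


def find_library(names, env_vars=("PATH", "LD_LIBRARY_PATH"), environ=None):
    """Return the full path of the first matching file, or None."""
    environ = os.environ if environ is None else environ
    for var in env_vars:
        # Each variable holds a list of directories joined by os.pathsep.
        for directory in environ.get(var, "").split(os.pathsep):
            if not directory:
                continue
            for name in names:
                candidate = os.path.join(directory, name)
                if os.path.isfile(candidate):
                    return candidate
    return None


def cuda_library_present(environ=None):
    # File names to look for; the .so names are an assumption for Linux.
    names = ("nvcuda.dll", "cuda.dll", "libcuda.so.1", "libcuda.so")
    return find_library(names, environ=environ) is not None
```

The application would call cuda_library_present() once at startup and pick the normal or the CUDA routines accordingly.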

That should solve your problem.

NOTE: Delay loading has some constraints in Windows. Not sure if CUDART would be fine with it. Check this out: ms-help://MS.MSDNQTR.v80.en/MS.MSDN.v80/MS.VisualStudio.v80.en/dv_vccomp/html/0097ff65-550f-4a4e-8ac3-39bf6404f926.htm

Appreciate if NVIDIA people can comment on this aspect…

Some points on multi-GPU support:

We have written a multi-threaded foundation library for our higher-level finance library that deals with multi-GPU support. It is a simple library, and it would also help you write multi-GPU code. It is available under a commercial license. It works on Windows and should work on Linux, but that has not been tested yet.

For example, we have APIs like “PFC_AcquireCUDADevices(int number_of_devices, ACQUIRE_MODE (shared/exclusive));”. Thus, if your application is multi-threaded, the threads can share the available GPUs among themselves.

We also provide some utility APIs that would greatly simplify the task of multi-GPU programming - for example, data-splitting API.

At bottom, it is very similar to Mr. Anderson’s multi-GPU implementation (thanks to him), BUT we don’t require boost. We do it all ourselves. We also provide classes that make your CUDA API calls look more natural (think readability).

Also note that the delay-load approach I mentioned could be implemented in our PFC library itself to make the job simpler for you. We would do the delay loading and your application would just need to link with us.

PM me if you would be interested!

Good Luck,

Best Regards,


I’ve explained how to do it with Driver API:…st&p=404983

To my knowledge this is not possible with Runtime API.

I could put all my CUDA code into an external DLL that statically links against cudart.dll (runtime API).

I dynamically open that DLL using LoadLibrary(), and this fails if we don’t have cudart.dll. The question is whether it will fail with an annoying dialog box that the user has to acknowledge by pressing “OK”.

I then query the export symbol for my CUDA accelerated code from the DLL and call it if available.

Otherwise I fall back to the unaccelerated code.

This could work similarly (albeit with the dlopen API) on Linux.
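The load, look up, and fall back sequence can be sketched with Python's ctypes, which wraps LoadLibrary()/GetProcAddress() on Windows and dlopen()/dlsym() on Linux. The library name my_cuda_kernels.dll, the export scale_cuda, and its signature are all hypothetical; they stand for whatever the external DLL actually exports.

```python
import ctypes


def load_symbol(library_path, symbol_name):
    """Return a callable for symbol_name in library_path, or None."""
    try:
        library = ctypes.CDLL(library_path)    # LoadLibrary / dlopen
    except OSError:
        return None                            # DLL or one of its dependencies is missing
    try:
        return getattr(library, symbol_name)   # GetProcAddress / dlsym
    except AttributeError:
        return None                            # DLL loaded but lacks the export


def scale_host(values, factor):
    # Unaccelerated fallback.
    return [v * factor for v in values]


def scale(values, factor, library_path="my_cuda_kernels.dll"):
    accelerated = load_symbol(library_path, "scale_cuda")
    if accelerated is None:
        return scale_host(values, factor)
    # Assumed C signature: void scale_cuda(float *data, int n, float factor)
    accelerated.argtypes = [ctypes.POINTER(ctypes.c_float), ctypes.c_int, ctypes.c_float]
    accelerated.restype = None
    n = len(values)
    buffer = (ctypes.c_float * n)(*values)
    accelerated(buffer, n, factor)
    return list(buffer)
```

Because the core program never links against the CUDA DLL directly, a missing cudart.dll only shows up as a failed load, which the code treats as "use the host path".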

Oh! Good to know this was discussed before. I’ll check with “cudart” during my free time tomorrow and post an update here. Let’s see if something has changed with CUDA 2.0.

LoadLibrary is a cool idea. Did not strike me at all. :(

But I guess calling functions inside the DLL would be a pain (readability…)

I am in the process of adding CUDA acceleration to different parts of our software. We added a simple yet efficient plugin architecture for this (and for other things we want to add). We have the default plugin, which lives in our core code. Then we have a simple system that looks for DLLs in the plugin directory; if it finds one, it tries to dynamically load and register it. If that succeeds, the calls that went to the original plugin now go to the dynamically loaded one. This way our core code isn’t CUDA dependent and doesn’t even know CUDA exists, and if the CUDA DLL fails to load, it automatically falls back on the default implementation. I have written the CUDA plugins with both the runtime API and the driver API, and both work.
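The shape of such a plugin system, as a rough sketch. To keep it short and runnable it loads Python modules instead of DLLs, which is a substitution and not what the post describes; PluginRegistry, the register() entry point, and the "sum" operation are all hypothetical names.

```python
import importlib.util
import os


def default_sum(values):
    # Built-in implementation that lives in the core code.
    return sum(values)


class PluginRegistry:
    def __init__(self):
        # Every operation starts out bound to the default implementation.
        self.implementations = {"sum": default_sum}

    def load_plugins(self, plugin_dir):
        """Try every .py file in plugin_dir; return the file names that loaded."""
        loaded = []
        if not os.path.isdir(plugin_dir):
            return loaded
        for filename in sorted(os.listdir(plugin_dir)):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(plugin_dir, filename)
            spec = importlib.util.spec_from_file_location(filename[:-3], path)
            module = importlib.util.module_from_spec(spec)
            try:
                spec.loader.exec_module(module)
                # The plugin overrides entries in the table it is handed.
                module.register(self.implementations)
            except Exception:
                continue  # a plugin that fails to load is simply skipped
            loaded.append(filename)
        return loaded

    def call(self, name, *args):
        return self.implementations[name](*args)
```

The core code only ever calls registry.call("sum", data); whether that lands in the default or in an accelerated plugin depends on what loaded at startup.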

This should work, I think. Windows will not present any message boxes, you just have to handle LoadLibrary() failure.

erdooom, yes, with plugins it’s pretty simple =) we’ve done almost the same thing when CUDA-enabling our software.

Yes, as long as your application has a plug-in architecture, I think it would be cool!

Like yummig, I’ve been writing my apps in C# and linking into the cuda driver DLL. He mentioned using the GASS .NET bindings, but I’ve written my own that I’ve been using. There is also a small bit of code that I’ve been including in my apps that checks for the presence of the driver DLL, and only uses the CUDA-accelerated code if it is present. If anyone else is doing .NET development, I’ll be glad to post that code, since it makes it quite easy to write an app that is CUDA-accelerated (but seamlessly falls back to non-CUDA code if there is no hardware present).
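A check of this kind might look like the following sketch. It is not the poster's .NET code (which was not posted here); it is a Python/ctypes approximation that loads the driver library and asks it for the device count. The library names per platform are assumptions about a typical install; cuInit and cuDeviceGetCount are the driver API entry points, and a return value of 0 means CUDA_SUCCESS.

```python
import ctypes
import sys


def cuda_device_count(library_names=None):
    """Return the number of CUDA devices, or 0 if the driver is unusable."""
    if library_names is None:
        if sys.platform == "win32":
            library_names = ["nvcuda.dll"]
        else:
            library_names = ["libcuda.so.1", "libcuda.so"]
    # The driver API uses the stdcall convention on 32-bit Windows.
    loader = ctypes.WinDLL if sys.platform == "win32" else ctypes.CDLL
    for name in library_names:
        try:
            driver = loader(name)
        except OSError:
            continue  # this name is not installed, try the next one
        try:
            if driver.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
                return 0
            count = ctypes.c_int(0)
            if driver.cuDeviceGetCount(ctypes.byref(count)) != 0:
                return 0
            return count.value
        except AttributeError:
            return 0  # library loaded but does not export the driver API
    return 0
```

An application would then branch once, for example on cuda_device_count() > 0, and use the non-CUDA code otherwise.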

What we need is automatic emulation :angry:

I would say not emulation, but rather an automatic GPU code/CPU code dispatcher. Emulation is too slow to be used in production software, but if nvcc is ever able to generate efficient CPU code, dispatching may be a good option.

This is not true. Reading the documentation really helps, MSDN may be bad but not that bad:

“To enable or disable error messages displayed by the loader during DLL loads, use the SetErrorMode function.”

SetErrorMode is explained here:

No idea which flags exactly you need to use though.

If you use threads keep in mind that this function is not thread-safe at all.
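A hedged sketch of wrapping a load attempt in SetErrorMode(), using Python's ctypes. The choice of SEM_FAILCRITICALERRORS (0x0001) is an assumption based on how this flag is commonly used around LoadLibrary(); the post above leaves the exact flags open. On non-Windows platforms the helper does nothing.

```python
import contextlib
import ctypes
import sys

SEM_FAILCRITICALERRORS = 0x0001  # assumed flag: no critical-error message box


@contextlib.contextmanager
def suppress_loader_dialogs():
    """Temporarily ask Windows not to show loader error dialogs."""
    if sys.platform != "win32":
        yield  # nothing to do elsewhere
        return
    kernel32 = ctypes.windll.kernel32
    # SetErrorMode returns the previous mode; keep its bits and add ours.
    previous = kernel32.SetErrorMode(SEM_FAILCRITICALERRORS)
    kernel32.SetErrorMode(previous | SEM_FAILCRITICALERRORS)
    try:
        yield
    finally:
        kernel32.SetErrorMode(previous)  # restore the process-wide setting


def try_load(library_path):
    """Return a loaded library handle, or None, without popping up a dialog."""
    with suppress_loader_dialogs():
        try:
            return ctypes.CDLL(library_path)
        except OSError:
            return None
```

The error mode is a process-wide setting, so as noted above, changing it while other threads are loading libraries can interfere with them; doing this once at startup, before any threads exist, sidesteps that.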

This information is greatly appreciated. :thumbup:

Oops, sorry for that. And thanks for pointing that out.
I haven’t used LoadLibrary() for a while and probably forgot something. In fact, I’ve never had to use SetErrorMode() as default behaviour was acceptable.

That method is fine if you’re distributing an application, in which case you can copy cudart.dll into the same directory as the application / external DLL.

But if you’re distributing a library, there’s no way to know the final application directory. You have to copy cudart.dll into the application path (or, worse, into windows\system32), which is against the nVidia recommendations for redistributing their DLLs. The only way we could have forced cudart.dll to load from a specific directory would have been through delay-load, and that doesn’t work for the runtime API.

I wish nVidia could provide some feedback on this issue…