How to build CUDA executables to run on systems without CUDA runtime installed

If I want to compile a binary for a friend who has a system with a CUDA capable GPU, how can I compile it so that he doesn’t have to go through the hassle of installing the whole CUDA framework to make it run on his system?

I have discovered that when you try to run a CUDA binary on a system with a CUDA capable GPU but no CUDA runtimes installed, you get the error message that cudart64_75.dll is missing. Can I just provide the required dlls with the binary and just make sure it is in the same path as the binary?

If you are building against CUDA 7.5 then your friend will need a CUDA 7.5 capable GPU driver installed (i.e. for CUDA 7.5, a 352.xx or newer driver on their GPU).

If the above condition is satisfied, there are several approaches:

  1. Build a driver API app. I say this for completeness, not as a legitimate suggestion. Presumably you have a CUDA runtime API app, so see the next suggestions. I’m not suggesting you should convert a runtime API app to a driver API app.

  2. You could “redistribute” the dynamically-linked CUDA libraries needed by your app. If it is just using the CUDA runtime API, you should be able to redistribute just the necessary cudartXX_YY.dll. If you were also using CUBLAS, for example, then you would need to bundle both the cublas____.dll and cudart____.dll libraries. The legality of redistribution of these libraries is covered in the CUDA EULA:

  3. You could statically link against the necessary libraries. How to statically link against a library in general is not unique to CUDA, but you can find examples of project setups in various CUDA sample projects that demonstrate how to do it for e.g. the cudart (CUDA runtime) library. Then there should be nothing to bundle with your app.

Note that none of this obviates the need to have a properly installed GPU driver on your friend’s system, and that driver must be compatible with whatever version of CUDA you built the app against.
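To make the driver requirement concrete, here is a minimal Python sketch of the check. It assumes the driver library is named nvcuda.dll on Windows (libcuda.so.1 on Linux) and that cuDriverGetVersion can be called without initializing CUDA first. The helper names are made up for illustration and are not part of any NVIDIA tool. CUDA encodes versions as 1000*major + 10*minor, so 7.5 is 7050.

```python
import ctypes
import sys


def decode_cuda_version(value):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return value // 1000, (value % 1000) // 10


def driver_supports(driver_version, required_version):
    """True if the driver's supported CUDA version is at least the build version."""
    return driver_version >= required_version


def query_driver_cuda_version():
    """Ask the installed NVIDIA driver which CUDA version it supports.

    Returns the encoded integer, or None if the driver library cannot be
    loaded or the call reports an error.
    """
    # Assumption: library names as shipped with the NVIDIA driver.
    if sys.platform == "win32":
        loader, lib_name = ctypes.WinDLL, "nvcuda.dll"
    else:
        loader, lib_name = ctypes.CDLL, "libcuda.so.1"
    try:
        lib = loader(lib_name)
    except OSError:
        return None  # no driver library found on this system
    version = ctypes.c_int(0)
    # cuDriverGetVersion returns 0 (CUDA_SUCCESS) on success.
    if lib.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return version.value
```

If query_driver_cuda_version() returns None, no usable driver was found. If it returns a number lower than the version the app was built against (7050 for CUDA 7.5), the driver needs updating before any bundled cudart DLL will help.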

For a while I was confused as to where to find those CUDA runtime DLLs, but thanks to the (non-standard) MS-DOS command ‘which’ I quickly found them in ‘:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\ … \bin’. After I supplied the required CUDA DLLs, it stopped complaining about that.
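On Windows, the directory the executable was loaded from is among the first places searched for DLLs, so placing the CUDA DLLs next to the binary is enough for them to be found. Here is a small Python sketch for checking a package before sending it out. The function name and the DLL list are only illustrative; the actual list depends on which CUDA libraries the app links against.

```python
from pathlib import Path


def missing_libraries(app_dir, required):
    """Return the names in `required` that are not present as files in `app_dir`."""
    app_dir = Path(app_dir)
    # Windows file names are case-insensitive, so compare lower-cased names.
    present = {p.name.lower() for p in app_dir.iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]
```

For example, missing_libraries("dist", ["cudart64_75.dll"]) returns an empty list when the DLL sits in a (hypothetical) dist folder next to the executable. This only confirms the files are present; it says nothing about whether the driver on the target machine is new enough.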

But then it complained about a missing ‘msvc120.dll’. I know how to configure Visual Studio to build “independent” binaries for regular C/C++ projects, but not for CUDA projects. In regular projects you find the setting under C/C++ -> Code Generation as discussed here:

But how do I do it for CUDA projects? I cannot find similar options for CUDA projects and there is no ‘C/C++’ section. I suppose I could compile directly with nvcc although I’m not sure that it helps. But nvcc also fails to compile .cu files that contain calls to GetAsyncKeyState() for some strange reason.

When I supplied the required dlls it first complained that the driver was not sufficient. I saw that it was 350.xx so I decided to upgrade the driver. I upgraded to 361.43 and tried to run the CUDA binary again. But when trying to run the binary it just freezes to such a point that I cannot even kill it with the Task Manager. What is happening? I read somewhere that CUDA doesn’t work well with Nvidia Drivers newer than 354.xx, is this the issue? I have not yet updated my Nvidia drivers on my developer system.

Can you supply a reference? What kind of problems? What exactly does “newer than 354.xx” mean? I am running CUDA 7.5 with driver 354.56 on 64-bit Windows 7 Professional which seems to be the latest WHQL driver for my GPU (Quadro K2200) and have not noticed any issues.

AFAIK the /MT or /MD settings are in Project Properties…Configuration Properties…CUDA C/C++…Host…Runtime Library. However I have not actually tried changing them to prove I could build a standalone executable. If you have a project that consists of both .cu files and .cpp files, you will want to make sure this setting is consistent through the various compilation phases.
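If you build from the command line rather than from a Visual Studio project, the same choices can be passed to nvcc. The sketch below only assembles the command. It assumes nvcc's --cudart static option for the CUDA runtime and -Xcompiler /MT to hand the static C runtime switch to the host compiler. The helper name and the file names are placeholders, and whether the resulting executable is fully standalone is untested here.

```python
def nvcc_static_command(source, output, static_host_crt=True):
    """Build an nvcc command line that links the CUDA runtime statically.

    When `static_host_crt` is true, /MT is forwarded to the MSVC host
    compiler so the C/C++ runtime is linked statically as well.
    """
    cmd = ["nvcc", "--cudart", "static"]
    if static_host_crt:
        # -Xcompiler passes the following option through to cl.exe.
        cmd += ["-Xcompiler", "/MT"]
    cmd += ["-o", output, source]
    return cmd
```

The list can be handed to subprocess.run(nvcc_static_command("app.cu", "app.exe"), check=True). As with the project setting, any separately compiled .cpp files need the same /MT or /MD choice.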

I don’t know what is happening in your case, but there shouldn’t be any compatibility issues with CUDA 7.5 and drivers that are 352.xx or newer, including any of the latest drivers currently published on

I have played around with this for quite a bit. In my case, only cudart64_75.dll is needed to make it shut up. But it still doesn’t run on a system that doesn’t have the CUDA runtime installed. The program halts at the point where CUDA initialization should happen and freezes so badly that it cannot be killed with the Windows Task Manager or closed with the window’s close button. It’s unbelievable that a program can jack into an operating system like that. After a few minutes or so, the entire computer reboots for no apparent reason.

Also on my development system I frequently get bluescreens caused by “nvlddmkm.sys” AFTER running CUDA programs, i.e. it doesn’t appear that the load from the CUDA programs trips the driver, but rather something that happens maybe 10-15 minutes or so afterwards…

It’s difficult for me to give a reference as to where I’ve read about the update. In general, I’m wary of jumping on the latest update as it arrives, and when I read such a thing on forums I stay away from it altogether. Nvidia don’t seem to be very good at reporting what bug fixes they have applied with their successive updates so you will never know whether issue X or Y ever was fixed.

Perhaps it is time for another bug report then?

What exactly does “properly installed GPU driver, compatible with CUDA version X.Y” mean?

I tried to get this working correctly several times with no success. I have a simple CUDA/C++ program compiled with CMake on Windows. My friend eventually installed the full 3 GB package for 10.2, as I could not figure out how to make my binary work on his machine. Then I upgraded CUDA to version 11 on my box and gave him the two DLLs needed by the application (cudart64_110.dll, cufft64_10.dll), hoping that it would work out of the box. The app could be started (the DLLs are found), but as soon as the first CUDA functions are called, it breaks. I got it working after my friend installed the full CUDA 11 framework, but that sounds like overkill. I fail to find the recipe for making this work properly without having to install a 3 GB package every few months.