About CUDA portability

Hello all!

I have a few questions about CUDA’s portability.

I’m developing an application which I’d like to run on different machines. The question is, how far can I go in shipping my CUDA application without also shipping nvcc and the other C/C++ compilers? Is it possible to ship just an executable and the CUDA driver? (I mean, the CUDA driver is always needed to run the application on the GPU, right?)

If you write your code against the lower-level driver API, all you need to ship is the executable (and make sure your users have up-to-date drivers installed). If you want to use the higher-level runtime API, you also need to distribute the runtime library (cudart.dll on Windows, libcudart.so on Linux) along with your executable.

Other than that, your users don’t need to have nvcc or anything else installed to make it work.
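For the "up-to-date drivers" part, here is a rough sketch of a startup check. It assumes Linux and that the driver library is installed under its usual soname, libcuda.so.1 (on Windows the driver DLL is nvcuda.dll and you would go through LoadLibrary instead). The helper names cuda_major, cuda_minor and cuda_driver_version are made up for this example; only cuDriverGetVersion is a real driver API call. The function pointer type is written by hand so the snippet does not need cuda.h, which relies on CUresult being an enum where 0 means CUDA_SUCCESS.

```cpp
// Linux-only sketch: probe for the NVIDIA driver library at run time instead
// of linking against it, so the executable still starts on machines that
// have no driver installed.
#include <dlfcn.h>  // dlopen, dlsym, dlclose (POSIX)

// The driver reports the CUDA version it supports as 1000 * major + 10 * minor.
int cuda_major(int version) { return version / 1000; }
int cuda_minor(int version) { return (version % 1000) / 10; }

// Returns the encoded version reported by cuDriverGetVersion,
// or -1 if the driver library is missing or the call fails.
int cuda_driver_version() {
    // Assumption: the driver is installed under the soname "libcuda.so.1".
    void* lib = dlopen("libcuda.so.1", RTLD_NOW);
    if (lib == nullptr) return -1;

    // Hand-written signature mirroring cuda.h:
    //   CUresult cuDriverGetVersion(int* driverVersion);
    using DriverGetVersionFn = int (*)(int*);
    auto get_version =
        reinterpret_cast<DriverGetVersionFn>(dlsym(lib, "cuDriverGetVersion"));

    int version = -1;
    if (get_version == nullptr || get_version(&version) != 0) {
        version = -1;  // symbol missing or the driver returned an error
    }
    dlclose(lib);
    return version;
}
```

Calling cuda_driver_version() once at startup and comparing the result with the CUDA version you built against lets the application print a readable message instead of failing later. On older glibc versions you may need to link with -ldl.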


Now, if I want to load PTX at runtime, I’d need nvcc, right? As I understand it, when you load the PTX it still needs to be compiled…

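Nope, you can load PTX at runtime through the driver API functions cuModuleLoad and cuModuleLoadData (depending on whether the PTX is in a file or already in memory). The driver itself compiles the PTX for the installed GPU when the module is loaded, so nvcc is not needed on the target machine.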
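To make that concrete, here is a minimal sketch of loading PTX from memory with cuModuleLoadData. It needs cuda.h and libcuda to build (something like g++ load_ptx.cpp -lcuda, plus the toolkit include path; the file name is just an example) but only the driver to run. The PTX string is a hand-written empty kernel called noop, invented for illustration; the .version and .target lines are assumptions and may need adjusting for your driver and GPU. Error handling is deliberately simple.

```cpp
// Sketch: load a PTX string through the driver API and launch it.
// No nvcc is involved at run time; the driver compiles the PTX on load.
#include <cuda.h>
#include <cstdio>

// Hand-written PTX for an empty kernel named "noop" (illustrative only).
static const char* kPtx =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop()\n"
    "{\n"
    "    ret;\n"
    "}\n";

// Prints a message and returns false if a driver API call failed.
static bool ok(CUresult r, const char* what) {
    if (r == CUDA_SUCCESS) return true;
    const char* msg = nullptr;
    cuGetErrorString(r, &msg);
    std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
    return false;
}

int main() {
    CUdevice dev;
    CUcontext ctx;
    if (!ok(cuInit(0), "cuInit")) return 1;
    if (!ok(cuDeviceGet(&dev, 0), "cuDeviceGet")) return 1;
    if (!ok(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain")) return 1;

    int status = 1;
    CUmodule mod;
    if (ok(cuCtxSetCurrent(ctx), "cuCtxSetCurrent") &&
        ok(cuModuleLoadData(&mod, kPtx), "cuModuleLoadData")) {
        CUfunction fn;
        // Look up the kernel by the name used in the PTX, then launch
        // one block of one thread with no kernel parameters.
        if (ok(cuModuleGetFunction(&fn, mod, "noop"), "cuModuleGetFunction") &&
            ok(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, nullptr, nullptr),
               "cuLaunchKernel") &&
            ok(cuCtxSynchronize(), "cuCtxSynchronize")) {
            std::puts("PTX loaded and kernel launched");
            status = 0;
        }
        cuModuleUnload(mod);
    }
    cuDevicePrimaryCtxRelease(dev);
    return status;
}
```

If the PTX lives in a file instead, cuModuleLoad takes the file name in place of the in-memory image; the rest of the flow stays the same.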

The zipped cudart.so weighs in at only 85k, whereas the whole driver package weighs in at 14352k… What evil did the little runtime do to not just get installed along with the driver and friends?

Do I still need nvcc to generate the PTX, or is there another way to generate the kernel at runtime, like in OpenCL?