Deployment deploying CUDA

circle · July 5, 2008, 7:45pm

Hi
I’m wondering if it’s possible to run CUDA-powered applications on machines that don’t have the CUDA driver / toolkit / SDK installed. Basically, what are the deployment options? And are there any differences between the Runtime API and the Driver API in this respect?

THX
Dan Lavigne

AndreiB · July 6, 2008, 3:34pm

With the Driver API you do not depend on anything except the driver (nvcuda.dll). With the Runtime API you have to redistribute cudart.dll (and you need to check that the cudart.dll is a supported version).

If you need to run your program on machines without the CUDA driver, then you have to mark nvcuda.dll as delay-loaded. More details here:
http://forums.nvidia.com/index.php?showtopic=71602
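
A minimal sketch of a related approach, in case it helps: rather than relying on the linker's delay-load option, the program can probe for the driver library explicitly at run time and fall back to a CPU path when it is missing. This is explicit dynamic loading, not delay-loading itself. The helper names `library_exports` and `cuda_driver_present` are hypothetical, and `libcuda.so.1` as the library name on Linux is an assumption that the thread does not state; `nvcuda.dll` is the name given above, and `cuInit` is the Driver API initialization entry point.

```cpp
#include <cassert>
#if defined(_WIN32)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Hypothetical helper: returns true if `library` can be loaded and
// exports `symbol`. The library is unloaded again before returning.
bool library_exports(const char* library, const char* symbol) {
    assert(library != nullptr && symbol != nullptr);
#if defined(_WIN32)
    HMODULE handle = LoadLibraryA(library);
    if (!handle) return false;
    bool found = GetProcAddress(handle, symbol) != nullptr;
    FreeLibrary(handle);
    return found;
#else
    void* handle = dlopen(library, RTLD_LAZY);
    if (!handle) return false;
    bool found = dlsym(handle, symbol) != nullptr;
    dlclose(handle);
    return found;
#endif
}

// Hypothetical helper: checks whether a CUDA driver library is installed.
// The Linux library name is an assumption, not something from the thread.
bool cuda_driver_present() {
#if defined(_WIN32)
    return library_exports("nvcuda.dll", "cuInit");
#else
    return library_exports("libcuda.so.1", "cuInit");
#endif
}
```

An application would call `cuda_driver_present()` once at startup and only enter its CUDA code path when it returns true. On older Linux toolchains this may need to be linked with `-ldl`.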

circle · July 6, 2008, 4:35pm

thanks for the info!

bemobali · October 31, 2008, 11:49pm

Does the CUDA toolkit license allow such a redistribution?

alex_dubinsky · November 1, 2008, 3:07am

Technically, no. But the NVIDIA guys here say you can do it. It’s kind of dumb.

bemobali · November 1, 2008, 8:07pm

Thanks Alex. I guess the safest route is to use the driver API.

tmurray · November 1, 2008, 8:58pm

you can redistribute the runtime API and redistribute cudart.dll. really, I asked Ian Buck this point-blank and he said “yes, it is okay for anyone who wants to redistribute cudart.dll to redistribute cudart.dll.” or libcudart or what have you.

hopefully this will cease to be a problem in a few months, but for the time being if you want to use cudart you can redistribute it. however, the driver API is still a much better option because it’s basically guaranteed to always work and avoid DLL problems.

alex_dubinsky · November 2, 2008, 4:38am

Yeah, but the runtime api is a lot more elegant, which is good in terms of clean and maintainable code. I wonder, has someone maybe made a c++ wrapper around the driver api that aimed for ease of use? Or do we have to roll our own?

alex_dubinsky · November 2, 2008, 5:19am

Does the driver api support loading kernels as ptx files? Or does the program need to use architecture-specific cubins, and update those every time a new generation of hardware is not backward-compatible at a binary level? That is not very robust. One of the nice things about the runtime api is that it stores the ptx (which, albeit, is tied to a virtual architecture such as compute_10/compute_11/etc) alongside the compiled machine code (tied to a real hardware architecture such as sm_10/sm_11/etc). The runtime api is able to JIT the ptx byte code into machine code on the fly, if the right compiled version has not been stored.

Unfortunately, this advantage of the runtime api is sort of moot right now. To update the JITer for new archs you have to, presumably, update cudart.dll. Yet, since the cudart.dll you use is local to your application, in essence you still have to update your application.

So as it stands, in order for your application to not be guaranteed to break as soon as NVIDIA makes a significant-enough change to the microarchitecture, you need the client to have the latest CUDA Toolkit installed. If you’re using the runtime api, you need the latest cudart.dll, and if you’re using the driver api, you need ptxas.exe.

tmurray · November 2, 2008, 8:11am

if you read the nvcc documentation, you’ll see that it is possible for cubins to contain ptx.

in addition, I believe any JIT stuff is contained in the driver, not in cudart.dll. in addition, you must use the cudart.dll present during compilation and not a newer one, which is why you have to redistribute cudart.dll instead of just saying “install the toolkit.”
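
A minimal sketch of the "same cudart as at compile time" check, under stated assumptions. CUDA encodes versions as 1000 * major + 10 * minor (for example, 2000 for 2.0), which is the form used by the `CUDART_VERSION` macro. The struct `CudaVersion` and the functions `decode_cuda_version` and `runtime_matches_build` are hypothetical names. How the version of the loaded library is obtained is left out here: newer toolkits provide `cudaRuntimeGetVersion`, but whether the toolkit discussed in this thread does is not something the thread says.

```cpp
#include <cassert>

// Hypothetical type holding a decoded CUDA version.
struct CudaVersion {
    int major;
    int minor;
};

// Decodes the 1000 * major + 10 * minor encoding used by CUDA version numbers.
CudaVersion decode_cuda_version(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Follows the advice above: accept only the exact cudart version the
// application was built against, not an older or a newer one.
bool runtime_matches_build(int build_version, int loaded_version) {
    CudaVersion built = decode_cuda_version(build_version);
    CudaVersion loaded = decode_cuda_version(loaded_version);
    return built.major == loaded.major && built.minor == loaded.minor;
}
```

In an application, `build_version` would typically be `CUDART_VERSION` captured at compile time, and `loaded_version` whatever the shipped library reports.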

alex_dubinsky · November 2, 2008, 4:06pm

However, this is not the default behavior, correct?

In any case, I tried to generate such a cubin with this line:

nvcc -cubin --gpu-code=compute_10 […]

and received the following message:

nvcc fatal   : Option '-cubin' is not allowed when compiling for a virtual compute architecture

alex_dubinsky · November 2, 2008, 4:17pm

Ok, I’ve figured out how to get robust JITing of PTX through the Driver API. You need to load not CUBINs, but FATBINs. Page 20 of nvcc_2.0.pdf illustrates how a fatbin combines multiple versions of cubins and ptx. However, the CUDA reference guide has this to say, in its reference section for Driver API function cuModuleLoadFatBinary():

So… until fatbins become available, using the Driver API right now is a guarantee of incompatibility. But you’re right, tmurray, cudart.dll is not necessary for JITing and shouldn’t need updating.
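
To make the fatbin idea concrete, here is a rough sketch of the selection logic such a container implies: prefer machine code built for the device at hand, and otherwise fall back to PTX that the driver can JIT. This is not NVIDIA's actual algorithm. All names (`ImageKind`, `KernelImage`, `select_image`) are hypothetical, architectures are written as plain integers (10 for sm_10 / compute_10), and requiring an exact match for cubins is a simplifying assumption. The PTX rule relies on PTX for a given virtual architecture being usable on devices of the same or higher compute capability.

```cpp
#include <cassert>
#include <vector>

enum class ImageKind { Cubin, Ptx };

// Hypothetical description of one entry in a fatbin-like bundle.
struct KernelImage {
    ImageKind kind;
    int arch;  // 10 for sm_10 / compute_10, 11 for sm_11 / compute_11, ...
};

// Returns the index of the image to load for a device of the given
// architecture, or -1 if nothing in the bundle is usable.
int select_image(const std::vector<KernelImage>& images, int device_arch) {
    assert(device_arch > 0);
    int best_ptx = -1;
    for (int i = 0; i < static_cast<int>(images.size()); ++i) {
        const KernelImage& img = images[i];
        // Assumption: a cubin is used only when it matches the device exactly.
        if (img.kind == ImageKind::Cubin && img.arch == device_arch) {
            return i;
        }
        // Otherwise remember the newest PTX that the device can still run.
        if (img.kind == ImageKind::Ptx && img.arch <= device_arch &&
            (best_ptx < 0 || img.arch > images[best_ptx].arch)) {
            best_ptx = i;
        }
    }
    return best_ptx;
}
```

With a bundle holding cubins for sm_10 and sm_11 plus PTX for compute_10, a newer device with no matching cubin would still get the compute_10 PTX, which is the robustness being asked for above.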

tmurray · November 4, 2008, 7:07am

Just so you know, I confirmed today that we’re updating the EULA (I am hoping that it will be in the 2.1 beta, but it’s definitely going to be in 2.1 final) to clarify once and for all that yes, you can redistribute CUDART/CUBLAS/CUFFT dynamic libraries.
