Since the driver API is more complex than the runtime API, can anyone tell me its value or advantages?
You can compile your .cu files into PTX or binary code, and then load them from host code.
That means you only need to worry about how to compile your host code.
For example, if you use Visual Studio, it can be a little annoying when you cannot compile a .cu file
just by pressing the “Build” button.
Moreover, if you modify the binary code directly, you can no longer use the runtime API;
in that case the driver API is more convenient.
For example, I modified the binary code for SGEMM (see http://forums.nvidia.com/index.php?showtopic=159033)
and used the driver API to load the .cubin file from my host code.
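For reference, loading a pre-built .cubin with the driver API might look roughly like the sketch below. This is a hypothetical example, not the poster's actual code: the file name `kernel.cubin` and the kernel name `myKernel` are placeholders, error checking is abbreviated, and it uses the newer `cuLaunchKernel` entry point rather than the older `cuParamSet*`/`cuLaunchGrid` calls. It needs an NVIDIA GPU and links against the driver library (`-lcuda`).

```c
#include <cuda.h>   /* CUDA driver API */
#include <stdio.h>

int main(void)
{
    CUdevice  dev;
    CUcontext ctx;
    CUmodule  mod;
    CUfunction fn;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Load the pre-compiled binary and look up the kernel by name.
       "kernel.cubin" and "myKernel" are placeholder names. */
    if (cuModuleLoad(&mod, "kernel.cubin") != CUDA_SUCCESS) {
        fprintf(stderr, "failed to load kernel.cubin\n");
        return 1;
    }
    cuModuleGetFunction(&fn, mod, "myKernel");

    /* Launch a 1x1x1 grid of 256-thread blocks; this toy kernel
       takes no arguments, so kernelParams is NULL. */
    cuLaunchKernel(fn, 1, 1, 1, 256, 1, 1, 0, NULL, NULL, NULL);
    cuCtxSynchronize();

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```

Note that the host file above is plain C and never touches nvcc; only the kernel file does.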
We use it to provide an open-source implementation of the CUDA runtime, and there are several other projects that benefit from having a very low-level interface to NVIDIA GPUs. However, your implicit suggestion that it is not useful for the majority of users is probably correct; just keep in mind that it probably was never intended to be.
As LS Chien mentioned, loading a cubin file directly from host code is useful from a modularization point of view.
Specifically, this structure lets you focus on the kernel separately from the code that drives it.
Once the driver code has fixed the memory layout used to communicate with the kernel, we only have to write the kernel in a separate file
and build it into a cubin binary.
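The separate-kernel-file workflow described above might look like this with nvcc; the file names and the architecture flag are placeholders for your own:

```shell
# Compile only the kernel file into a binary module; no host code involved.
# sm_52 is a placeholder: pick the compute capability of your target GPU.
nvcc -cubin -arch=sm_52 kernel.cu -o kernel.cubin

# Alternatively, emit PTX, which the driver JIT-compiles at module-load time
# and which is more portable across GPU generations:
nvcc -ptx kernel.cu -o kernel.ptx
```

The host program is then built with the ordinary C/C++ compiler and loads the resulting module at run time through the driver API.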
Because the host code is pure C code, it is easy to bind to dynamically from other languages. PyCUDA is one
example: it lets people write kernels in CUDA syntax while using Python for the host code.
Thanks to that, developing high-level host code is possible, and kernel code generation can be done independently
of the host code.
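To illustrate that split, a minimal PyCUDA sketch might look like the following: the kernel is CUDA C embedded as a string, the host side is entirely Python, and PyCUDA compiles and loads the kernel through the driver API under the hood. The kernel name `scale` is made up for this example, and running it requires an NVIDIA GPU plus the `pycuda` package.

```python
import numpy as np
import pycuda.autoinit            # initializes the driver API and creates a context
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Kernel written in CUDA C; the host code stays pure Python.
mod = SourceModule("""
__global__ void scale(float *x, float a)
{
    int i = threadIdx.x;
    x[i] *= a;
}
""")
scale = mod.get_function("scale")

x = np.arange(16, dtype=np.float32)
# InOut copies the array to the device and back around the launch.
scale(drv.InOut(x), np.float32(2.0), block=(16, 1, 1), grid=(1, 1))
print(x)  # each element scaled by 2.0
```

Because the kernel source is just a Python string here, it can also be generated programmatically, which is what the comment about independent kernel code generation is getting at.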
With the runtime API, every .cu file that calls the CUDA API compiles into its own cubin, which increases the number of cubin files.
As posted above, it is not clean to manage kernel code and host-side API C code together in one mixed cubin file.
Thank you very much!