Until now, I have always used the Runtime API to deploy my programs.
Would the Driver API be faster than the Runtime API?
Thanks for your quick answer. :)
Does the Driver API use less CPU? It seems like the Runtime API requires a dedicated CPU core.
It doesn’t require a core, merely a separate host thread for every GPU you use (in multi-GPU apps). AFAIK this is because the Runtime API abstracts handling of the GPU’s context in a way that ties it to the host thread’s context. In the Driver API, you handle the GPU context explicitly and have more freedom with it. For example, you can try to juggle the contexts of multiple GPUs within a single host thread.
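To illustrate the "juggling" idea, here is a minimal sketch of one host thread driving every GPU by pushing and popping contexts. It assumes the classic three-argument `cuCtxCreate` signature (newer toolkits may expect a different argument list, so check your `cuda.h`). The helper name `juggle_contexts`, the `CHECK` macro, the `MAX_GPUS` limit and the buffer size are all made up for the example, and the error path skips cleanup for brevity.

```cuda
#include <cuda.h>
#include <stdio.h>

#define MAX_GPUS 16  /* hypothetical upper bound for this sketch */

/* Report the failing call and return 1 from the enclosing function. */
#define CHECK(call)                                                 \
    do {                                                            \
        CUresult rc_ = (call);                                      \
        if (rc_ != CUDA_SUCCESS) {                                  \
            fprintf(stderr, "%s failed (CUresult %d)\n", #call,     \
                    (int)rc_);                                      \
            return 1;                                               \
        }                                                           \
    } while (0)

/* Create one context per GPU and drive all of them from this thread. */
int juggle_contexts(void) {
    CUcontext ctx[MAX_GPUS];
    CUdeviceptr buf[MAX_GPUS];
    CUcontext popped;
    const size_t bytes = 1 << 20;
    int count = 0;

    CHECK(cuInit(0));
    CHECK(cuDeviceGetCount(&count));
    if (count > MAX_GPUS) count = MAX_GPUS;

    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        CHECK(cuDeviceGet(&dev, i));
        /* Assumption: classic cuCtxCreate(pctx, flags, dev) signature.
           The new context becomes current on this thread, so pop it. */
        CHECK(cuCtxCreate(&ctx[i], 0, dev));
        CHECK(cuCtxPopCurrent(&popped));
    }

    /* Queue work on each GPU in turn without waiting for any of them. */
    for (int i = 0; i < count; ++i) {
        CHECK(cuCtxPushCurrent(ctx[i]));
        CHECK(cuMemAlloc(&buf[i], bytes));
        CHECK(cuMemsetD8Async(buf[i], 0, bytes, 0));
        CHECK(cuCtxPopCurrent(&popped));
    }

    /* Visit each context again to wait for its work and clean up. */
    for (int i = 0; i < count; ++i) {
        CHECK(cuCtxPushCurrent(ctx[i]));
        CHECK(cuCtxSynchronize());
        CHECK(cuMemFree(buf[i]));
        CHECK(cuCtxPopCurrent(&popped));
        CHECK(cuCtxDestroy(ctx[i]));
    }
    return 0;
}

int main(void) {
    return juggle_contexts();
}
```

This is host-only code, so it should build with something like `nvcc juggle.cu -lcuda`. The point is only that the thread decides which context is current at any moment; whether this beats one thread per GPU depends on your workload.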
If you wanted speed, you’d use separate asynchronous host threads for each GPU anyway.
I use a separate thread for my GPU computing, but it still completely eats up a CPU core. Is there a way to make it entirely asynchronous?
Take a look at the context creation flags CU_CTX_SCHED_SPIN and CU_CTX_SCHED_YIELD. They control how the driver behaves while waiting to sync with the device (this synchronization is what eats your CPU).
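For reference, a rough sketch of how such a flag is passed at context creation. It again assumes the classic three-argument `cuCtxCreate` signature. The helper `sync_with_flag`, the `CHECK` macro and the buffer size are invented for the example. `CU_CTX_SCHED_BLOCKING_SYNC` is a third scheduling flag that was not mentioned above. It asks the driver to block the host thread while waiting, so it may be the one to try if the goal is a quiet CPU.

```cuda
#include <cuda.h>
#include <stdio.h>

/* Report the failing call and return 1 from the enclosing function. */
#define CHECK(call)                                                 \
    do {                                                            \
        CUresult rc_ = (call);                                      \
        if (rc_ != CUDA_SUCCESS) {                                  \
            fprintf(stderr, "%s failed (CUresult %d)\n", #call,     \
                    (int)rc_);                                      \
            return 1;                                               \
        }                                                           \
    } while (0)

/* Create a context on device 0 with the given scheduling flag,
   queue some work, and wait for it to finish. */
int sync_with_flag(unsigned int sched_flag) {
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr buf;
    const size_t bytes = 64u << 20;

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    /* Assumption: classic cuCtxCreate(pctx, flags, dev) signature. */
    CHECK(cuCtxCreate(&ctx, sched_flag, dev));

    CHECK(cuMemAlloc(&buf, bytes));
    CHECK(cuMemsetD8Async(buf, 0, bytes, 0));
    /* How the host thread waits here is governed by sched_flag. */
    CHECK(cuCtxSynchronize());

    CHECK(cuMemFree(buf));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}

int main(void) {
    /* CU_CTX_SCHED_SPIN:          busy-wait on the device
       CU_CTX_SCHED_YIELD:         yield the thread while waiting
       CU_CTX_SCHED_BLOCKING_SYNC: block the thread while waiting */
    return sync_with_flag(CU_CTX_SCHED_BLOCKING_SYNC);
}
```

If you stay on the Runtime API, the matching knobs are set with `cudaSetDeviceFlags` (`cudaDeviceScheduleSpin`, `cudaDeviceScheduleYield`, `cudaDeviceScheduleBlockingSync`) before the device is first used. Watching CPU load in `top` while swapping the flag is an easy way to see which behaviour you are getting.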