GPU communication with the CPU

How does the GPU inform the CPU that the kernel execution is over? Does it send an interrupt?

How does the GPU inform the host (CPU) about the result - say of a matrix multiplication?
Can this only be achieved through cudaMemcpy(), or are there other ways too?


I’m not sure of the exact mechanism, but the easiest way to make sure your kernel is complete is to call cudaThreadSynchronize(), which doesn’t return until the GPU has finished all the work assigned to it. If you’re using the asynchronous API, then you can use cudaStreamSynchronize() to wait on a particular stream. Note that there’s no need to use these for simple programs: the CUDA functions check for synchronisation themselves (e.g. cudaMemcpy() has an implicit cudaThreadSynchronize()).
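A minimal sketch of the two waits, in case it helps. The kernel `scale` and the buffer size are made up for illustration, and I've used cudaDeviceSynchronize(), which newer toolkits provide as the replacement for the deprecated cudaThreadSynchronize(); on an old toolkit you would substitute the older name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: doubles each element in place.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // The launch is asynchronous: control returns to the host right away.
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);

    // Wait only for the work queued on this one stream.
    cudaError_t err = cudaStreamSynchronize(stream);
    printf("stream finished: %s\n", cudaGetErrorString(err));

    // Wait for all work previously issued to the device.
    err = cudaDeviceSynchronize();
    printf("device idle: %s\n", cudaGetErrorString(err));

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return 0;
}
```

Both calls return an error code, so they are also a convenient place to notice that a kernel failed.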

Yes. The only way to get results back from the GPU is to copy them back to host memory.
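For the matrix multiplication case from the question, the copy-back step could look roughly like the following. This is only a sketch: the naive `matMul` kernel, the 4x4 size and the input values are assumptions chosen to keep it short, not a recommended implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4  // small square matrices, chosen only to keep the example short

// Hypothetical naive kernel: one thread computes one element of C = A * B.
__global__ void matMul(const float *a, const float *b, float *c)
{
    int row = threadIdx.y;
    int col = threadIdx.x;
    float sum = 0.0f;
    for (int k = 0; k < N; ++k)
        sum += a[row * N + k] * b[k * N + col];
    c[row * N + col] = sum;
}

int main()
{
    float h_a[N * N], h_b[N * N], h_c[N * N];
    for (int i = 0; i < N * N; ++i) {
        h_a[i] = 1.0f;  // A is all ones
        h_b[i] = 2.0f;  // B is all twos, so each element of C should be 8
    }

    const size_t bytes = N * N * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Inputs go to the device the same way results come back.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    matMul<<<1, dim3(N, N)>>>(d_a, d_b, d_c);

    // Blocking copy: waits for the kernel, then moves C into host memory.
    cudaError_t err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("c[0][0] = %f\n", h_c[0]);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

No explicit synchronize call is needed here, because the device-to-host cudaMemcpy() does not start until the kernel ahead of it has finished.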

So cudaThreadSynchronize(), cudaStreamSynchronize() and cudaMemcpy() all block the host process?
But can other applications still keep running on the CPU? I hope they can.

Of course other apps can keep running on the CPU. The OS scheduler is still running…

However, cudaThreadSynchronize() and related calls do spin-wait, keeping a CPU core at 100% utilization until the GPU is done.

If you need a yield, see…st&p=472418

If you need your process to yield even more aggressively, you can write your own polling loop with cudaEventQuery().
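A rough sketch of such a loop is below. The kernel `busy`, the buffer size and the 100 microsecond sleep interval are all assumptions for illustration; tune the interval to trade latency against CPU usage.

```cuda
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: some per-element work.
__global__ void busy(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 1000; ++k) x = x * 0.999f + 1.0f;
        data[i] = x;
    }
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t done;
    cudaEventCreate(&done);

    busy<<<(n + 255) / 256, 256>>>(d_data, n);
    // The event completes once everything queued before it has finished.
    cudaEventRecord(done, 0);

    // Poll instead of spin-waiting: cudaEventQuery() returns immediately
    // with cudaErrorNotReady while the GPU is still working.
    cudaError_t status;
    while ((status = cudaEventQuery(done)) == cudaErrorNotReady) {
        // Give the CPU back to the OS between polls (interval is arbitrary).
        std::this_thread::sleep_for(std::chrono::microseconds(100));
    }

    if (status != cudaSuccess)
        fprintf(stderr, "GPU work failed: %s\n", cudaGetErrorString(status));
    else
        printf("kernel finished\n");

    cudaEventDestroy(done);
    cudaFree(d_data);
    return 0;
}
```

The sleep means the host notices completion a little late, which is the price of not burning a core while waiting.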