GPU communication with the CPU

psehgal · March 9, 2009, 3:24pm

How does the GPU inform the CPU that the kernel execution is over? Does it send some kind of interrupt?

How does the GPU inform the host (CPU) about the result, say, of a matrix multiplication?
Can this only be achieved through cudaMemcpy(), or are there other ways too?

Thanks,
Priya

YDD · March 9, 2009, 4:44pm

I’m not sure of the exact mechanism, but the easiest way to make sure your kernel is complete is to call cudaThreadSynchronize(), which doesn’t return until the GPU has finished all the work assigned to it. If you’re using the asynchronous API, then you can use cudaStreamSynchronize() to wait on a particular stream. Note that there’s no need to use these for simple programs: the blocking CUDA functions synchronise by themselves (e.g. cudaMemcpy() has an implicit cudaThreadSynchronize()).

Yes. The only way to get results back from the GPU is to copy them back to host memory.
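A minimal sketch of that flow, assuming a toy kernel: the names `scale` and `scale_on_gpu` are made up for illustration and are not from the thread. The launch returns immediately, and the blocking cudaMemcpy() back to the host both waits for the kernel and delivers the result, so no explicit synchronize call appears.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Copies `host` to the GPU, runs the kernel, and copies the result back.
// Returns true if every CUDA call succeeded.
bool scale_on_gpu(float *host, int n) {
    float *dev = NULL;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // The launch is asynchronous: control returns to the CPU right away.
        scale<<<(n + 255) / 256, 256>>>(dev, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    if (ok) {
        // Blocking copy: it waits for the kernel to finish before copying,
        // so the host sees the completed result when this call returns.
        ok = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}

int main() {
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    if (!scale_on_gpu(v, 4)) {
        fprintf(stderr, "a CUDA call failed\n");
        return 1;
    }
    printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);
    return 0;
}
```

If the host needs to wait without copying anything, an explicit synchronize call goes after the launch instead. In current CUDA releases cudaThreadSynchronize() is deprecated in favour of cudaDeviceSynchronize().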

psehgal · March 9, 2009, 7:42pm

So cudaThreadSynchronize(), cudaStreamSynchronize(), and cudaMemcpy() effectively block the host process?
But can other applications still keep running on the CPU? I hope they can.

MisterAnderson42 · March 9, 2009, 8:46pm

Of course other apps can keep running on the CPU. The OS scheduler is still running…

However, cudaThreadSynchronize() and related calls do spin-wait at 100% CPU utilization.

If you need a yield, see http://forums.nvidia.com/index.php?s=&showtopic=83284&view=findpost&p=472418

If you need your process to yield even more aggressively, you can make your own loop with cudaEventQuery().
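A rough sketch of such a loop, under some assumptions: the helper name `wait_politely`, the toy kernel `increment`, and the 200-microsecond sleep interval are all illustrative choices, not anything prescribed in this thread. The idea is to record an event after the kernel launch and poll it with cudaEventQuery(), sleeping between polls so the CPU core is free for other work.

```cuda
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: bumps a single counter.
__global__ void increment(int *counter) { *counter += 1; }

// Polls the event instead of spin-waiting. cudaEventQuery() returns
// cudaErrorNotReady while work recorded before the event is still running.
// The sleep interval is an arbitrary assumption; tune it for your workload.
cudaError_t wait_politely(cudaEvent_t ev) {
    cudaError_t status;
    while ((status = cudaEventQuery(ev)) == cudaErrorNotReady) {
        std::this_thread::sleep_for(std::chrono::microseconds(200));
    }
    return status;  // cudaSuccess once the event has completed
}

int main() {
    int *dev = NULL;
    cudaEvent_t done;
    if (cudaMalloc((void **)&dev, sizeof(int)) != cudaSuccess) return 1;
    if (cudaEventCreate(&done) != cudaSuccess) return 1;

    cudaMemset(dev, 0, sizeof(int));
    increment<<<1, 1>>>(dev);   // asynchronous launch
    cudaEventRecord(done, 0);   // event completes after the kernel does

    cudaError_t status = wait_politely(done);
    printf("kernel finished: %s\n", cudaGetErrorString(status));

    cudaEventDestroy(done);
    cudaFree(dev);
    return status == cudaSuccess ? 0 : 1;
}
```

The trade-off is latency: a longer sleep frees more CPU time but delays noticing that the kernel has finished.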