How does the CPU know when a GPU kernel finishes?

bowuwm · September 25, 2014, 3:11am

I’m wondering how the CPU knows when a GPU kernel finishes. Are interrupts used? If so, I’m afraid that handling the interrupt could take even longer than the kernel itself. Our investigation showed that after launching a kernel, the CPU calls yield several times. Does this indicate that polling is used? And since we see only a handful of yields, does that mean it falls back to interrupts after several polls?

MutantJohn · September 25, 2014, 3:21am

It doesn’t, unless you use cudaDeviceSynchronize() or some other form of flagging, I think.
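
A minimal sketch of the synchronous pattern: the kernel launch returns to the host immediately, and the host only learns that the kernel has finished when it calls cudaDeviceSynchronize(). The kernel `fill` and the helper `launch_and_wait` are hypothetical names used for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its own global index into out[].
__global__ void fill(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical helper: launch, wait, then read back the last element.
// Returns out[n - 1] on success, -1 on any CUDA error.
int launch_and_wait(int n) {
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    // The launch is asynchronous: control returns to the host immediately,
    // before the kernel has necessarily run.
    fill<<<(n + 255) / 256, 256>>>(d_out, n);

    // Block the host thread until all previously issued device work is done;
    // errors raised during kernel execution are reported here.
    cudaError_t err = cudaDeviceSynchronize();

    int last = -1;
    if (err == cudaSuccess &&
        cudaMemcpy(&last, d_out + (n - 1), sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        last = -1;
    }
    cudaFree(d_out);
    return last;
}

int main() {
    std::printf("last = %d\n", launch_and_wait(1000));
    return 0;
}
```

Other forms of "flagging" follow the same idea, for example recording a cudaEvent_t after the kernel and waiting on it with cudaEventSynchronize().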

Jimmy_Pettersson · September 25, 2014, 8:03am

You might be interested in looking into the callbacks:

3.2.5.5.6. Callbacks
The runtime provides a way to insert a callback at any point into a stream via
cudaStreamAddCallback(). A callback is a function that is executed on the host once
all commands issued to the stream before the callback have completed. Callbacks in
stream 0 are executed once all preceding tasks and commands issued in all streams
before the callback have completed.

As far as I know, the runtime API creates a new thread to execute the callback, so it does not interrupt the main application thread.
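
A minimal sketch of enqueuing a host callback behind a kernel with cudaStreamAddCallback(). The kernel `scale`, the callback `on_done` and the helper `run_with_callback` are hypothetical names; the callback only sets an atomic flag because callbacks must not make CUDA API calls. Newer toolkits also provide cudaLaunchHostFunc() for the same purpose.

```cuda
#include <atomic>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

// Host-side callback, run once all earlier work in the stream has completed.
// It must not call any CUDA API functions itself.
void CUDART_CB on_done(cudaStream_t /*stream*/, cudaError_t status, void* user_data) {
    auto* flag = static_cast<std::atomic<int>*>(user_data);
    flag->store(status == cudaSuccess ? 1 : 2);
}

// Hypothetical helper: returns 1 if the callback saw success, 2 if it saw an
// error, 0 if it never ran, and -1 if setup failed.
int run_with_callback() {
    const int n = 1 << 20;
    float* d_x = nullptr;
    cudaStream_t stream;
    std::atomic<int> flag{0};

    if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(d_x); return -1; }

    cudaMemsetAsync(d_x, 0, n * sizeof(float), stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_x, n);

    // Enqueue the callback after the kernel; the flags argument must be 0.
    cudaStreamAddCallback(stream, on_done, &flag, 0);

    // The host thread is free to do other work here. Synchronizing on the
    // stream also waits for the callback, so flag can be read safely and
    // the buffer freed afterwards.
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_x);
    return flag.load();
}

int main() {
    std::printf("callback result = %d\n", run_with_callback());
    return 0;
}
```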

bowuwm · September 25, 2014, 2:04pm

My understanding is that cudaStreamAddCallback() provides a way to call CPU functions once the GPU kernel finishes. But it doesn’t reveal the mechanism the GPU driver uses to detect that the kernel has finished. My guess is polling, but I’m not sure. I hope someone from NVIDIA can tell us the answer.
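
For comparison, polling can be done explicitly from the application by querying an event recorded after the kernel. This is only a sketch of application-level polling under that assumption; it does not show what the driver does internally. The kernel `spin_kernel`, the cycle count, and the helper `poll_until_done` are hypothetical.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Busy-waits on the device for roughly `cycles` clock ticks.
__global__ void spin_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: polls an event until the kernel recorded before it
// completes. Returns the number of "not ready" polls, or -1 on error.
long long poll_until_done() {
    cudaEvent_t done;
    if (cudaEventCreateWithFlags(&done, cudaEventDisableTiming) != cudaSuccess)
        return -1;

    spin_kernel<<<1, 1>>>(100000000LL);
    cudaEventRecord(done, 0);

    long long polls = 0;
    cudaError_t st;
    // cudaEventQuery returns cudaErrorNotReady while work before the event
    // is still pending, and cudaSuccess once it has completed.
    while ((st = cudaEventQuery(done)) == cudaErrorNotReady) {
        ++polls;
        std::this_thread::yield();  // give up the time slice between polls
    }
    cudaEventDestroy(done);
    return st == cudaSuccess ? polls : -1;
}

int main() {
    std::printf("polls = %lld\n", poll_until_done());
    return 0;
}
```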

Robert_Crovella · September 25, 2014, 2:24pm

How the host thread behaves while it waits at a barrier (for example, whether it spins, yields, or blocks) can vary. You may want to read:

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g18074e885b4d89f5a0fe1beab589e0c8
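
As a sketch of how this is configured: cudaSetDeviceFlags() accepts cudaDeviceScheduleAuto, cudaDeviceScheduleSpin, cudaDeviceScheduleYield or cudaDeviceScheduleBlockingSync to choose how the host thread waits. The default (Auto) uses a heuristic that may choose either spinning or yielding, which may explain the yields observed above. The kernel and the helper `run_with_schedule` are hypothetical, and the flags should be set before any other CUDA work in the process.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Busy-waits on the device for roughly `cycles` clock ticks.
__global__ void spin_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: choose the wait strategy, launch, then synchronize.
// Returns 0 on success, -1 on error. Call it before other CUDA work: on some
// older toolkits, setting flags after the context exists fails with
// cudaErrorSetOnActiveProcess.
int run_with_schedule(unsigned int flags) {
    if (cudaSetDeviceFlags(flags) != cudaSuccess) return -1;

    spin_kernel<<<1, 1>>>(100000000LL);

    // With cudaDeviceScheduleBlockingSync the host thread blocks on an OS
    // synchronization primitive here instead of spinning or yielding.
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
}

int main() {
    int rc = run_with_schedule(cudaDeviceScheduleBlockingSync);
    std::printf("rc = %d\n", rc);
    return 0;
}
```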