How do CPU threads know that a GPU kernel has finished?

I am wondering how CPU threads know that a GPU kernel has finished.

The CPU threads transfer the data and launch the kernel. After that, if there is no other work to do, the CPU threads are idle.

But when the GPU kernel finishes, how do the CPU threads find out?

I want to know this because, once the GPU kernel has finished, the result has to be copied from device memory to main memory, and then the next batch of data has to be transferred from main memory to device memory.

Please help me.

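A CPU thread doesn’t “know” when a kernel is finished. A kernel launch is asynchronous: the launch call returns control to the CPU thread right away, usually before the kernel has completed. The thread can effectively ask the CUDA runtime whether the GPU is finished by calling a runtime API function such as cudaDeviceSynchronize, cudaStreamSynchronize, cudaMemcpy, cudaEventSynchronize, etc. Any of these calls may force the CPU thread to wait at the call until the relevant GPU work is finished (all previously issued work on the device, the work in one stream, or the work recorded before one event, depending on the call).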
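Here is a minimal sketch of that pattern, in case it helps. The kernel `scale`, the buffer names, and the sizes are made up for illustration and are not taken from your code, and error checking is left out for brevity. It records an event after the kernel launch, polls it with cudaEventQuery so the CPU thread can do other work in the meantime, then waits with cudaEventSynchronize before copying the result back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: doubles every element.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffer filled with 1.0f.
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // Device buffer; this cudaMemcpy blocks until the copy is done.
    float *d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    cudaEvent_t done;
    cudaEventCreate(&done);

    // The launch returns immediately; the kernel runs in the background.
    scale<<<(n + 255) / 256, 256>>>(d, n);
    // Mark the point in the default stream right after the kernel.
    cudaEventRecord(done, 0);

    // Option 1: poll without blocking. cudaEventQuery returns
    // cudaErrorNotReady while the kernel is still running.
    while (cudaEventQuery(done) == cudaErrorNotReady) {
        // The CPU thread is free to do other work here.
    }

    // Option 2: block until the event has completed. After the polling
    // loop above this returns at once; it is shown as the alternative.
    cudaEventSynchronize(done);

    // The kernel is finished, so the result can be copied back.
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);  // 2.0 is expected if every call succeeded

    cudaEventDestroy(done);
    cudaFree(d);
    free(h);
    return 0;
}
```

In this small example the final cudaMemcpy alone would be enough, because a blocking cudaMemcpy issued to the same stream does not start until the kernel ahead of it has finished. The event is useful when the thread wants to check for completion without being forced to wait.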