CUDA host/device question

I’ve just started reading about CUDA and have a question about how control is exchanged between the GPU and the CPU.
So far, I have learned that with
functioname<<<dimGrid, dimBlock>>>(parameterlist);
you can hand control over from the CPU to the GPU, and that the CPU will wait until the __global__ function terminates.
But is the CPU notified that the GPU computation is finished?
And what happens while the computation is running on the GPU? Is the CPU blocked for that period, or will it schedule other tasks?


Kernel launches are non-blocking. Once the kernel is launched, the CPU is free to do other things. The cudaThreadSynchronize() call (since deprecated in favor of cudaDeviceSynchronize()) can be used to force the host to wait until the GPU has finished running a kernel.
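A minimal sketch of this pattern, assuming a made-up kernel named doubleElements (not from the original question):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread doubles one array element.
__global__ void doubleElements(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // The launch returns immediately; the kernel runs asynchronously.
    doubleElements<<<(n + 255) / 256, 256>>>(d_data, n);

    // The CPU is free to do unrelated work here while the GPU computes.
    printf("CPU doing other work while the kernel runs...\n");

    // Block the host until all previously queued GPU work has finished.
    cudaDeviceSynchronize();

    printf("Kernel finished.\n");
    cudaFree(d_data);
    return 0;
}
```

The launch itself only enqueues the kernel; without the synchronize call, the host would reach the end of main while the GPU might still be working.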

What exactly is a device? Somehow I couldn’t figure out whether “device” means the whole GPU or just a single stream processor or multiprocessor.

“Device” means the whole GPU.
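You can see this distinction in the runtime API itself: each device is one GPU, and the device properties report how many multiprocessors that GPU contains. A short sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);  // number of whole GPUs in the system

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // One "device" = one GPU; it contains many multiprocessors.
        printf("Device %d: %s, %d multiprocessors\n",
               dev, prop.name, prop.multiProcessorCount);
    }
    return 0;
}
```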