Kernel launches/synchronization

I’m a bit confused.

Kernel launches are asynchronous, right? So after a kernel has been launched, CPU code runs without waiting for the kernel to finish.

But kernels can’t run in parallel… So if I have the following code:

kernelOne<<<1,1>>>();

kernelTwo<<<1,1>>>();

printf("Kernels are done!\n");

Will the launch of kernelTwo block the CPU? Or will the CPU go on to run the printf while, on the GPU, kernelTwo waits for kernelOne to finish?

The latter. Subsequent launches get queued (up to a point, after which further launches will block).
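To make the ordering concrete, here is a minimal sketch of the same situation. It rests on a few assumptions that are not in the original post: both kernels are launched into the default stream, they take a pointer to a single int (the kernels above take no arguments), and runBothKernels is a hypothetical helper name. The host printf after the launches can run before either kernel has finished, and cudaDeviceSynchronize() is what actually makes the host wait.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels (assumed for illustration): each does one step on a device value.
__global__ void kernelOne(int *value) { *value = 1; }
__global__ void kernelTwo(int *value) { *value += 1; }  // relies on kernelOne having finished

// Launches both kernels back to back in the default stream, then waits for them.
// Returns the final device value, or -1 if a CUDA call fails.
int runBothKernels() {
    int *dValue = nullptr;
    if (cudaMalloc(&dValue, sizeof(int)) != cudaSuccess) return -1;

    kernelOne<<<1, 1>>>(dValue);  // queued; control returns to the host right away
    kernelTwo<<<1, 1>>>(dValue);  // queued behind kernelOne in the same stream

    // The host reaches this line without waiting for the GPU.
    printf("Kernels are launched, but not necessarily done.\n");

    // Block the host until all previously queued work has completed.
    cudaError_t err = cudaDeviceSynchronize();
    int hValue = -1;
    if (err == cudaSuccess) {
        err = cudaMemcpy(&hValue, dValue, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dValue);
    return err == cudaSuccess ? hValue : -1;
}

int main() {
    int result = runBothKernels();
    printf("Kernels are done! value = %d\n", result);
    return result == 2 ? 0 : 1;
}
```

On a machine with a working CUDA device, the final value should be 2, because kernelTwo only starts after kernelOne has completed within the same stream. Without the cudaDeviceSynchronize() call (or another blocking call such as cudaMemcpy), a message like "Kernels are done!" could be printed while the GPU is still working.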