Kernel launches through __global__ functions or cuLaunchGrid() and cuLaunchGridAsync();
So this basically means that every kernel call you make is asynchronous? I'm pretty positive it is, but one of my coworkers, who's been using CUDA for about a year (compared to my 2.5 weeks), seems to think that they aren't…
No, it just means that there are several ways to call the kernel. My understanding is that kernels would only be executed asynchronously if launched via cuLaunchGridAsync(), or if you are using multiple streams.
What exactly do you mean by "asynchronous"? Kernel calls never wait for execution on the GPU to complete (which would be pointless anyway, since your CPU program has no way to observe the difference). Functions like cudaMemcpy, however, do wait for all previously launched kernels to complete.
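A minimal sketch of that behavior (the kernel name and sizes here are made up for illustration): the launch returns immediately, and the device-to-host cudaMemcpy implicitly waits for the kernel to finish, so the copied results are complete without any explicit synchronization call.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element in place.
__global__ void doubleElements(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // Returns immediately; the kernel runs asynchronously on the GPU.
    doubleElements<<<(n + 255) / 256, 256>>>(d_data, n);

    // This copy implicitly waits for the kernel to finish before
    // transferring the results back, so no explicit sync is needed.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    printf("h_data[0] = %f\n", h_data[0]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```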
From the GPU's point of view, kernels are always executed strictly one after the other, in the order you called them. In theory, CUDA streams could allow for reordering or even parallel execution, but this is not implemented and might not even be possible with current hardware.
So I would describe it like this: kernel calls execute asynchronously with respect to anything you do on the CPU, but synchronously relative to any other GPU work; cudaMemcpy and similar functions synchronize the CPU with the GPU.
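That also means you can overlap CPU work with a running kernel without spawning a second host thread: the launch returns immediately, the CPU does its own computation, and an explicit synchronization call blocks until the GPU is done. A sketch, with a made-up busy-work kernel and CPU loop:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-work kernel, just to keep the GPU occupied.
__global__ void gpuWork(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 1000; ++k)
            x = x * 1.000001f + 0.000001f;
        data[i] = x;
    }
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // The launch returns immediately...
    gpuWork<<<(n + 255) / 256, 256>>>(d_data, n);

    // ...so the CPU is free to do its own work here, concurrently
    // with the kernel, in the same host thread.
    double cpuSum = 0.0;
    for (int i = 0; i < 10000000; ++i)
        cpuSum += 1.0 / (i + 1.0);

    // Block the CPU until all previously launched GPU work is done.
    cudaThreadSynchronize();

    printf("CPU result: %f (computed while the GPU was busy)\n", cpuSum);
    cudaFree(d_data);
    return 0;
}
```

cudaThreadSynchronize() is the synchronization call from the CUDA runtime of that era; later toolkits renamed it cudaDeviceSynchronize().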
OK, this is effectively what I thought happened. There was discussion here on whether or not you would need to start a new thread in your C program in order to do any CPU calculations while your GPU kernel was running. Obviously multiple kernels have to be synchronous with each other, as you can't run two kernels at once (essentially). Thanks for the response.