Hello everyone, is it possible to keep running code in main() in parallel with a previously launched kernel without creating another CPU thread, or does the myKernel<<<>>>() call only return after all GPU threads are done?
Thanks.
kernel<<<…>>>(…) is an asynchronous kernel launch: control returns to the CPU right away, without waiting for the GPU threads. You can block until the kernel has completed with cudaThreadSynchronize().
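To make the asynchronous launch concrete, here is a minimal sketch. The kernel `doubleKernel` and the function `launchThenWait` are made-up names for illustration, and the CPU loop is just a placeholder for real work. It uses cudaDeviceSynchronize(), which replaced the now-deprecated cudaThreadSynchronize() in later CUDA releases; on an old toolkit the older call plays the same role.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element of data.
__global__ void doubleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Launch the kernel, do unrelated CPU work while it runs, then block.
// Returns true when all CUDA calls succeed and the results are as expected.
bool launchThenWait(int n)
{
    if (n <= 0) return false;
    size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 1.0f);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return false;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    doubleKernel<<<blocks, threads>>>(dev, n);  // returns without waiting for the GPU

    // Placeholder CPU work that can overlap with the kernel.
    long long cpuSum = 0;
    for (int i = 0; i < n; ++i) cpuSum += i;

    // Block until the GPU is idle; this also reports kernel execution errors.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);

    return err == cudaSuccess
        && host[0] == 2.0f && host[n - 1] == 2.0f
        && cpuSum == (long long)n * (n - 1) / 2;
}
```

Calling `launchThenWait(1 << 20)` from main() should return true on a machine with a working CUDA device.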
Is there a problem if I call cudaThreadSynchronize() after all the threads are already done? My project involves some file processing, so while the kernel runs I prepare the next pass, and that will certainly take longer than the kernel itself.
Example:
while (!dataend)
{
    mykernel<<<…>>>(…);        // call the kernel to process the previous data
    loaddata();                // while the kernel works, load more data
    cudaThreadSynchronize();   // wait for the kernel to finish
    movedatatodevice();        // copy the newly loaded data to the device
}
With that code, the GPU will keep working on the kernel while your CPU code runs loaddata(), since the kernel launch is an asynchronous call.
You can call cudaThreadSynchronize() as often as you want. If the threads are already done, it simply returns right away; if they are not, it blocks until they finish.
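For reference, a runnable sketch of that loop could look like the following. Everything here is illustrative: `addOneKernel`, `loadChunk` and `processChunks` are hypothetical names, and the "file" is faked by filling a host buffer with the chunk index. It uses cudaDeviceSynchronize(), the newer name for the same blocking wait.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element.
__global__ void addOneKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical stand-in for loaddata(): fills the buffer with the chunk index.
static void loadChunk(std::vector<int> &buf, int chunk)
{
    for (int &v : buf) v = chunk;
}

// Processes numChunks chunks of n ints each.
// Returns the sum of all processed values, or -1 on a CUDA error.
long long processChunks(int numChunks, int n)
{
    size_t bytes = n * sizeof(int);
    std::vector<int> next(n), result(n);
    int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    long long total = 0;

    // Prime the pipeline with the first chunk.
    loadChunk(next, 0);
    cudaMemcpy(dev, next.data(), bytes, cudaMemcpyHostToDevice);

    for (int c = 0; c < numChunks; ++c) {
        addOneKernel<<<blocks, threads>>>(dev, n);      // process chunk c (asynchronous)
        if (c + 1 < numChunks) loadChunk(next, c + 1);  // CPU prepares chunk c+1 meanwhile

        // Wait for the kernel; returns at once if it already finished.
        if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(dev); return -1; }

        cudaMemcpy(result.data(), dev, bytes, cudaMemcpyDeviceToHost);
        for (int v : result) total += v;

        // Only now is it safe to overwrite the device buffer with the next chunk.
        if (c + 1 < numChunks)
            cudaMemcpy(dev, next.data(), bytes, cudaMemcpyHostToDevice);
    }

    cudaFree(dev);
    return total;
}
```

With 4 chunks of 1000 elements, chunk c holds the value c and comes back as c + 1, so `processChunks(4, 1000)` should return 1000 * (1 + 2 + 3 + 4) = 10000.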
Thank you!
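Do note that CUDA is “smart”: operations issued to the same stream run in order, and a blocking cudaMemcpy() waits for previously launched kernels to finish. So you rarely need to call cudaThreadSynchronize() yourself, except when you want to time a run on the GPU.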
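A rough sketch of the timing case, assuming a wall-clock timer on the CPU side. `busyKernel` and `timeKernelMs` are invented names, and the arithmetic in the kernel is only there to keep the GPU busy. The explicit wait (written here with cudaDeviceSynchronize(), the current name for the call) is what makes the measurement cover the kernel's execution and not just the launch.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical kernel that does some arithmetic per element to take measurable time.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Returns the wall-clock time in milliseconds for one kernel run, or -1.0 on error.
double timeKernelMs(int n, int iters)
{
    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return -1.0;
    cudaMemset(dev, 0, n * sizeof(float));
    cudaDeviceSynchronize();  // make sure setup has finished before starting the timer

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    auto t0 = std::chrono::steady_clock::now();
    busyKernel<<<blocks, threads>>>(dev, n, iters);
    // Without this wait, t1 would be taken as soon as the launch returned,
    // so the result would not include the kernel's execution time.
    cudaError_t err = cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    cudaFree(dev);
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For finer-grained GPU-side timing, the runtime also offers cudaEventRecord() and cudaEventElapsedTime(), which this sketch does not use.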