If my understanding is right, concurrent kernel launches are serialized on the device.
The CUDA Programming Guide (section 188.8.131.52) says that “Control is returned to the application before the device has completed the requested task”.
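The behaviour described in that quote can be observed by timing a single launch on the host. This is a minimal sketch under my own assumptions, not code from the guide. `spin_kernel`, `LaunchTiming`, and `time_async_launch` are hypothetical names. The kernel only busy-waits on `clock64()` so that it runs long enough to measure.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical test kernel: busy-waits for roughly `cycles` GPU clock cycles
// so the kernel runs long enough to measure from the host.
__global__ void spin_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

struct LaunchTiming {
    double launch_return_ms;  // host time until the <<<...>>> launch returned
    double completion_ms;     // host time until the kernel had finished
    bool ok;                  // false if the launch or the kernel failed
};

// Launches one spin_kernel and measures, with the host clock, how long the
// launch call takes to return compared with how long the kernel takes.
LaunchTiming time_async_launch(long long cycles) {
    using host_clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;
    cudaFree(0);  // force context creation so it is not included in the timing

    auto t0 = host_clock::now();
    spin_kernel<<<1, 1>>>(cycles);
    auto t1 = host_clock::now();  // control is already back on the host here
    bool ok = cudaGetLastError() == cudaSuccess;
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;  // block until done
    auto t2 = host_clock::now();

    return {ms(t1 - t0).count(), ms(t2 - t0).count(), ok};
}
```

For example, calling `time_async_launch(200000000LL)` would typically report a `launch_return_ms` far smaller than `completion_ms`. Only the `cudaDeviceSynchronize()` call waits for the kernel to finish.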
My question is:
If multiple host threads launch the same kernel or different kernels, what happens?
I can think of two cases:
- All the kernel launches return control to their host threads at the same time.
- One kernel starts running and returns control, but the other host threads have to wait until that kernel has finished.
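One way to find out which of the two cases happens is to measure it. This minimal sketch assumes CUDA 4.0 or later, where all host threads of a process share the device's primary context through the runtime API. `spin_kernel`, `ThreadTiming`, and `launch_from_threads` are hypothetical names. Each host thread creates its own stream, launches a busy-waiting kernel, and records two times: when its launch call returned and when its kernel finished.

```cuda
#include <chrono>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical test kernel: busy-waits for roughly `cycles` GPU clock cycles.
__global__ void spin_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

// Timings observed by one host thread, measured from just before its launch.
struct ThreadTiming {
    double launch_return_ms;  // until this thread's <<<...>>> call returned
    double completion_ms;     // until this thread's kernel had finished
    bool ok;                  // false if any CUDA call reported an error
};

// Starts `num_threads` host threads. Each one launches a spin_kernel into its
// own stream and then waits only for that stream.
std::vector<ThreadTiming> launch_from_threads(int num_threads, long long cycles) {
    using host_clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;
    std::vector<ThreadTiming> results(num_threads);
    std::vector<std::thread> workers;
    cudaFree(0);  // create the shared context before any thread starts timing

    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([&results, i, cycles] {
            ThreadTiming& r = results[i];
            cudaStream_t stream;
            if (cudaStreamCreate(&stream) != cudaSuccess) {
                r = {0.0, 0.0, false};
                return;
            }
            auto t0 = host_clock::now();
            spin_kernel<<<1, 1, 0, stream>>>(cycles);
            auto t1 = host_clock::now();  // this thread has control again
            bool ok = cudaGetLastError() == cudaSuccess;
            ok = ok && cudaStreamSynchronize(stream) == cudaSuccess;
            auto t2 = host_clock::now();
            cudaStreamDestroy(stream);
            r = {ms(t1 - t0).count(), ms(t2 - t0).count(), ok};
        });
    }
    for (std::thread& w : workers) w.join();
    return results;
}
```

For example, `launch_from_threads(4, 200000000LL)` returns four timings. Suppose every `launch_return_ms` stays small compared with `completion_ms`. Then no launch call waited for another thread's kernel, which matches the first case. A `launch_return_ms` roughly as long as one kernel's runtime would point to the second case. Separately, `completion_ms` values that grow in steps of about one kernel's runtime would suggest the kernels themselves ran one after another on the device.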
Does anyone have experience with a situation like this?