Using asynchronous concurrent execution in CUDA, I want to create an overlapped execution process between a GPU and a CPU. That is, I want to create two streams: the first to make a non-blocking GPU kernel call, and the second for calling a function which I wish to execute on the CPU. Can somebody please help me out on how to go about it…
The Programming Guide only describes the case where two GPU kernel calls are made using two streams. Can anyone please help me with how to code the thing that I just stated above…
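Something along these lines is what I am trying to achieve — a minimal sketch, where gpuKernel() and hostWork() are just placeholder names for my own GPU and CPU routines. My understanding from the Programming Guide is that a kernel launch enqueued on a stream returns to the host immediately, so the CPU function can simply be called right after the launch; please correct me if the second stream is actually needed for the host side:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder GPU kernel: doubles each element of the array.
__global__ void gpuKernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Placeholder CPU-side work that should overlap with the kernel.
void hostWork(float *h, int n) {
    for (int i = 0; i < n; ++i) h[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *h_gpu, *h_cpu, *d;
    cudaMallocHost(&h_gpu, n * sizeof(float));  // pinned memory, needed for async copies
    h_cpu = new float[n];
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Enqueue the copies and the kernel on the stream; these calls
    // return immediately, so the host thread is free to continue.
    cudaMemcpyAsync(d, h_gpu, n * sizeof(float), cudaMemcpyHostToDevice, stream);
    gpuKernel<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    cudaMemcpyAsync(h_gpu, d, n * sizeof(float), cudaMemcpyDeviceToHost, stream);

    // The CPU function runs here, overlapped with the GPU stream.
    hostWork(h_cpu, n);

    // Explicit wait before using the GPU results on the host.
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h_gpu);
    delete[] h_cpu;
    return 0;
}
```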
Agreed… are you referring to kernel() as the GPU kernel or the CPU one?? If hostWork() is the CPU function, then will it really execute in a non-blocking fashion?? Because, as far as I know, all GPU kernel calls are blocking, and subsequent host code executes only once control returns from the GPU kernel… Please enlighten me on this…