Asynchronous Concurrent Execution

Using the asynchronous concurrent execution features of CUDA, I want to create an overlapped execution process between a GPU and a CPU. That is, I want to create two streams: the first to make a non-blocking GPU kernel call, and the second to call a function which I wish to execute on the CPU. Can somebody please help me with how to go about it…

The Programmer’s Guide only describes how two GPU kernel calls can be made using two streams. Can anyone please help me with how to code what I just stated above…

You don’t need streams for that - kernel launches are already non-blocking and fully asynchronous to the host. If you have code like this:

kernel<<<grid,block,shm>>>();

hostWork();

cudaThreadSynchronize();

the hostWork() call will naturally overlap with the kernel.
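To make this concrete, here is a minimal, self-contained sketch of the pattern described above. The kernel `busyKernel` and the host function `hostWork` are made-up names for illustration; the only point is that the launch returns immediately, so the CPU work runs while the GPU is still busy. (Modern CUDA uses `cudaDeviceSynchronize()`, which is the current name for the older `cudaThreadSynchronize()` used in this thread.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: fills a device array with some values.
__global__ void busyKernel(float *d_out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_out[i] = sqrtf((float)i);
}

// CPU work that runs concurrently with the kernel.
void hostWork(void) {
    double acc = 0.0;
    for (int i = 0; i < 1000000; ++i)
        acc += i * 0.5;
    printf("host work done (acc = %f)\n", acc);
}

int main(void) {
    const int n = 1 << 20;
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));

    // The launch is asynchronous: control returns to the host
    // immediately, before the kernel has finished (or even started).
    busyKernel<<<(n + 255) / 256, 256>>>(d_out, n);

    // This executes on the CPU while the kernel runs on the GPU.
    hostWork();

    // Block here until all preceding GPU work has completed.
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```

No streams are needed for this kind of overlap; the default stream plus an asynchronous launch is enough. Streams become necessary when you want to overlap multiple GPU operations with each other, e.g. a kernel with a `cudaMemcpyAsync`.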

Agreed… are you referring to kernel() as the GPU kernel or the CPU one? If hostWork() is the CPU function, will it really execute in a non-blocking fashion? Because, as far as I know, all GPU kernel calls are blocking, and subsequent host code executes only once control returns from the GPU kernel… Please enlighten me on this…

Thanks for the help…

No, but it won’t be blocked by the kernel launch above it.


No, that is not how kernel launches work; they are non-blocking.

I thought I already had…