I have written a program that uses two streams. Both streams operate on some data and write their results, in the form of flags, to host memory.
Here is the general structure of what I am doing:
loop {
    AsyncCpy(....HostToDevice, Stream1);
    AsyncCpy(....HostToDevice, Stream2);
    Kernel<<<..., Stream1>>>;
    Kernel<<<..., Stream2>>>;
    /* Write the results to host memory */
    AsyncCpy(....DeviceToHost, Stream1);
    AsyncCpy(....DeviceToHost, Stream2);
}
I want to do some work on the CPU as soon as I know that StreamX has finished copying its results back to host memory. At the same time, I don’t want to stop the loop from issuing asynchronous operations (memcpy or kernel execution).
Suppose I insert my host functions, say host_ftn1(…) and host_ftn2(…), like this:
loop {
    AsyncCpy(....HostToDevice, Stream1);
    AsyncCpy(....HostToDevice, Stream2);
    Kernel<<<..., Stream1>>>;
    Kernel<<<..., Stream2>>>;
    /* Write the results to host memory, to be processed by host_ftn1(..) */
    AsyncCpy(....DeviceToHost, Stream1);
    /* Write the results to host memory, to be processed by host_ftn2(..) */
    AsyncCpy(....DeviceToHost, Stream2);
    if (Stream1 results are copied to host)
        host_ftn1(..);
    if (Stream2 results are copied to host)
        host_ftn2(..);
}
This stalls the loop until the host functions host_ftn1 and host_ftn2 have finished executing. However, I don’t want to hold up the GPU work, i.e. AsyncCpy(…) and Kernel<<<…,StreamX>>>, while the CPU is executing the host functions.
Is there any solution or approach for this problem?
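To make the goal concrete, here is a minimal sketch of the kind of non-blocking check I have in mind. It records a cudaEvent_t after each device-to-host copy and polls it with cudaEventQuery, which returns immediately instead of waiting. The kernel flagKernel, the function host_ftn (standing in for host_ftn1/host_ftn2), the wrapper runPipeline, and the buffer layout are all placeholders I made up for illustration, not my real code. I am not sure this is the recommended pattern.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: writes flag 1 where the input value is positive.
__global__ void flagKernel(const int *data, int *flags, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flags[i] = data[i] > 0 ? 1 : 0;
}

// Hypothetical stand-in for host_ftn1 / host_ftn2: counts the set flags.
static int host_ftn(const int *flags, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i) count += flags[i];
    return count;
}

// Runs `iterations` batches on each of two streams.
// Returns the total number of set flags, or -1 if an event query fails.
int runPipeline(int iterations, int n) {
    const int kStreams = 2;
    cudaStream_t stream[kStreams];
    cudaEvent_t done[kStreams];
    int *hData[kStreams], *hFlags[kStreams], *dData[kStreams], *dFlags[kStreams];
    bool pending[kStreams] = {false, false};
    int launched[kStreams] = {0, 0};
    int processed = 0, total = 0;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&stream[s]);
        cudaEventCreateWithFlags(&done[s], cudaEventDisableTiming);
        cudaMallocHost(&hData[s], bytes);   // pinned, so the copies can be asynchronous
        cudaMallocHost(&hFlags[s], bytes);
        cudaMalloc(&dData[s], bytes);
        cudaMalloc(&dFlags[s], bytes);
        for (int i = 0; i < n; ++i) hData[s][i] = (i % 2) ? 1 : -1;  // dummy input
    }

    while (processed < iterations * kStreams) {
        for (int s = 0; s < kStreams; ++s) {
            if (pending[s]) {
                // Non-blocking: cudaErrorNotReady means the copy is still in flight.
                cudaError_t st = cudaEventQuery(done[s]);
                if (st == cudaSuccess) {
                    total += host_ftn(hFlags[s], n);  // the other stream keeps running
                    pending[s] = false;
                    ++processed;
                } else if (st != cudaErrorNotReady) {
                    return -1;  // real error; cleanup omitted for brevity
                }
            } else if (launched[s] < iterations) {
                // Enqueue the next batch only after the previous results were consumed,
                // so hFlags[s] is never overwritten while host_ftn still needs it.
                cudaMemcpyAsync(dData[s], hData[s], bytes, cudaMemcpyHostToDevice, stream[s]);
                flagKernel<<<(n + 255) / 256, 256, 0, stream[s]>>>(dData[s], dFlags[s], n);
                cudaMemcpyAsync(hFlags[s], dFlags[s], bytes, cudaMemcpyDeviceToHost, stream[s]);
                cudaEventRecord(done[s], stream[s]);  // fires once the D2H copy is done
                pending[s] = true;
                ++launched[s];
            }
        }
    }

    for (int s = 0; s < kStreams; ++s) {
        cudaEventDestroy(done[s]);
        cudaStreamDestroy(stream[s]);
        cudaFreeHost(hData[s]);
        cudaFreeHost(hFlags[s]);
        cudaFree(dData[s]);
        cudaFree(dFlags[s]);
    }
    return total;
}
```

With this, work already queued on the other stream keeps running on the GPU while host_ftn executes, but the loop itself cannot enqueue anything new until host_ftn returns. If that matters, I assume the host functions would have to move to a separate CPU thread, or be queued with something like cudaLaunchHostFunc.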