Running streams parallel with the host functions

tajiknomi · November 15, 2018, 9:08am

I have written a program that uses two streams. Both streams operate on some data and write their results, in the form of flags, to host memory.
Here is the generic structure of what I am doing:

loop {
    AsyncCpy(....HostToDevice,Stream1);
    AsyncCpy(....HostToDevice,Stream2);

    Kernel<<<...,Stream1>>>;
    Kernel<<<...,Stream2>>>;

    /* Write the results to host memory */
    AsyncCpy(....DeviceToHost,Stream1);
    AsyncCpy(....DeviceToHost,Stream2);
}

I want to do some work on the CPU as soon as I know that StreamX has finished copying its results back to host memory. At the same time, I don’t want to stop the loop from issuing async operations (memcpy or kernel execution).

If I insert my host functions, say host_ftn1(…) and host_ftn2(…), like this:

loop {
    AsyncCpy(....HostToDevice,Stream1);
    AsyncCpy(....HostToDevice,Stream2);

    Kernel<<<...,Stream1>>>;
    Kernel<<<...,Stream2>>>;

    /* Write the results to host memory, to be processed by host_ftn1(..) */
    AsyncCpy(....DeviceToHost,Stream1);
    /* Write the results to host memory, to be processed by host_ftn2(..) */
    AsyncCpy(....DeviceToHost,Stream2);

    if(Stream1 results are copied to host)
        host_ftn1(..);
    if(Stream2 results are copied to host)
        host_ftn2(..);
}

the loop stalls until the host functions host_ftn1 and host_ftn2 have finished. I don’t want the GPU work, i.e. AsyncCpy(…) and Kernel<<<…,StreamX>>>, to stop being issued while the CPU is executing the host functions.

Is there any solution or approach for this problem?

Robert_Crovella · November 15, 2018, 12:29pm

Investigate stream callbacks.

They are documented in the CUDA C Programming Guide, and there are sample codes as well.
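As a rough illustration of the suggestion above, here is a minimal sketch of the two-stream loop with a callback queued after each device-to-host copy. The kernel `flag_positive`, the callback `on_results_ready` (standing in for host_ftn1/host_ftn2), the function `run_pipeline`, the buffer size, and the file-scope counter are all hypothetical names chosen for this example, not taken from the original program. The input is filled once and reused so that the host never rewrites a buffer that an earlier copy may still be reading.

```cuda
#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s failed: %s\n", #call,                \
                         cudaGetErrorString(err_));                       \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

constexpr int kN = 1024;         // elements per batch (assumed for the example)
std::atomic<int> g_flagged{0};   // running total, kept at file scope for brevity

// Hypothetical kernel: flags[i] = 1 where data[i] > 0, else 0.
__global__ void flag_positive(const int* data, int* flags, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flags[i] = (data[i] > 0) ? 1 : 0;
}

// Stand-in for host_ftn1/host_ftn2. Runs on a runtime-managed host thread once
// all work queued earlier in the same stream has finished. It must not call
// CUDA API functions.
void CUDART_CB on_results_ready(cudaStream_t, cudaError_t status, void* userData) {
    if (status != cudaSuccess) return;
    const int* flags = static_cast<const int*>(userData);
    int count = 0;
    for (int i = 0; i < kN; ++i) count += flags[i];
    g_flagged += count;
}

int run_pipeline(int iterations) {
    constexpr int kStreams = 2;
    const size_t bytes = kN * sizeof(int);
    cudaStream_t stream[kStreams];
    int *h_in[kStreams], *h_flags[kStreams], *d_in[kStreams], *d_flags[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamCreate(&stream[s]));
        CHECK(cudaMallocHost(&h_in[s], bytes));     // pinned, so copies can be async
        CHECK(cudaMallocHost(&h_flags[s], bytes));
        CHECK(cudaMalloc(&d_in[s], bytes));
        CHECK(cudaMalloc(&d_flags[s], bytes));
        for (int i = 0; i < kN; ++i) h_in[s][i] = i % 2;  // half the values are positive
    }

    g_flagged = 0;
    for (int it = 0; it < iterations; ++it) {
        for (int s = 0; s < kStreams; ++s) {
            CHECK(cudaMemcpyAsync(d_in[s], h_in[s], bytes,
                                  cudaMemcpyHostToDevice, stream[s]));
            flag_positive<<<(kN + 255) / 256, 256, 0, stream[s]>>>(
                d_in[s], d_flags[s], kN);
            CHECK(cudaGetLastError());
            CHECK(cudaMemcpyAsync(h_flags[s], d_flags[s], bytes,
                                  cudaMemcpyDeviceToHost, stream[s]));
            // Queued behind the copy; this call returns immediately, so the
            // loop keeps issuing work while earlier callbacks run.
            CHECK(cudaStreamAddCallback(stream[s], on_results_ready,
                                        h_flags[s], 0));
        }
    }

    CHECK(cudaDeviceSynchronize());  // waits for queued work, callbacks included
    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaFree(d_in[s]));
        CHECK(cudaFree(d_flags[s]));
        CHECK(cudaFreeHost(h_in[s]));
        CHECK(cudaFreeHost(h_flags[s]));
        CHECK(cudaStreamDestroy(stream[s]));
    }
    return g_flagged.load();
}
```

Calling `run_pipeline(3)` from `main` should return 3072 (3 iterations × 2 streams × 512 positive values). A callback holds up later work in its own stream until it returns, so a slow host function delays that stream only; the other stream and the issuing loop keep going. The runtime documentation notes that `cudaStreamAddCallback` is slated for eventual deprecation and points to `cudaLaunchHostFunc` as the newer alternative.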

tajiknomi · November 19, 2018, 5:53am

I have read about stream callbacks, and they look like they will work in my case. But there is one problem.

The cudaStreamAddCallback signature lets me pass only a single data pointer:

__host__ cudaError_t cudaStreamAddCallback ( cudaStream_t stream, cudaStreamCallback_t callback, void* userData, unsigned int  flags )

But in my case there are several host variables that I want to pass to the callback function. One solution could be to declare all those variables global, but that would make my code messy.

Any solution?

Robert_Crovella · November 19, 2018, 6:13am

userData could be a pointer to a struct of pointers to whatever data you want.

Study some of the CUDA sample codes for callbacks, or just study some codes that use pthreads, which pass arguments to threads the same way.
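The pointer-to-struct idea might look like the following sketch. The struct `CallbackArgs`, the kernel `flag_above`, the callback `count_flags`, and the function `count_per_batch` are hypothetical names made up for this example. It assumes one struct per queued callback, kept alive until the stream has been synchronized, since the callback dereferences it later on another thread.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s failed: %s\n", #call,                \
                         cudaGetErrorString(err_));                       \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Everything one callback needs, bundled so it fits behind a single void*.
struct CallbackArgs {
    const int* flags;  // pinned host buffer holding the copied-back flags
    int        n;      // number of flags in that buffer
    int*       count;  // where this batch's result should be stored
};

// Hypothetical kernel: flags[i] = 1 where data[i] > threshold, else 0.
__global__ void flag_above(const int* data, int* flags, int n, int threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flags[i] = (data[i] > threshold) ? 1 : 0;
}

// Host callback: unpacks the struct and counts the set flags.
// It must not call CUDA API functions.
void CUDART_CB count_flags(cudaStream_t, cudaError_t status, void* userData) {
    auto* args = static_cast<CallbackArgs*>(userData);
    if (status != cudaSuccess) { *args->count = -1; return; }
    int c = 0;
    for (int i = 0; i < args->n; ++i) c += args->flags[i];
    *args->count = c;
}

std::vector<int> count_per_batch(int num_batches) {
    constexpr int kN = 1024;
    const size_t bytes = kN * sizeof(int);
    int *h_in, *h_flags, *d_in, *d_flags;
    cudaStream_t stream;

    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaMallocHost(&h_in, bytes));
    CHECK(cudaMallocHost(&h_flags, bytes));
    CHECK(cudaMalloc(&d_in, bytes));
    CHECK(cudaMalloc(&d_flags, bytes));
    for (int i = 0; i < kN; ++i) h_in[i] = i % 8;  // values 0..7, 128 of each
    CHECK(cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, stream));

    std::vector<int> counts(num_batches, 0);
    std::vector<CallbackArgs> args(num_batches);  // sized up front, never resized
    for (int b = 0; b < num_batches; ++b) {
        args[b] = CallbackArgs{h_flags, kN, &counts[b]};
        flag_above<<<(kN + 255) / 256, 256, 0, stream>>>(d_in, d_flags, kN, b);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_flags, d_flags, bytes,
                              cudaMemcpyDeviceToHost, stream));
        CHECK(cudaStreamAddCallback(stream, count_flags, &args[b], 0));
    }

    CHECK(cudaStreamSynchronize(stream));  // all callbacks have run after this
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_flags));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_flags));
    CHECK(cudaStreamDestroy(stream));
    return counts;
}
```

With this input, `count_per_batch(3)` should return {896, 768, 640}: batch b uses threshold b, and 128 × (7 − b) values exceed it. Reusing one `h_flags` buffer across batches is safe here only because everything is queued in a single stream, so each callback finishes before the next copy into that buffer begins.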

tajiknomi · November 19, 2018, 6:23am

A pointer to a struct will work.

The callbacks section of the CUDA C Programming Guide has very little documentation, though. Only one sample code is given.

tajiknomi · November 27, 2018, 7:45am

I have tested stream callbacks with two streams in my program, and it worked.
Thank you :)
