Combination of "Overlap of Data Transfer" and "Concurrent Kernel Execution"

sarutake.nvforum · September 14, 2011, 6:02am

Hi.

I am trying to combine “Overlap of Data Transfer and Kernel Execution” with “Concurrent Kernel Execution”, as shown in the attached figure.

Data transfers from host to device overlap with kernel execution.
Kernels A, B and C are executed concurrently on each chunk of data.

But I don’t know how to program this.

To overlap a data transfer with kernel execution, each cudaMemcpyAsync() and its associated kernel call need to be given the same stream.
On the other hand, to execute kernels concurrently, each kernel call needs to be given a different stream.

So Kernels A, B and C need to be given the same stream as the cudaMemcpyAsync() of their associated data.
But they might not be executed concurrently if they are all given the same stream.

How can I program what I want?
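
For reference, here is a minimal sketch of one way the conflict described above might be resolved: issue the copies in a dedicated copy stream, record an event after each copy, and make three separate kernel streams wait on that event with cudaStreamWaitEvent() before launching. The kernel bodies, the chunk count and size, and the names (kernelA/B/C, run_pipeline, copyStream, kernelStream, copied) are all assumptions made for illustration; none of them come from the post or the figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// On any CUDA error, print it and return -1 from the enclosing function.
#define CUDA_OK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
    return -1; } } while (0)

// Hypothetical stand-ins for Kernel A, B and C.
__global__ void kernelA(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}
__global__ void kernelB(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}
__global__ void kernelC(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] - 1.0f;
}

// Returns the number of mismatching spot checks (0 means success), or -1 on a CUDA error.
int run_pipeline() {
    const int kChunks = 4, kN = 1 << 18, kKernels = 3;  // assumed sizes
    const size_t chunkBytes = kN * sizeof(float);

    float* hIn = nullptr;  // pinned host memory, needed for truly asynchronous copies
    CUDA_OK(cudaMallocHost((void**)&hIn, kChunks * chunkBytes));
    for (int i = 0; i < kChunks * kN; ++i) hIn[i] = 1.0f;

    float* dIn = nullptr;
    float* dOut[kKernels];
    CUDA_OK(cudaMalloc((void**)&dIn, kChunks * chunkBytes));
    for (int k = 0; k < kKernels; ++k)
        CUDA_OK(cudaMalloc((void**)&dOut[k], kChunks * chunkBytes));

    cudaStream_t copyStream, kernelStream[kKernels];
    cudaEvent_t copied[kChunks];
    CUDA_OK(cudaStreamCreate(&copyStream));
    for (int k = 0; k < kKernels; ++k) CUDA_OK(cudaStreamCreate(&kernelStream[k]));
    for (int c = 0; c < kChunks; ++c)
        CUDA_OK(cudaEventCreateWithFlags(&copied[c], cudaEventDisableTiming));

    const dim3 block(256), grid((kN + 255) / 256);
    for (int c = 0; c < kChunks; ++c) {
        const size_t off = (size_t)c * kN;
        // Copy chunk c in the copy stream and mark its completion with an event.
        CUDA_OK(cudaMemcpyAsync(dIn + off, hIn + off, chunkBytes,
                                cudaMemcpyHostToDevice, copyStream));
        CUDA_OK(cudaEventRecord(copied[c], copyStream));
        // Each kernel stream waits only for chunk c's copy, not for the other kernels.
        for (int k = 0; k < kKernels; ++k)
            CUDA_OK(cudaStreamWaitEvent(kernelStream[k], copied[c], 0));
        kernelA<<<grid, block, 0, kernelStream[0]>>>(dIn + off, dOut[0] + off, kN);
        kernelB<<<grid, block, 0, kernelStream[1]>>>(dIn + off, dOut[1] + off, kN);
        kernelC<<<grid, block, 0, kernelStream[2]>>>(dIn + off, dOut[2] + off, kN);
    }
    CUDA_OK(cudaGetLastError());
    CUDA_OK(cudaDeviceSynchronize());

    // Spot-check the last element of each output buffer (input was 1.0f everywhere).
    const float expected[kKernels] = {2.0f, 2.0f, 0.0f};
    int bad = 0;
    for (int k = 0; k < kKernels; ++k) {
        float v = -1.0f;
        CUDA_OK(cudaMemcpy(&v, dOut[k] + (size_t)kChunks * kN - 1, sizeof(float),
                           cudaMemcpyDeviceToHost));
        if (v != expected[k]) ++bad;
    }

    for (int c = 0; c < kChunks; ++c) cudaEventDestroy(copied[c]);
    for (int k = 0; k < kKernels; ++k) { cudaStreamDestroy(kernelStream[k]); cudaFree(dOut[k]); }
    cudaStreamDestroy(copyStream);
    cudaFree(dIn);
    cudaFreeHost(hIn);
    return bad;
}

int main() {
    int bad = run_pipeline();
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

In this sketch the copy of chunk c+1 can proceed while the kernels for chunk c are running, and A, B and C are free to run concurrently because they sit in different streams. Whether they actually overlap depends on the device: it needs to support concurrent kernels, and each kernel has to leave enough resources free for the others.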

sarutake.nvforum · September 14, 2011, 6:09am

Sorry, I forgot to attach the figure.

figure.bmp (581 KB)
