Question about streams

boricworld · August 5, 2009, 9:59pm

Good afternoon everyone,

I have a question about the CUDA streams.

Suppose I have the following code snippet:

[codebox]
…
cudaStream_t s;
…
some_kernel<<<grid, block, 0, s>>>(…);  // kernel launched in non-default stream s
…
cudaMemcpyAsync(dst, src, memSize, cudaMemcpyHostToDevice, 0);  // copy issued to default stream 0
…
[/codebox]

Basically, I launch the kernel in a stream other than the default stream, and I issue cudaMemcpyAsync() in the default stream, 0.

Does this mean that cudaMemcpyAsync() cannot finish (i.e., the data cannot be completely copied to the device) until the kernel has completed, because the copy is issued to the default stream?

Thanks,

B
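One way to check this empirically is a small test like the sketch below. It assumes nvcc's default (legacy) default-stream semantics, i.e. the program is not built with `--default-stream per-thread`, pinned host memory from `cudaMallocHost()` so the copy is genuinely asynchronous, and a GPU that supports `clock64()` (compute capability 2.0 or later). `spin_kernel`, `copy_waited_for_kernel`, the 1 MiB buffer and the 10^9-cycle spin are hypothetical choices for illustration only, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: busy-waits on the GPU for about `cycles` SM clock ticks
// so the kernel in stream s is still running when the host issues the copy.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

// Reproduces the pattern from the question: kernel in stream s, then
// cudaMemcpyAsync() in stream 0. Returns true if the kernel in s had already
// finished by the time the stream-0 copy (and the event after it) completed.
bool copy_waited_for_kernel()
{
    const size_t memSize = 1 << 20;            // 1 MiB, arbitrary size
    const long long kSpinCycles = 1000000000LL; // roughly 0.5 to 1 s at 1 to 2 GHz

    char *src = nullptr, *dst = nullptr;
    cudaMallocHost((void **)&src, memSize);    // pinned; contents do not matter
    cudaMalloc((void **)&dst, memSize);

    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaEvent_t copyDone;
    cudaEventCreate(&copyDone);

    spin_kernel<<<1, 1, 0, s>>>(kSpinCycles);                      // long kernel in s
    cudaMemcpyAsync(dst, src, memSize, cudaMemcpyHostToDevice, 0);  // copy in stream 0
    cudaEventRecord(copyDone, 0);                                   // marks end of the copy

    cudaEventSynchronize(copyDone);  // host waits for the stream-0 work only
    bool kernelDone = (cudaStreamQuery(s) == cudaSuccess);

    // Clean up after everything has finished.
    cudaStreamSynchronize(s);
    cudaEventDestroy(copyDone);
    cudaStreamDestroy(s);
    cudaFree(dst);
    cudaFreeHost(src);
    return kernelDone;
}
```

If the legacy semantics apply, `copy_waited_for_kernel()` should return `true`: the copy issued to stream 0 is not started until the previously launched kernel in `s` has finished, so it cannot complete before the kernel does.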

boricworld · August 6, 2009, 12:46am

I found the following in the Programming Guide 2.3:

Two commands from different streams cannot run concurrently if either a page-locked host memory allocation, a device memory allocation, a device memory set, a device ↔ device memory copy, or any CUDA command to stream 0 is called in between them by the host thread.

I think this may answer the question I raised above. Can anyone from Nvidia confirm this? Tim? ^_^
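A minimal sketch of a pattern that stays within the quoted rule: every allocation is done up front, and the copy goes to a second non-default stream `s2` instead of stream 0, so nothing from the list is issued between the kernel launch and the copy. It assumes a legacy default-stream build (no `--default-stream per-thread`), pinned host memory, a GPU with `clock64()` support, and a device that can overlap copies with kernel execution (`asyncEngineCount > 0` in `cudaDeviceProp`). The names `spin_kernel` and `kernel_still_running_after_copy`, the buffer size and the spin length are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: busy-waits on the GPU for about `cycles` SM clock ticks.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

// Kernel in stream s, copy in a second non-default stream s2, with all
// allocations done before any work is issued. Returns true if the kernel was
// still running when the copy in s2 had already completed.
bool kernel_still_running_after_copy()
{
    const size_t memSize = 1 << 20;            // 1 MiB, arbitrary size
    const long long kSpinCycles = 1000000000LL; // roughly 0.5 to 1 s at 1 to 2 GHz

    // Allocations first, so none of them sits between the two commands.
    char *src = nullptr, *dst = nullptr;
    cudaMallocHost((void **)&src, memSize);    // pinned; contents do not matter
    cudaMalloc((void **)&dst, memSize);

    cudaStream_t s, s2;
    cudaStreamCreate(&s);
    cudaStreamCreate(&s2);

    spin_kernel<<<1, 1, 0, s>>>(kSpinCycles);                       // long kernel in s
    cudaMemcpyAsync(dst, src, memSize, cudaMemcpyHostToDevice, s2);  // copy in s2

    cudaStreamSynchronize(s2);  // host waits for the copy only
    bool kernelRunning = (cudaStreamQuery(s) == cudaErrorNotReady);

    // Clean up after everything has finished.
    cudaStreamSynchronize(s);
    cudaStreamDestroy(s2);
    cudaStreamDestroy(s);
    cudaFree(dst);
    cudaFreeHost(src);
    return kernelRunning;
}
```

On such a device `kernel_still_running_after_copy()` would typically return `true`, because synchronizing on `s2` waits only for the copy, which can finish while the long kernel in `s` is still running. On a device that cannot overlap copies with kernels, the copy may only complete after the kernel and the function may return `false`.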
