Implicit Synchronization

I have some questions about the following paragraph from the Programming Guide:

"Two commands from different streams cannot run concurrently if any one of the following operations is issued in-between them by the host thread:

a page-locked host memory allocation,
a device memory allocation,
a device memory set,
a memory copy between two addresses to the same device memory,
any CUDA command to the NULL stream,
a switch between the L1/shared memory configurations described in Compute Capability 3.x and Compute Capability 7.x."

  1. What exactly is the behavior when “Two commands from different streams cannot run concurrently”?

Are they serialized with the in-between command?

If so, do they also become synchronous with respect to host?

  2. What exactly does “a memory copy between two addresses to the same device memory” mean? “Device memory” the same as what?
  1. Two commands (CUDA operations) that cannot run concurrently are serialized: rather than operation A and operation B executing at the same time, one will execute, then the other. That does not necessarily mean they become synchronous with respect to the host. For example, a cudaMalloc operation (“a device memory allocation”) issued between 2 kernel calls will prevent those 2 kernels from executing concurrently, even if the kernel calls are issued to separate streams.
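A minimal sketch of that first case (the kernel, its launch configuration, and the buffer sizes are all illustrative, not from the original post):

```cuda
#include <cuda_runtime.h>

__global__ void work(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    cudaMalloc(&a, n * sizeof(float));

    // Kernel issued to stream s1.
    work<<<(n + 255) / 256, 256, 0, s1>>>(a, n);

    // Device memory allocation issued by the host thread in between:
    // an implicit synchronization point, so the kernel below cannot
    // overlap with the kernel above even though they are in
    // different streams.
    cudaMalloc(&b, n * sizeof(float));

    // Kernel issued to stream s2; it will not run concurrently with
    // the first kernel.
    work<<<(n + 255) / 256, 256, 0, s2>>>(b, n);

    cudaDeviceSynchronize();
    cudaFree(a);
    cudaFree(b);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}
```

Moving both cudaMalloc calls before the first kernel launch removes the in-between operation and allows the two kernels to overlap (hardware permitting).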

  2. This is referring to a memory copy between two device memory addresses on the same device, i.e. the same GPU: for example, a cudaMemcpy operation where the direction token is cudaMemcpyDeviceToDevice and both supplied pointers (source and destination) refer to locations on the same device.

Thanks for the clarification!