I have some questions about the following paragraph from the CUDA Programming Guide:
"Two commands from different streams cannot run concurrently if any one of the following operations is issued in-between them by the host thread:
a page-locked host memory allocation,
a device memory allocation,
a device memory set,
a memory copy between two addresses to the same device memory,
any CUDA command to the NULL stream,
a switch between the L1/shared memory configurations described in Compute Capability 3.x and Compute Capability 7.x."
- What exactly is the behavior when “Two commands from different streams cannot run concurrently”?
Are they serialized with the in-between operation, so that the second command cannot start until the first one has completed?
If so, do they also become synchronous with respect to the host?
- What exactly does “a memory copy between two addresses to the same device memory” mean? The “same” device memory as what?
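
To make both questions concrete, here is a minimal sketch of the scenario I have in mind. The kernel `spin`, the buffer names, and the sizes are made up for illustration, and the comments mark what I am unsure about rather than what the runtime actually does.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: burns some time so that overlap (or the lack of it)
// would be visible in a profiler timeline.
__global__ void spin(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 100000; ++k) x = x * 1.0001f + 0.5f;
        data[i] = x;
    }
}

int main() {
    const int n = 256;                      // small grid, so two kernels could overlap
    const size_t bytes = n * sizeof(float);

    float *a = nullptr, *b = nullptr, *extra = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMemset(a, 0, bytes);
    cudaMemset(b, 0, bytes);

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    spin<<<1, n, 0, s1>>>(a, n);            // command in stream s1
    cudaMalloc(&extra, bytes);              // "a device memory allocation" issued in between
    spin<<<1, n, 0, s2>>>(b, n);            // command in stream s2

    // Question 1: does the second launch wait for the first kernel to finish,
    // and does the host thread block at the cudaMalloc above until it does?

    // Question 2 (my guess at the wording): is a copy like this one, where source
    // and destination are both on the same device, what the guide means?
    cudaMemcpy(extra, a, bytes, cudaMemcpyDeviceToDevice);

    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    cudaFree(extra);
    return 0;
}
```

Comparing a profiler timeline of this program with and without the in-between `cudaMalloc` should show whether the two kernels overlap.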