I am just learning how to use streams with CUDA, so I have been reading the documentation, but there appear to be some conflicts between the different manuals.
Section 3.1.2 (page 17) of the CUDA C Best Practices Guide (version 3.2) states:
“On devices that have this capability, the overlap once again requires pinned host memory, and, in addition, the data transfer and kernel must use different, non-default streams (streams with non-zero stream IDs).”
Section 22.214.171.124.4 (page 41) of the CUDA C Programming Guide (version 3.2) states:
“If the code is rewritten the following way (and assuming the device supports overlap of data transfer and kernel execution)
then the memory copy from host to device issued to stream 1 overlaps with the kernel launch issued to stream 0.”
If I understand the documentation correctly, the Best Practices Guide states that stream 0 cannot be used to overlap kernel execution and memory copies, yet the Programming Guide states that if the code is written as shown in its example, a kernel launch issued to stream 0 will overlap with memory copies on another stream.
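To make the question concrete, here is a minimal sketch of the pattern I think the guides are describing. It is my own illustration and not the code from either guide: the kernel `scaleKernel`, the function `runOverlapDemo`, and the buffer sizes are all made up. Note that in this sketch `stream[0]` and `stream[1]` are both streams created with `cudaStreamCreate`, so neither is the default stream, and whether any overlap actually occurs depends on the device.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element of its chunk.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns the number of mismatched elements (0 on success),
// or -1 if any CUDA call fails (for example, no device present).
int runOverlapDemo()
{
    const int nStreams = 2;
    const int chunk = 1 << 20;                      // elements per stream
    const size_t chunkBytes = chunk * sizeof(float);

    float *hostBuf = NULL;
    float *devBuf = NULL;

    // Pinned (page-locked) host memory, which asynchronous copies require.
    if (cudaMallocHost((void **)&hostBuf, nStreams * chunkBytes) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&devBuf, nStreams * chunkBytes) != cudaSuccess) {
        cudaFreeHost(hostBuf);
        return -1;
    }
    for (int i = 0; i < nStreams * chunk; ++i) hostBuf[i] = 1.0f;

    // Two explicitly created (non-default) streams.
    cudaStream_t stream[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&stream[s]);

    cudaError_t err = cudaSuccess;
    for (int s = 0; s < nStreams; ++s) {
        float *h = hostBuf + s * chunk;
        float *d = devBuf + s * chunk;
        // Copy in, compute, copy out: all queued on the same stream, so the
        // copy for one chunk may overlap with the kernel for the other chunk.
        cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, stream[s]);
        scaleKernel<<<(chunk + 255) / 256, 256, 0, stream[s]>>>(d, chunk);
        if (cudaGetLastError() != cudaSuccess) err = cudaErrorUnknown;
        cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, stream[s]);
    }

    // Wait for each stream to finish before reading the results on the host.
    for (int s = 0; s < nStreams; ++s) {
        cudaError_t e = cudaStreamSynchronize(stream[s]);
        if (e != cudaSuccess) err = e;
    }

    int mismatches = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < nStreams * chunk; ++i)
            if (hostBuf[i] != 2.0f) ++mismatches;
    }

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(stream[s]);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return (err == cudaSuccess) ? mismatches : -1;
}
```

Calling `runOverlapDemo()` from `main` and checking for a return value of 0 only confirms that the results are correct. It does not show whether the copies and kernels overlapped, which I assume would need a profiler or timing with events to verify.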
These two statements cannot both be correct, can they?
If not, which is correct?
If they are both correct, please explain where I am going wrong in my interpretation.
Thanks for the help.