Conflicting streams documentation?


I am just learning how to use streams with CUDA, so I have been reading the documentation, but there appear to be some conflicts between the different manuals.

Section 3.1.2 (page 17) of the CUDA C Best Practices Guide (version 3.2) states:

“On devices that have this capability, the overlap once again requires pinned host memory, and, in addition, the data transfer and kernel must use different, non-default streams (streams with non-zero stream IDs).”
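To make sure I am reading this correctly, here is a minimal sketch of the pattern I understand that sentence to describe: pinned host buffers, two explicitly created streams, a kernel issued to one and an independent host-to-device copy issued to the other. The kernel scaleKernel, the buffer names, and the stream names are hypothetical and mine, not taken from either guide, and whether the two operations actually overlap depends on the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only: doubles each element in place.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Pinned (page-locked) host memory, which asynchronous copies require.
    float *hostA = NULL, *hostB = NULL;
    cudaMallocHost((void **)&hostA, bytes);
    cudaMallocHost((void **)&hostB, bytes);
    for (int i = 0; i < n; ++i) { hostA[i] = 1.0f; hostB[i] = 3.0f; }

    float *devA = NULL, *devB = NULL;
    cudaMalloc((void **)&devA, bytes);
    cudaMalloc((void **)&devB, bytes);

    // Two explicitly created streams; neither is the default stream.
    cudaStream_t kernelStream, copyStream;
    cudaStreamCreate(&kernelStream);
    cudaStreamCreate(&copyStream);

    // Stage the kernel's input on the kernel's own stream.
    cudaMemcpyAsync(devA, hostA, bytes, cudaMemcpyHostToDevice, kernelStream);

    // Kernel on one stream, unrelated copy on the other:
    // these two are the candidates for overlap.
    scaleKernel<<<(n + 255) / 256, 256, 0, kernelStream>>>(devA, n);
    cudaMemcpyAsync(devB, hostB, bytes, cudaMemcpyHostToDevice, copyStream);

    // Bring the kernel's result back, then wait for both streams to finish.
    cudaMemcpyAsync(hostA, devA, bytes, cudaMemcpyDeviceToHost, kernelStream);
    cudaStreamSynchronize(kernelStream);
    cudaStreamSynchronize(copyStream);

    // Report any runtime error and check one element of the result.
    cudaError_t err = cudaGetLastError();
    bool ok = (err == cudaSuccess) && (hostA[0] == 2.0f);
    printf("%s\n", ok ? "ok" : "mismatch or CUDA error");

    cudaStreamDestroy(kernelStream);
    cudaStreamDestroy(copyStream);
    cudaFree(devA);
    cudaFree(devB);
    cudaFreeHost(hostA);
    cudaFreeHost(hostB);
    return ok ? 0 : 1;
}
```

This only checks that the result is correct; it does not show that any overlap took place. Confirming that would need a profiler or a timeline view.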

Section (page 41) of the CUDA C Programming Guide (version 3.2) states:

“If the code is rewritten the following way (and assuming the device supports overlap of data transfer and kernel execution)

then the memory copy from host to device issued to stream 1 overlaps with the kernel launch issued to stream 0.”

If I understand the documentation correctly, the Best Practices Guide says that stream 0 cannot be used to overlap kernel execution with memory copies, yet the Programming Guide says that, if the code is written as shown in its example, a kernel launch issued to stream 0 will overlap with a memory copy issued to another stream.

Both of these statements cannot be correct, can they?

If not, which is correct?

If they are both correct, please explain where I am going wrong in my interpretation.

Thanks for the help.