In the CUDA C Programming Guide, the rules for simultaneous stream execution are given in one section, and the next section then discusses specific cases that seem to violate the rules that were just given.
The basic case is:
a) Stream memcpy host to device
b) Stream execute a kernel
c) Stream memcpy device to host
d) Stream memcpy host to device
e) Stream execute a kernel
f) Stream memcpy device to host
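For concreteness, here is a minimal sketch of how steps a) to f) might be issued. It assumes a) to c) go to one stream and d) to f) go to a second stream, which the list above does not state explicitly. The kernel `scale`, the helper `run_two_stream_pipeline`, and the buffer sizes are hypothetical names chosen for illustration. The code only checks that the results are correct; it does not measure whether any overlap actually occurs, since that depends on the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bail out of the helper on any CUDA error (sketch only: no cleanup on failure).
#define CHECK(call) do { if ((call) != cudaSuccess) return -1; } while (0)

// Hypothetical kernel: doubles each element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Issues steps a) to f) and returns the number of wrong results,
// or -1 if a CUDA call fails.
int run_two_stream_pipeline(int n) {
    const size_t bytes = n * sizeof(float);
    float *host = nullptr, *dev = nullptr;
    cudaStream_t stream[2];

    // Page-locked host memory, so cudaMemcpyAsync can be truly asynchronous.
    CHECK(cudaMallocHost((void **)&host, 2 * bytes));
    CHECK(cudaMalloc((void **)&dev, 2 * bytes));
    for (int i = 0; i < 2 * n; ++i) host[i] = 1.0f;
    for (int s = 0; s < 2; ++s) CHECK(cudaStreamCreate(&stream[s]));

    // Assumption: s == 0 issues a), b), c); s == 1 issues d), e), f).
    for (int s = 0; s < 2; ++s) {
        float *h = host + s * n;
        float *d = dev + s * n;
        CHECK(cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream[s]));
        scale<<<(n + 255) / 256, 256, 0, stream[s]>>>(d, n);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream[s]));
    }

    // Wait for both streams before reading the results on the host.
    CHECK(cudaDeviceSynchronize());

    int errors = 0;
    for (int i = 0; i < 2 * n; ++i)
        if (host[i] != 2.0f) ++errors;

    for (int s = 0; s < 2; ++s) CHECK(cudaStreamDestroy(stream[s]));
    CHECK(cudaFree(dev));
    CHECK(cudaFreeHost(host));
    return errors;
}

int main() {
    int errors = run_two_stream_pipeline(1 << 16);
    printf("errors: %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

The question is then whether the kernel launched at b) and the copy issued at d) can run at the same time in a sequence like this one.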
The docs claim that b) and d) can’t execute simultaneously because d) is executed after c). Yet the previous section, which lists the operations that cause implicit synchronization, mentions only allocations of and writes to device memory (NOT host memory, other than page-locked allocations).
How can these two sections be reconciled?