I find this puzzling.
In the documentation for cuMemcpyDtoH() it is stated:
This function exhibits synchronous behavior for most use cases.
So… how do I know which of these cases mine falls under? The documentation does not say what that means!
The word “synchronous” in that part of the documentation is a link. Have you clicked on it? The enumeration at the link seems exhaustive.
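I would argue that this sentence should not be part of the cuMemcpyDtoH() documentation, because there are no asynchronous cases for device->host transfers: according to the linked section, a copy from device memory to either pageable or pinned host memory returns only once the copy has completed. The caveat does, however, apply to host->device transfers. For example, a copy from pageable host memory may return as soon as the data has been staged for DMA, before it has reached device memory.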
I will let the NVIDIA folks decide whether the documentation needs updating, or whether there are in fact asynchronous cases for device->host transfers that are not enumerated.