PtoP versus DtoD

I am working on several codes that use multiple NVIDIA K80 GPUs installed on a compute node. To speed up data transfers between the GPUs I would like to use peer-to-peer transfers.

In the Visual Profiler GUI (installed with CUDA 7.5) I noticed that for some of these codes data transfers are marked as DtoD and in others as PtoP: what is the difference?

Thanks and Best Regards,


I found a possible answer… just as a reference for other users:

The difference between DtoD and PtoP is probably related to two different function calls:

cuMemcpyDtoDAsync and cuMemcpyPeerAsync

and thus it depends on whether the data transfer happens within a single context or between multiple contexts.

But it is still not clear to me whether performance is expected to differ between using a single context and using multiple contexts…
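For reference, a sketch of how the two driver-API calls differ. The function names and signatures are from the CUDA driver API; the pointer and context variables here (dDst, dSrc, ctxDst, ctxSrc) are assumed to be valid objects set up elsewhere:

```cuda
#include <cuda.h>

// Both pointers belong to the current context: the profiler
// shows this transfer as DtoD.
void copy_same_context(CUdeviceptr dDst, CUdeviceptr dSrc,
                       size_t bytes, CUstream stream) {
    cuMemcpyDtoDAsync(dDst, dSrc, bytes, stream);
}

// The pointers belong to different contexts (typically on two
// different GPUs), so each context is passed explicitly: the
// profiler shows this transfer as PtoP.
void copy_across_contexts(CUdeviceptr dDst, CUcontext ctxDst,
                          CUdeviceptr dSrc, CUcontext ctxSrc,
                          size_t bytes, CUstream stream) {
    cuMemcpyPeerAsync(dDst, ctxDst, dSrc, ctxSrc, bytes, stream);
}
```

Note that cuMemcpyPeerAsync names the source and destination contexts explicitly, while cuMemcpyDtoDAsync assumes both pointers live in the current context.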

Hi Benry,

Have you discovered the real difference between them?

The function cuMemcpyPeerAsync only works with Unified Memory (that is, the CUDA framework takes care of the pointers).

I am also not sure that cuMemcpyDtoDAsync really implies a P2P transfer. I mean, it is necessary to call cudaDeviceEnablePeerAccess() in order to enable P2P, so my guess is that if P2P is not enabled, cuMemcpyDtoDAsync will not perform the copy over P2P (presumably it would be staged through host memory instead).
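A minimal runtime-API sketch of the enable-then-copy sequence discussed above, assuming two GPUs with IDs 0 and 1 on the same node (the buffer size and device IDs are placeholders):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err = (call);                                          \
        if (err != cudaSuccess) {                                          \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err));   \
            return 1;                                                      \
        }                                                                  \
    } while (0)

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n < 2) {
        printf("need at least two GPUs\n");
        return 0;
    }

    // Check whether the hardware/topology allows direct peer access.
    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));
    printf("peer access 0->1: %d, 1->0: %d\n", can01, can10);

    const size_t bytes = 1 << 20;
    void *src = nullptr, *dst = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));
    if (can01) CHECK(cudaDeviceEnablePeerAccess(1, 0));  // flag must be 0

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));
    if (can10) CHECK(cudaDeviceEnablePeerAccess(0, 0));

    // With peer access enabled the copy can go directly over the bus
    // (shown as PtoP in the profiler); otherwise the runtime stages
    // it through host memory.
    CHECK(cudaMemcpyPeerAsync(dst, 1, src, 0, bytes, 0));
    CHECK(cudaDeviceSynchronize());
    printf("copy done\n");

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return 0;
}
```

Profiling this with and without the cudaDeviceEnablePeerAccess() calls should show whether the transfer is reported as PtoP or falls back to a staged copy.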

Thanks and Best Regards,