One GPU NOT capable of Peer-to-Peer (P2P)

I would suggest filing a bug at developer.nvidia.com.
In the report, I would include the observation that P2P works when both GPUs are in TCC mode.
It would also be a good idea to link to this thread.

FYI: Bug ID 2443916

Update: No real luck from the bug team.

  • I got a bit of a confusing message back.
  • First, they claim to have reproduced the issue.
  • Second, they claim that the sample code, "simpleP2P", has a bug, but that the driver itself is OK.
  • When I use the driver directly, the API that checks whether P2P is available returns false, and the API that enables P2P also returns an error, "not supported".
  • Using Nsight, I confirmed that the memcpy was not P2P but was indeed going via the host.
  • Another oddity: I used CUDA events to record the duration of the memcpy API call and was seeing times of around 98 ms for 1 GB.
  • In Nsight you can see that the memcpy call ends up being asynchronous and the events only captured the device-to-host part of the copy; there was another ~98 ms for the host-to-device part that was not accounted for. Another bug?
  • Surprisingly, this ~200 ms total (roughly 5 GB/s end to end) is quite fast for a copy that has to go via the host with both GPUs in WDDM mode.
  • If I perform the same cudaMemcpy from one of the P2000 GPUs (WDDM) to the M4000 GPU (WDDM) to which the monitor is connected, I see cudaMemcpy times of ~600 ms.
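
For anyone who wants to repeat the checks above, here is a minimal sketch using the CUDA runtime API. It is not the code from the bug report. The device ordinals 0 and 1 and the 1 GiB buffer size are assumptions, so adjust them for your machine. Because the events only caught half of the staged copy in my test, this version times the copy with a host clock and synchronizes both devices before stopping the timer.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Print the failing call and bail out of main on any CUDA runtime error.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            return 1;                                                  \
        }                                                              \
    } while (0)

int main() {
    const int src = 0, dst = 1;        // assumed device ordinals
    const size_t bytes = 1ull << 30;   // assumed test size: 1 GiB

    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        std::printf("Need at least two CUDA devices.\n");
        return 0;
    }

    // Ask whether dst can address src's memory directly.
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, dst, src));
    std::printf("Device %d can access device %d: %s\n",
                dst, src, canAccess ? "yes" : "no");

    // Try to enable peer access anyway and report what the runtime says.
    CHECK(cudaSetDevice(dst));
    cudaError_t en = cudaDeviceEnablePeerAccess(src, 0);
    std::printf("cudaDeviceEnablePeerAccess: %s\n", cudaGetErrorString(en));
    cudaGetLastError();  // clear the error state so later CHECKs are not confused

    // One buffer on each device.
    void *srcBuf = nullptr, *dstBuf = nullptr;
    CHECK(cudaSetDevice(src));
    CHECK(cudaMalloc(&srcBuf, bytes));
    CHECK(cudaSetDevice(dst));
    CHECK(cudaMalloc(&dstBuf, bytes));

    // Make sure both devices are idle before starting the clock.
    CHECK(cudaSetDevice(src));
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaSetDevice(dst));
    CHECK(cudaDeviceSynchronize());

    auto t0 = std::chrono::steady_clock::now();
    // Without peer access the runtime stages this copy through host memory.
    CHECK(cudaMemcpyPeer(dstBuf, dst, srcBuf, src, bytes));
    // Wait on BOTH devices so neither half of a staged copy is missed.
    CHECK(cudaSetDevice(src));
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaSetDevice(dst));
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("Copy took %.1f ms (%.2f GB/s)\n",
                ms, (bytes / 1e9) / (ms / 1e3));

    CHECK(cudaFree(dstBuf));
    CHECK(cudaSetDevice(src));
    CHECK(cudaFree(srcBuf));
    return 0;
}
```

The first copy in a process can include one-time setup cost, so it is worth running the timed section a few times and ignoring the first result.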

My take-away: the current version of CUDA 10 does not support P2P between WDDM GPUs.
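
To see how this looks on a given machine, the following rough sketch (not from the bug report) lists each GPU with the `tccDriver` flag from `cudaDeviceProp` and prints which device pairs report peer access. On Windows a `tccDriver` value of 0 generally means the GPU is not in TCC mode. Whether that always means WDDM depends on the driver version, so treat that reading as an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Report the driver model flag for every device.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         i, cudaGetErrorString(err));
            return 1;
        }
        std::printf("Device %d: %s, tccDriver=%d\n",
                    i, prop.name, prop.tccDriver);
    }

    // Report peer access capability for every ordered pair of devices.
    for (int i = 0; i < count; ++i) {
        for (int j = 0; j < count; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            err = cudaDeviceCanAccessPeer(&canAccess, i, j);
            if (err != cudaSuccess) {
                std::fprintf(stderr, "cudaDeviceCanAccessPeer(%d, %d) failed: %s\n",
                             i, j, cudaGetErrorString(err));
                return 1;
            }
            std::printf("Device %d -> device %d peer access: %s\n",
                        i, j, canAccess ? "yes" : "no");
        }
    }
    return 0;
}
```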