One GPU NOT capable of Peer-to-Peer (P2P)

Robert_Crovella · November 15, 2018, 1:40am

I would suggest filing a bug at developer.nvidia.com.
In the report, I would include the observation that P2P works when both GPUs are in TCC mode.
It would also be a good idea to link to this thread.

nunez.juan · November 15, 2018, 8:54pm

FYI: Bug ID 2443916

nunez.juan · November 27, 2018, 4:29pm

Update: No real luck from the bug team.

I got a somewhat confusing message back.
First, they claim to have reproduced the issue.
Second, they claim that the sample code, "simpleP2P", has a bug, but that the driver itself is OK.
However, when I use the driver, the API that checks whether P2P is available returns false, and the API that enables P2P also returns an error, "not supported".
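For anyone who wants to run the same check on their own system, here is a minimal sketch of the kind of query described above. The post does not name the exact calls, so the use of cudaDeviceCanAccessPeer and cudaDeviceEnablePeerAccess from the CUDA runtime API is an assumption, and the loop over every device pair is only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Check every ordered pair of devices (src -> dst).
    for (int src = 0; src < deviceCount; ++src) {
        for (int dst = 0; dst < deviceCount; ++dst) {
            if (src == dst) continue;

            // Ask the driver whether src can directly access memory on dst.
            int canAccess = 0;
            err = cudaDeviceCanAccessPeer(&canAccess, src, dst);
            if (err != cudaSuccess) {
                fprintf(stderr, "cudaDeviceCanAccessPeer(%d, %d) failed: %s\n",
                        src, dst, cudaGetErrorString(err));
                continue;
            }
            printf("GPU %d -> GPU %d: canAccessPeer = %d\n", src, dst, canAccess);

            // Try to enable peer access regardless of the answer above,
            // so that the error returned by the driver is visible.
            err = cudaSetDevice(src);
            if (err == cudaSuccess) {
                err = cudaDeviceEnablePeerAccess(dst, 0);  // flags must be 0
            }
            if (err == cudaSuccess || err == cudaErrorPeerAccessAlreadyEnabled) {
                printf("  peer access enabled\n");
            } else {
                printf("  enable failed: %s\n", cudaGetErrorString(err));
            }
            cudaGetLastError();  // clear the error state before the next pair
        }
    }
    return 0;
}
```

On a system where P2P is unavailable, the first call should report 0 for the pair and the second should fail with an error; the exact error string depends on the driver.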
Using Nsight, I confirmed that the memcpy was not P2P but was indeed going via the host.
Another oddity: I used CUDA events to record the duration of the memcpy API call and was seeing times of around 98 ms for 1 GB.
Nsight shows that the memcpy call ends up being asynchronous and that the events only captured the device-to-host part of the memcpy; there was another ~98 ms for the host-to-device part that was not accounted for - another bug?
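One way to cross-check the event timing is to compare it against a host wall-clock measurement that waits for both GPUs to go idle. The sketch below is an assumption-laden illustration, not the code from the post: the device ordinals 0 and 1, the use of cudaMemcpyPeer, and the CHECK macro are all hypothetical choices, and whether the two numbers actually differ will depend on the driver and the driver mode.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failing CUDA call and exit main.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t e_ = (call);                                        \
        if (e_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(e_));                            \
            return 1;                                                   \
        }                                                               \
    } while (0)

int main() {
    const int srcDev = 0, dstDev = 1;   // assumed device ordinals
    const size_t bytes = 1ull << 30;    // 1 GiB, roughly the 1 GB from the post

    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        fprintf(stderr, "need at least two GPUs\n");
        return 1;
    }

    void *src = nullptr, *dst = nullptr;
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaMalloc(&dst, bytes));

    // Events live on the source device's default stream.
    CHECK(cudaSetDevice(srcDev));
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaDeviceSynchronize());

    auto t0 = std::chrono::steady_clock::now();
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyPeer(dst, dstDev, src, srcDev, bytes));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    // Wait for all outstanding work on both devices before stopping the
    // host timer, so a staged host-to-device leg is included.
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();

    float eventMs = 0.0f;
    CHECK(cudaEventElapsedTime(&eventMs, start, stop));
    double wallMs = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("event time: %.1f ms, wall-clock time: %.1f ms\n", eventMs, wallMs);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(src));
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaFree(dst));
    return 0;
}
```

If the wall-clock figure comes out at roughly twice the event figure, that would be consistent with the events seeing only one leg of a copy staged through the host.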
Surprisingly, this ~200 ms is quite fast for a copy that has to go via the host with both GPUs in WDDM mode.
If I perform the same cudaMemcpy from one of the P2000 GPUs (WDDM) to the M4000 GPU (WDDM) to which the monitor is connected, I see cudaMemcpy times of ~600 ms.

My takeaway: the current version of CUDA 10 does not support P2P between WDDM GPUs.