2080 Tis cudaDeviceCanAccessPeer failure without NVLink bridge

It appears that the 2080 Tis cannot peer without the NVLink bridge installed. Just want to confirm that this is the intended behavior and not a bug.
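
To be concrete, this is the check I mean. Below is a minimal sketch of my own (not the simpleP2P source), which just queries cudaDeviceCanAccessPeer in both directions between devices 0 and 1; the file name and build line are hypothetical:

// peer_check.cu - build with: nvcc peer_check.cu -o peer_check
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int canAccess01 = 0, canAccess10 = 0;
    // Ask the runtime whether each device can map the other's memory
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    printf("Peer access GPU0 -> GPU1: %s\n", canAccess01 ? "Yes" : "No");
    printf("Peer access GPU1 -> GPU0: %s\n", canAccess10 ? "Yes" : "No");
    return 0;
}

Without the bridge both queries come back 0 for me; with the bridge installed they come back 1.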

Testing peering using the simpleP2P sample from the CUDA samples:

Without the bridge:

[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = "GeForce RTX 2080 Ti" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "GeForce RTX 2080 Ti" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from GeForce RTX 2080 Ti (GPU0) -> GeForce RTX 2080 Ti (GPU1) : No
> Peer access from GeForce RTX 2080 Ti (GPU1) -> GeForce RTX 2080 Ti (GPU0) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.

With the bridge:

[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = "GeForce RTX 2080 Ti" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "GeForce RTX 2080 Ti" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from GeForce RTX 2080 Ti (GPU0) -> GeForce RTX 2080 Ti (GPU1) : Yes
> Peer access from GeForce RTX 2080 Ti (GPU1) -> GeForce RTX 2080 Ti (GPU0) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> GeForce RTX 2080 Ti (GPU0) supports UVA: Yes
> GeForce RTX 2080 Ti (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 44.81GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Disabling peer access...
Shutting down...
Test passed
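
For anyone wondering what the sample is doing behind those lines: roughly, each device enables access to its peer and then cudaMemcpyPeer copies are timed with CUDA events to produce a GB/s figure like the 44.81GB/s above. The sketch below is my own reconstruction under that assumption, not the actual simpleP2P source (error checking omitted):

// p2p_copy.cu - assumes devices 0 and 1 report peer capability
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64 * 1024 * 1024;   // 64 MB, same buffer size simpleP2P reports
    float *buf0 = nullptr, *buf1 = nullptr;

    // Each device has to opt in to accessing its peer
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaMalloc(&buf0, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&buf1, bytes);

    // Time repeated GPU0 -> GPU1 copies over the peer path
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int reps = 100;
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU0 -> GPU1: %.2f GB/s\n", (double)bytes * reps / (ms / 1000.0) / 1e9);

    // Tear down, mirroring the sample's "Disabling peer access..." step
    cudaSetDevice(0);
    cudaDeviceDisablePeerAccess(1);
    cudaFree(buf0);
    cudaSetDevice(1);
    cudaDeviceDisablePeerAccess(0);
    cudaFree(buf1);
    return 0;
}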

This is a dual 2080 Ti machine running Ubuntu 18.04.
Driver is 410.66, CUDA 10.0.
Motherboard is an ASUS X99-E WS/USB 3.1.
CPU is an Intel i7-6850K.
I’ve not added or changed any IOMMU kernel parameters.

Peering works as expected when the 2080 Tis are replaced with 1080 Tis, with no other changes.

Same here.

I believe we have to use NVLink to enable P2P on the 2080 Ti…

Correct. I tested the same with 2x 2080 Ti and the NVLink bridge. In the CUDA samples there’s a P2P bandwidth example in the 01… directory. It will show whether the links are capable and then show the bandwidth (pretty impressive!).
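
If you want a quick programmatic check before running the full sample, something along these lines works too. This is my own sketch (not from the samples), and it assumes cudaDeviceGetP2PAttribute, which the runtime has exposed since CUDA 8. It reports whether a peer mapping from GPU0 to GPU1 is supported, the link's relative performance rank, and whether native atomics work over it (a reasonable hint that the path is NVLink rather than PCIe):

// p2p_attr.cu - query link attributes between devices 0 and 1
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int access = 0, rank = 0, atomics = 0;
    // Is a peer mapping from device 0 to device 1 supported at all?
    cudaDeviceGetP2PAttribute(&access,  cudaDevP2PAttrAccessSupported,       0, 1);
    // Relative performance of the link (lower means a better link)
    cudaDeviceGetP2PAttribute(&rank,    cudaDevP2PAttrPerformanceRank,       0, 1);
    // Native atomics over the link
    cudaDeviceGetP2PAttribute(&atomics, cudaDevP2PAttrNativeAtomicSupported, 0, 1);
    printf("access=%d rank=%d nativeAtomics=%d\n", access, rank, atomics);
    return 0;
}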

I’m also trying to figure out how to get hold of a 2-slot spacing RTX NVLink bridge. They are only being made in 3-slot and 4-slot spacing.

I’d like to use 4 cards in an X99 WS-E board, with bridges on paired cards. But that would require bridges that just don’t exist…

I suppose it is impossible to have 4 Founders Edition cards in an X99 WS-E.

The biggest problem is that FE cards seriously overheat at 2-slot spacing. That’s why NVIDIA is only selling 3-slot and 4-slot bridges. I bought three FE cards for deep learning, and it turned out to be a terrible choice. I still think the 1080 Ti is better: P2P without a bridge, at a good price.

If you really want 4-way 2080 Ti, the ASUS Turbo may be a good option.

BTW, I wonder whether you can put 3 cards in the X99 WS-E with 3-slot spacing. Can the PCIe slots run in x8 mode in that case?

I mean using these three PCIe slots: 1st, 4th, and 7th.

The official manual suggests using the 1st, 3rd, 5th, and 7th slots, so I wonder whether the 4th slot is capable of x16/x8 mode.

Hi,

So this means that you can’t have P2P between more than two 2080 Tis? Is this the official word from NVIDIA? No P2P support for 2080 Tis except when using an NVLink bridge?

This thread may be of interest:

https://devtalk.nvidia.com/default/topic/1046951/cuda-programming-and-performance/does-titan-rtx-support-p2p-access-w-o-nvlink-/