Using multiple RTX 2080 Ti cards in parallel not possible?

alexey.kostin · May 9, 2019, 3:57pm

I am trying to run DIGITS (which ultimately runs on Caffe) on a machine with 8x RTX 2080Ti cards. However, it is much slower than on a machine with 8x GTX 1080Ti cards.

After a bit of digging I can see that the topology looks good:

# nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity
GPU0     X      PIX     PIX     PIX     SYS     SYS     SYS     SYS     0-19,40-59
GPU1    PIX      X      PIX     PIX     SYS     SYS     SYS     SYS     0-19,40-59
GPU2    PIX     PIX      X      PIX     SYS     SYS     SYS     SYS     0-19,40-59
GPU3    PIX     PIX     PIX      X      SYS     SYS     SYS     SYS     0-19,40-59
GPU4    SYS     SYS     SYS     SYS      X      PIX     PIX     PIX     20-39,60-79
GPU5    SYS     SYS     SYS     SYS     PIX      X      PIX     PIX     20-39,60-79
GPU6    SYS     SYS     SYS     SYS     PIX     PIX      X      PIX     20-39,60-79
GPU7    SYS     SYS     SYS     SYS     PIX     PIX     PIX      X      20-39,60-79

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks

However, there is no peer-to-peer access between any of the cards. I used the deviceQuery tool from the CUDA samples, which calls cudaDeviceCanAccessPeer(&can_access_peer, gpuid[i], gpuid[j]).

There is also a thread, https://devtalk.nvidia.com/default/topic/1043300/linux/2080-tis-cudadevicecanaccesspeer-failure-without-nvlink-bridge/, which suggests that P2P access for RTX 2080Ti cards is only possible via an NVLink bridge, but this has not been officially confirmed.

I can try buying an NVLink bridge, but it can only connect 2 cards.

Can anyone point me to the official NVIDIA position regarding P2P access between RTX 2080Ti cards over the PCIe bus? P2P over PCIe works fine for the GTX 1080Ti cards in my other machine.

Robert_Crovella · May 9, 2019, 4:30pm

This thread may be of interest:

https://devtalk.nvidia.com/default/topic/1046951/cuda-programming-and-performance/does-titan-rtx-support-p2p-access-w-o-nvlink-/

alexey.kostin · May 9, 2019, 5:06pm

Hi Robert,

Thanks for the prompt response. I have the same case as in the thread you suggested.

# nvidia-smi topo -p2p r
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7
 GPU0   X       CNS     CNS     CNS     CNS     CNS     CNS     CNS
 GPU1   CNS     X       CNS     CNS     CNS     CNS     CNS     CNS
 GPU2   CNS     CNS     X       CNS     CNS     CNS     CNS     CNS
 GPU3   CNS     CNS     CNS     X       CNS     CNS     CNS     CNS
 GPU4   CNS     CNS     CNS     CNS     X       CNS     CNS     CNS
 GPU5   CNS     CNS     CNS     CNS     CNS     X       CNS     CNS
 GPU6   CNS     CNS     CNS     CNS     CNS     CNS     X       CNS
 GPU7   CNS     CNS     CNS     CNS     CNS     CNS     CNS     X

Legend:

  X    = Self
  OK   = Status Ok
  CNS  = Chipset not supported
  GNS  = GPU not supported
  TNS  = Topology not supported
  NS   = Not supported
  U    = Unknown

It is quite upsetting to realise that we invested several thousand euros in a machine I cannot use.

Is there a list of motherboards or chipsets which support P2P over PCIe?
Or is P2P over PCIe not supported at all for the RTX 2080Ti?
Or is it perhaps possible to change something in the kernel to enable support?
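
For anyone who wants to check a larger machine without reading the matrix by eye, here is a minimal Python sketch that tallies the status codes from `nvidia-smi topo -p2p r` style output. The sample text is a trimmed 3-GPU version of the matrix earlier in this post, and the helper name `parse_p2p_matrix` is my own; the real output layout may vary between driver versions, so treat the parsing as an assumption.

```python
from collections import Counter

def parse_p2p_matrix(text):
    """Map (source GPU, destination GPU) to its status code, skipping the diagonal."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]  # column labels: GPU0, GPU1, ...
    status = {}
    for row in rows[1:]:
        src, cells = row[0], row[1:]
        for dst, cell in zip(header, cells):
            if src != dst:
                status[(src, dst)] = cell
    return status

# Trimmed, illustrative sample in the same layout as the matrix in this post.
SAMPLE = """\
        GPU0    GPU1    GPU2
 GPU0   X       CNS     CNS
 GPU1   CNS     X       CNS
 GPU2   CNS     CNS     X
"""

status = parse_p2p_matrix(SAMPLE)
counts = Counter(status.values())
print(dict(counts))
```

If no pair reports `OK`, P2P reads are unavailable between every pair of cards, which is the situation shown in the matrix above.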

Robert_Crovella · May 9, 2019, 5:14pm

Not that I know of. Furthermore this particular issue is not a motherboard or chipset issue. Please re-read the thread I linked.

alexey.kostin · May 9, 2019, 5:47pm

Robert, sorry to be a bit pedantic here. Your post in the other thread states that the RTX 2080Ti can only do P2P over an NVLink bridge. But this implies that you can only run 2 cards in parallel, because the bridge can only connect 2 cards. Is that the case? This seems to be a massive step back from what you could do with the GTX 1080Ti.

Robert_Crovella · May 9, 2019, 6:20pm

For any 2 GPUs in view here (Titan RTX, RTX 2080Ti, RTX 2080) that you wish to place into a P2P relationship, those 2 GPUs must have an NVLink bridge installed between them. You cannot rely on PCIe to establish the peer relationship.

I agree that it is not possible to place more than 2 GPUs in the same P2P clique with this arrangement. (Assuming the products in view here, and assuming no changes to NVLink bridge design.) I believe it should be possible to have up to four 2-way cliques, amongst 8 GPUs, with such an arrangement, assuming you add 4 bridges pairwise amongst the GPUs. That is not the same as having all 8 GPUs participate in the same clique, however. And I have not tested that myself.

I agree that this is substantially different than GTX 1080Ti behavior.
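
To make the pairwise arrangement concrete, here is a small Python sketch that counts how many GPU pairs end up as P2P peers. It assumes each GPU takes part in at most one bridge, as described above; the specific pairing (0-1, 2-3, 4-5, 6-7) and the helper name `peer_pairs` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def peer_pairs(bridges):
    """Return the GPU pairs that share an NVLink bridge.

    Assumption: each GPU takes part in at most one bridge,
    so every P2P clique has exactly two members.
    """
    used = set()
    pairs = set()
    for a, b in bridges:
        if a == b or a in used or b in used:
            raise ValueError(f"invalid bridge ({a}, {b})")
        used.update((a, b))
        pairs.add(frozenset((a, b)))
    return pairs

NUM_GPUS = 8
BRIDGES = [(0, 1), (2, 3), (4, 5), (6, 7)]  # hypothetical pairing

peered = peer_pairs(BRIDGES)
all_pairs = {frozenset(p) for p in combinations(range(NUM_GPUS), 2)}
unpeered = all_pairs - peered
print(f"{len(peered)} of {len(all_pairs)} GPU pairs are P2P peers")
```

With four bridges, 4 of the 28 possible GPU pairs are peers; the other 24 pairs have no peer relationship.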

Robert_Crovella · May 11, 2019, 2:19am

Please don’t assume that just because I said a pairwise P2P arrangement might be possible that it means that I think it will provide any tangible performance benefit to your DIGITS/Caffe test case.

To be clear, I don’t think it will provide any tangible performance benefits there. You’re welcome to do as you wish of course.

alexey.kostin · May 13, 2019, 3:23pm

I have investigated further with the Caffe framework, reading the source code and running some experiments (I presume TensorFlow behaves the same way). Caffe uses multiple GPUs by splitting each batch between them. That is, if you have a batch of 64 samples that you want to process in parallel on 4 cards, then each card runs the forward propagation stage and most of the backward propagation for a batch of 16. Only a tiny amount of computation (with a very small amount of data) is done between cards to “merge” the gradients (the call to ncclAllReduce). This is the only place where the benefit of fast P2P data exchange could, in theory, be noticed. So even in theory the benefit of fast P2P looks negligible for the Caffe framework.
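
The batch-splitting scheme described above can be sketched in plain Python. This is a toy illustration under my own assumptions, not Caffe's code: the one-parameter model, the helper names, and `all_reduce_mean` (a stand-in for an all-reduce sum followed by division by the number of GPUs) are all hypothetical.

```python
import math

def split_batch(batch, num_gpus):
    """Split a batch into equal contiguous shards, one per GPU."""
    if len(batch) % num_gpus != 0:
        raise ValueError("batch size must be divisible by the number of GPUs")
    size = len(batch) // num_gpus
    return [batch[i * size:(i + 1) * size] for i in range(num_gpus)]

def local_gradient(samples, w):
    """Mean gradient of 0.5 * (w * x - y) ** 2 with respect to w."""
    return sum((w * x - y) * x for x, y in samples) / len(samples)

def all_reduce_mean(values):
    """Stand-in for the gradient merge: every GPU ends up with the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

# Toy data: 64 samples of y = 2x + 1, processed on 4 simulated GPUs.
batch = [(float(i), 2.0 * i + 1.0) for i in range(64)]
w = 0.5

shards = split_batch(batch, 4)                      # 16 samples per GPU
grads = [local_gradient(s, w) for s in shards]      # computed independently
merged = all_reduce_mean(grads)                     # the only cross-GPU step
full_batch_grad = local_gradient(batch, w)          # single-GPU reference

print(math.isclose(merged[0], full_batch_grad))
```

Because the shards are equal in size, the merged gradient matches the full-batch gradient. Note that the data exchanged in the merge step scales with the number of model parameters rather than with the batch size.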

I also noticed that I actually get about a 10% performance increase if I split the work between GPUs that sit on different PCIe switches. I presume the explanation is that data exchange between the CPU and the GPUs is faster when the data goes through 2 PCIe switches rather than one. So the gain from faster CPU-GPU data exchange could outweigh the loss from slower GPU-to-GPU transfers. I will test this as soon as the other machine, with 1080Ti cards and working PCIe P2P, is free.

I have not tested with an NVLink bridge yet. If we do get one or more, I will post my findings.
