I am trying to run DIGITS (which is ultimately Caffe under the hood) on a machine with 8x RTX 2080Ti cards. However, training is much slower than on a machine with 8x GTX 1080Ti cards.
After a bit of digging I can see that the topology looks good:
# nvidia-smi topo -m
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity
GPU0     X    PIX   PIX   PIX   SYS   SYS   SYS   SYS   0-19,40-59
GPU1    PIX    X    PIX   PIX   SYS   SYS   SYS   SYS   0-19,40-59
GPU2    PIX   PIX    X    PIX   SYS   SYS   SYS   SYS   0-19,40-59
GPU3    PIX   PIX   PIX    X    SYS   SYS   SYS   SYS   0-19,40-59
GPU4    SYS   SYS   SYS   SYS    X    PIX   PIX   PIX   20-39,60-79
GPU5    SYS   SYS   SYS   SYS   PIX    X    PIX   PIX   20-39,60-79
GPU6    SYS   SYS   SYS   SYS   PIX   PIX    X    PIX   20-39,60-79
GPU7    SYS   SYS   SYS   SYS   PIX   PIX   PIX    X    20-39,60-79

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks
However, there is no peer-to-peer access between any of the cards. I used the deviceQuery tool from the CUDA samples, which calls cudaDeviceCanAccessPeer(&can_access_peer, gpuid[i], gpuid[j]) for each pair of GPUs, and it reports no P2P capability for every pair.
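For reference, this is essentially the check deviceQuery performs; a stripped-down standalone sketch (compile with nvcc) that queries every ordered GPU pair:

    // Minimal sketch of the P2P capability check, not the full deviceQuery sample.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                if (i == j) continue;
                int can = 0;
                // Reports whether device i can directly access memory on device j.
                cudaDeviceCanAccessPeer(&can, i, j);
                printf("GPU%d -> GPU%d : %s\n", i, j, can ? "P2P possible" : "no P2P");
            }
        }
        return 0;
    }

On the 2080Ti machine this prints "no P2P" for all 56 pairs, even for the pairs that nvidia-smi shows as PIX.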
There is also a thread at https://devtalk.nvidia.com/default/topic/1043300/linux/2080-tis-cudadevicecanaccesspeer-failure-without-nvlink-bridge/ which suggests that P2P access between RTX 2080Ti cards is only possible via an NVLink bridge, but I have not found official confirmation of this.
I could try buying NVLink bridges, but a bridge can only connect two cards, so it would not provide P2P across all eight GPUs.
Can anyone point me to NVIDIA's official position on P2P access between RTX 2080Ti cards over the PCIe bus? P2P over PCIe works fine for the GTX 1080Ti cards in my other machine.
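For completeness, this is a minimal sketch of the kind of runtime test that works on the 1080Ti machine: enable peer access in both directions between GPU0 and GPU1 and do a direct peer copy. (Buffer size and device indices are just for illustration; note that cudaMemcpyPeer silently falls back to staging through host memory when P2P is not enabled, so the cudaDeviceCanAccessPeer result is the decisive signal.)

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        printf("0->1: %d, 1->0: %d\n", can01, can10);
        if (!can01 || !can10) return 1;  // this is where the 2080Ti machine bails out

        const size_t bytes = 1 << 20;  // 1 MiB test buffer, arbitrary
        void *src = nullptr, *dst = nullptr;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // flags must be 0
        cudaMalloc(&src, bytes);

        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        cudaMalloc(&dst, bytes);

        // Direct device-to-device copy; with peer access enabled this goes
        // over the PCIe switch instead of bouncing through host memory.
        cudaError_t err = cudaMemcpyPeer(dst, 1, src, 0, bytes);
        printf("cudaMemcpyPeer: %s\n", cudaGetErrorString(err));
        return err == cudaSuccess ? 0 : 1;
    }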