Our server has eight RTX 3090s. They are bridged in pairs via NVLink, and each group of four GPUs sits behind its own PCIe switch. Here is the output of “nvidia-smi topo -m”:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0    X       NV4     PIX     PIX     SYS     SYS     SYS     SYS     0-15,32-47      0               N/A
GPU1    NV4     X       PIX     PIX     SYS     SYS     SYS     SYS     0-15,32-47      0               N/A
GPU2    PIX     PIX     X       NV4     SYS     SYS     SYS     SYS     0-15,32-47      0               N/A
GPU3    PIX     PIX     NV4     X       SYS     SYS     SYS     SYS     0-15,32-47      0               N/A
GPU4    SYS     SYS     SYS     SYS     X       NV4     PIX     PIX     16-31,48-63     1               N/A
GPU5    SYS     SYS     SYS     SYS     NV4     X       PIX     PIX     16-31,48-63     1               N/A
GPU6    SYS     SYS     SYS     SYS     PIX     PIX     X       NV4     16-31,48-63     1               N/A
GPU7    SYS     SYS     SYS     SYS     PIX     PIX     NV4     X       16-31,48-63     1               N/A
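GPUs 0-3 share one PCIe switch and GPUs 4-7 share another (PIX). Traffic between the two groups has to cross the interconnect between the two NUMA nodes (SYS).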
Here is the output of “nvidia-smi topo -p2p w”:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7
GPU0    X       OK      CNS     CNS     CNS     CNS     CNS     CNS
GPU1    OK      X       CNS     CNS     CNS     CNS     CNS     CNS
GPU2    CNS     CNS     X       OK      CNS     CNS     CNS     CNS
GPU3    CNS     CNS     OK      X       CNS     CNS     CNS     CNS
GPU4    CNS     CNS     CNS     CNS     X       OK      CNS     CNS
GPU5    CNS     CNS     CNS     CNS     OK      X       CNS     CNS
GPU6    CNS     CNS     CNS     CNS     CNS     CNS     X       OK
GPU7    CNS     CNS     CNS     CNS     CNS     CNS     OK      X
CNS stands for “chipset not supported”. P2P writes are reported as available only within each NVLink pair, not between the other GPUs behind the same PCIe switch. I tried changing the ACS control bits on the PCIe switch, but that made no difference. So, does the RTX 3090 support P2P access over a PCIe switch?
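
In case it helps anyone compare the two tables, here is a rough Python sketch that cross-references them and reports which P2P status goes with each link type. It only parses the text pasted above and does not query the driver. The helper names (parse_matrix, summarize) are made up for this post and are not part of any NVIDIA tool.

```python
from collections import defaultdict

# GPU-to-GPU rows copied from "nvidia-smi topo -m" above (trailing affinity columns kept).
TOPO = """\
GPU0 X NV4 PIX PIX SYS SYS SYS SYS 0-15,32-47 0 N/A
GPU1 NV4 X PIX PIX SYS SYS SYS SYS 0-15,32-47 0 N/A
GPU2 PIX PIX X NV4 SYS SYS SYS SYS 0-15,32-47 0 N/A
GPU3 PIX PIX NV4 X SYS SYS SYS SYS 0-15,32-47 0 N/A
GPU4 SYS SYS SYS SYS X NV4 PIX PIX 16-31,48-63 1 N/A
GPU5 SYS SYS SYS SYS NV4 X PIX PIX 16-31,48-63 1 N/A
GPU6 SYS SYS SYS SYS PIX PIX X NV4 16-31,48-63 1 N/A
GPU7 SYS SYS SYS SYS PIX PIX NV4 X 16-31,48-63 1 N/A
"""

# Rows copied from "nvidia-smi topo -p2p w" above.
P2P = """\
GPU0 X OK CNS CNS CNS CNS CNS CNS
GPU1 OK X CNS CNS CNS CNS CNS CNS
GPU2 CNS CNS X OK CNS CNS CNS CNS
GPU3 CNS CNS OK X CNS CNS CNS CNS
GPU4 CNS CNS CNS CNS X OK CNS CNS
GPU5 CNS CNS CNS CNS OK X CNS CNS
GPU6 CNS CNS CNS CNS CNS CNS X OK
GPU7 CNS CNS CNS CNS CNS CNS OK X
"""


def parse_matrix(text):
    """Return {(i, j): cell} for every off-diagonal GPU pair in a pasted matrix."""
    rows = [line.split() for line in text.strip().splitlines()]
    # Keep data rows only; a header row has "GPUn" in its second column too.
    rows = [
        r for r in rows
        if len(r) > 1 and r[0].startswith("GPU") and not r[1].startswith("GPU")
    ]
    n = len(rows)
    # Column 0 is the row label, so cell (i, j) is at index 1 + j.
    # Any columns after the first n cells (CPU/NUMA affinity) are ignored.
    return {(i, j): rows[i][1 + j] for i in range(n) for j in range(n) if i != j}


def summarize(topo_text, p2p_text):
    """Map each link type (NV4, PIX, SYS) to the P2P statuses seen on it."""
    topo = parse_matrix(topo_text)
    p2p = parse_matrix(p2p_text)
    seen = defaultdict(set)
    for pair, link in topo.items():
        seen[link].add(p2p[pair])
    return {link: sorted(statuses) for link, statuses in seen.items()}


print(summarize(TOPO, P2P))
# prints {'NV4': ['OK'], 'PIX': ['CNS'], 'SYS': ['CNS']}
```

So on this machine every NV4 pair reports OK, while every PIX and SYS pair reports CNS, which is the pattern the question is about.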