The SYS legend given by "nvidia-smi topo -m" conflicts with the "NUMA Affinity" field

I have a machine with four V100-PCIe GPUs, and I want to understand the specific PCIe connectivity topology between them. The results obtained using nvidia-smi topo -m are as follows:

        GPU0    GPU1    GPU2    GPU3    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     SYS     SYS     PIX     PIX     0-9,20-29       0               N/A
GPU1    PIX      X      SYS     SYS     PIX     PIX     0-9,20-29       0               N/A
GPU2    SYS     SYS      X      PIX     SYS     SYS     0-9,20-29       0               N/A
GPU3    SYS     SYS     PIX      X      SYS     SYS     0-9,20-29       0               N/A
NIC0    PIX     PIX     SYS     SYS      X      PIX
NIC1    PIX     PIX     SYS     SYS     PIX      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1

The SYS entries in the result above suggest that GPU0 and GPU1 are in one NUMA node while GPU2 and GPU3 are in another, since SYS means the path crosses the SMP interconnect between NUMA nodes. But the NUMA Affinity field indicates that all four GPUs are in NUMA node 0.
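
For reference, the same topology levels can also be queried programmatically through NVML. Below is a minimal sketch using the pynvml bindings; I am assuming the binding exposes nvmlDeviceGetTopologyCommonAncestor and the NVML_TOPOLOGY_* constants under these names, so please check it against your installed version:

import pynvml

# Ask NVML for the topology level between GPU 0 and GPU 2; this is the same
# information that nvidia-smi topo -m prints as PIX/PXB/PHB/NODE/SYS.
# (Assumes pynvml exposes nvmlDeviceGetTopologyCommonAncestor.)
pynvml.nvmlInit()
h0 = pynvml.nvmlDeviceGetHandleByIndex(0)
h2 = pynvml.nvmlDeviceGetHandleByIndex(2)
level = pynvml.nvmlDeviceGetTopologyCommonAncestor(h0, h2)
if level == pynvml.NVML_TOPOLOGY_SYSTEM:
    print("GPU0 <-> GPU2: SYS (path crosses the SMP interconnect)")
elif level == pynvml.NVML_TOPOLOGY_NODE:
    print("GPU0 <-> GPU2: NODE (path stays within one NUMA node)")
else:
    print("GPU0 <-> GPU2: level", level)
pynvml.nvmlShutdown()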

The results obtained through lspci -tv are as follows (I kept only the parts relevant to the GPUs):

 +-[0000:3a]-+-00.0-[3b-41]----00.0-[3c-41]--+-04.0-[3d]----00.0  NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]
 |           |                               +-08.0-[3e]--
 |           |                               +-0c.0-[3f]--
 |           |                               +-10.0-[40]----00.0  Xilinx Corporation Device d004
 |           |                               \-14.0-[41]----00.0  NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]
....
 +-[0000:17]-+-00.0-[18-1e]----00.0-[19-1e]--+-04.0-[1a]--+-00.0  Mellanox Technologies MT27800 Family [ConnectX-5]
 |           |                               |            \-00.1  Mellanox Technologies MT27800 Family [ConnectX-5]
 |           |                               +-08.0-[1b]----00.0  NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]
 |           |                               +-0c.0-[1c]--
 |           |                               +-10.0-[1d]--
 |           |                               \-14.0-[1e]----00.0  NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]

This indicates that two GPUs are under bus 0000:3a, while the other two are under bus 0000:17.
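
The same grouping can be confirmed without lspci by resolving each NVIDIA device's sysfs entry, whose target path lists the root bus and every bridge in between. Here is a short sketch assuming the standard Linux sysfs layout (/sys/bus/pci/devices/<bdf>):

import glob, os

# Print the full upstream PCIe path of every NVIDIA function (vendor 0x10de);
# the resolved path starts at the root bus (e.g. pci0000:3a) and names each bridge.
for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    with open(os.path.join(dev, "vendor")) as f:
        if f.read().strip() != "0x10de":
            continue
    print(os.path.basename(dev), "->", os.path.realpath(dev))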

Then I used dmesg | grep NUMA to view the information about NUMA nodes, and the result is as follows:

[    2.496394] pci_bus 0000:00: on NUMA node 0
[    2.527717] pci_bus 0000:17: on NUMA node 0
[    2.547933] pci_bus 0000:3a: on NUMA node 0
[    2.554105] pci_bus 0000:5d: on NUMA node 0
[    2.558604] pci_bus 0000:80: on NUMA node 1
[    2.567157] pci_bus 0000:85: on NUMA node 1
[    2.574463] pci_bus 0000:ae: on NUMA node 1
[    2.579994] pci_bus 0000:d7: on NUMA node 1

This result indicates that bus 0000:3a and bus 0000:17 are on the same NUMA node, meaning that all four GPUs are in the same node. This contradicts the SYS entries from nvidia-smi topo -m but agrees with the "NUMA Affinity" field.
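
The same information can also be read per device from sysfs instead of dmesg; a short sketch, again assuming the standard /sys/bus/pci/devices layout:

import glob, os

# Print the kernel's NUMA node for every NVIDIA PCI function.
# numa_node contains -1 when the node is unknown.
for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    with open(os.path.join(dev, "vendor")) as f:
        if f.read().strip() != "0x10de":
            continue
    with open(os.path.join(dev, "numa_node")) as f:
        print(os.path.basename(dev), "numa_node =", f.read().strip())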

I think the SYS entries in the nvidia-smi result should be NODE. Is my understanding of the SYS legend incorrect?

Hi Xiangzhouliiu,
About your question, please check the information below.
If your server host is an OEM server, the result may look like the one you posted:

  1. The four PCIe GPU modules (GPU0-GPU3) share the same "NUMA Affinity" of 0.
  2. GPU0 and GPU1 are on one PCIe switch; GPU2 and GPU3 are on another PCIe switch.
  3. The two PCIe switches communicate via the "Root Complex" under the same CPU.

Then, when checking "nvidia-smi topo -m", GPU0 and GPU2 (which are on different PCIe switches) are reported as "SYS".

Best wishes
Bo Jing