[P2P] Devices cannot access each other in p2pBandwidthLatencyTest

Dear All,

Environment:
OS: Ubuntu 22.04
Kernel version: 5.15.0-40-generic
CUDA version: 11.7
NVIDIA driver: 515

Our product is based on an Intel Ice Lake processor with two x16 PCIe slots, each connected to an RTX A5000.

Device Info with lspci

# lspci -tvv
...
 +-[0000:c2]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           \-02.0-[c3]--+-00.0  NVIDIA Corporation GA102GL [RTX A5000]
 |                        \-00.1  NVIDIA Corporation GA102 High Definition Audio Controller
 +-[0000:89]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           \-02.0-[8a]--+-00.0  NVIDIA Corporation GA102GL [RTX A5000]
 |                        \-00.1  NVIDIA Corporation GA102 High Definition Audio Controller
# lspci -nn
...
89:00.0 System peripheral [0880]: Intel Corporation Device [8086:09a2] (rev 04)
89:00.1 System peripheral [0880]: Intel Corporation Device [8086:09a4] (rev 04)
89:00.2 System peripheral [0880]: Intel Corporation Device [8086:09a3] (rev 04)
89:00.4 Host bridge [0600]: Intel Corporation Device [8086:0998]
89:02.0 PCI bridge [0604]: Intel Corporation Device [8086:347a] (rev 04)
8a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102GL [RTX A5000] [10de:2231] (rev a1)
8a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
c2:00.0 System peripheral [0880]: Intel Corporation Device [8086:09a2] (rev 04)
c2:00.1 System peripheral [0880]: Intel Corporation Device [8086:09a4] (rev 04)
c2:00.2 System peripheral [0880]: Intel Corporation Device [8086:09a3] (rev 04)
c2:00.4 Host bridge [0600]: Intel Corporation Device [8086:0998]
c2:02.0 PCI bridge [0604]: Intel Corporation Device [8086:347a] (rev 04)
c3:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102GL [RTX A5000] [10de:2231] (rev a1)
c3:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
# lspci -s c2:02.0 -nnvvv | grep Lnk
                LnkCap: Port #17, Speed 16GT/s, Width x16, ASPM not supported
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
# lspci -s c3:00.0 -nnvvv | grep Lnk
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
# lspci -s 89:02.0 -nnvvv | grep Lnk
                LnkCap: Port #13, Speed 16GT/s, Width x16, ASPM not supported
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
# lspci -s 8a:00.0 -nnvvv | grep Lnk
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-

We tried to use p2pBandwidthLatencyTest for P2P validation, but it always reports that each device cannot access the other.

# nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      SYS     0-31            N/A
GPU1    SYS      X      0-31            N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
# nvidia-smi topo -p2p rw
        GPU0    GPU1
 GPU0   X       CNS
 GPU1   CNS     X

Legend:

  X    = Self
  OK   = Status Ok
  CNS  = Chipset not supported
  GNS  = GPU not supported
  TNS  = Topology not supported
  NS   = Not supported
  U    = Unknown
# ./p2pBandwidthLatencyTest
...
Device: 0, NVIDIA RTX A5000, pciBusID: 8a, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA RTX A5000, pciBusID: c3, pciDeviceID: 0, pciDomainID:0
Device=0 CANNOT Access Peer Device=1
Device=1 CANNOT Access Peer Device=0
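For reference, the peer-access check that the sample reports can be sketched with the CUDA runtime API roughly as follows (a minimal sketch of the query loop, not the actual sample source):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    // For every ordered device pair, ask the driver whether P2P access
    // could be enabled between them.
    for (int i = 0; i < count; ++i) {
        for (int j = 0; j < count; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("Device=%d %s Access Peer Device=%d\n",
                   i, canAccess ? "CAN" : "CANNOT", j);
        }
    }
    return 0;
}
```

`cudaDeviceCanAccessPeer` only reports whether the driver is willing to enable P2P for the pair; it returns 0 here because the driver does not recognize the platform, not because of a CUDA configuration error.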

Peer access only works when we add "ForceP2P=0x11" to /etc/modprobe.d/nvidia-graphics-drivers-kms.conf.
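(For context, driver registry overrides are normally passed to the nvidia kernel module through the `NVreg_RegistryDwords` module parameter; the exact line below is an assumption about the syntax, since "ForceP2P" is an undocumented registry key:)

```
# /etc/modprobe.d/nvidia-graphics-drivers-kms.conf
# Hypothetical form of the override; "ForceP2P" is undocumented.
options nvidia NVreg_RegistryDwords="ForceP2P=0x11"
```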

# nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      SYS     0-31            N/A
GPU1    SYS      X      0-31            N/A

# nvidia-smi topo -p2p rw
        GPU0    GPU1
 GPU0   X       OK
 GPU1   OK      X

# ./p2pBandwidthLatencyTest
...
Device: 0, NVIDIA RTX A5000, pciBusID: 8a, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA RTX A5000, pciBusID: c3, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
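Once the query succeeds, peer access can be exercised along the following lines (a hedged sketch of a direct device-to-device copy; the actual sample adds elaborate timing and bandwidth measurement):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB test buffer
    void *buf0 = nullptr, *buf1 = nullptr;

    // Enable P2P in both directions and allocate one buffer per GPU.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // let device 0 map device 1's memory
    cudaMalloc(&buf0, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&buf1, bytes);

    // Direct device-to-device copy; with P2P enabled this travels over
    // PCIe without staging through host memory.
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    cudaDeviceSynchronize();
    printf("peer copy status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```

Note that even when the forced override makes this path "work", the 2.5 GT/s x8 links shown in the lspci output above would cap the achievable bandwidth well below what the A5000s support.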

Even though ForceP2P seems to work, there are still some questions we are curious about:

  1. About "CNS = Chipset not supported":
    Does this simply indicate that our Intel processor doesn't support P2P?
    Or is there any further information about it?

  2. If it just means our processor is not suitable for P2P,
    would you mind sharing what features a processor needs to support P2P?

  3. About "SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)":
    If our understanding is correct, there are two processor sockets in our product, connected by UPI. However, we have only validated P2P with a single processor so far; the other socket is empty.
    Why does it still report SYS? Is there any further hint?
    Does this also affect the P2P validation?

Best Regards,
MOMO Chen

NVIDIA doesn’t support qualification/verification/validation of platforms via these forums, so anything I say here is for general information. It should not be construed as any sort of approval of what you are doing.

P2P communications have historically “not worked” over QPI. It seems evident that you have one GPU plugged into a PCIE socket attached to one CPU socket, and another GPU that is plugged into a PCIE socket that is attached to the other CPU socket. That is what the tool is saying, anyway (SYS). If that is not the case, then nothing is working correctly and I refer you to the first sentence in my response here. In later Intel systems, there are some cases where P2P can be supported across UPI.

Since you are getting a "CNS" (chipset not supported) message, it's entirely possible that the tool is unreliable. I refer you to my first sentence. "CNS" means that the driver doesn't recognize your chipset (the core logic integrated into the CPU), and so P2P generally wouldn't be enabled/won't work.

There is no public list of supported processors for P2P.

All of these inquiries should be directed back to the provider (OEM) of the platform.

Dear Robert,

Thanks for the information. We will keep following up with the provider.

Best Regards,
MOMO Chen
