Dear All,
Environment :
OS : Ubuntu-22.04
Kernel Version : 5.15.0-40-generic
CUDA Version : 11.7
NVIDIA Driver : 515
We have a product based on an Intel Ice Lake processor that provides two x16 PCIe slots, each connected to an RTX A5000.
Device info from lspci:
# lspci -tvv
...
+-[0000:c2]-+-00.0 Intel Corporation Device 09a2
| +-00.1 Intel Corporation Device 09a4
| +-00.2 Intel Corporation Device 09a3
| +-00.4 Intel Corporation Device 0998
| \-02.0-[c3]--+-00.0 NVIDIA Corporation GA102GL [RTX A5000]
| \-00.1 NVIDIA Corporation GA102 High Definition Audio Controller
+-[0000:89]-+-00.0 Intel Corporation Device 09a2
| +-00.1 Intel Corporation Device 09a4
| +-00.2 Intel Corporation Device 09a3
| +-00.4 Intel Corporation Device 0998
| \-02.0-[8a]--+-00.0 NVIDIA Corporation GA102GL [RTX A5000]
| \-00.1 NVIDIA Corporation GA102 High Definition Audio Controller
# lspci -nn
...
89:00.0 System peripheral [0880]: Intel Corporation Device [8086:09a2] (rev 04)
89:00.1 System peripheral [0880]: Intel Corporation Device [8086:09a4] (rev 04)
89:00.2 System peripheral [0880]: Intel Corporation Device [8086:09a3] (rev 04)
89:00.4 Host bridge [0600]: Intel Corporation Device [8086:0998]
89:02.0 PCI bridge [0604]: Intel Corporation Device [8086:347a] (rev 04)
8a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102GL [RTX A5000] [10de:2231] (rev a1)
8a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
c2:00.0 System peripheral [0880]: Intel Corporation Device [8086:09a2] (rev 04)
c2:00.1 System peripheral [0880]: Intel Corporation Device [8086:09a4] (rev 04)
c2:00.2 System peripheral [0880]: Intel Corporation Device [8086:09a3] (rev 04)
c2:00.4 Host bridge [0600]: Intel Corporation Device [8086:0998]
c2:02.0 PCI bridge [0604]: Intel Corporation Device [8086:347a] (rev 04)
c3:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102GL [RTX A5000] [10de:2231] (rev a1)
c3:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
# lspci -s c2:02.0 -nnvvv | grep Lnk
LnkCap: Port #17, Speed 16GT/s, Width x16, ASPM not supported
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
# lspci -s c3:00.0 -nnvvv | grep Lnk
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
# lspci -s 89:02.0 -nnvvv | grep Lnk
LnkCap: Port #13, Speed 16GT/s, Width x16, ASPM not supported
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
# lspci -s 8a:00.0 -nnvvv | grep Lnk
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
We tried the p2pBandwidthLatencyTest sample for P2P validation, but it always reports that each device cannot access the other.
# nvidia-smi topo -m
GPU0 GPU1 CPU Affinity NUMA Affinity
GPU0 X SYS 0-31 N/A
GPU1 SYS X 0-31 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
# nvidia-smi topo -p2p rw
GPU0 GPU1
GPU0 X CNS
GPU1 CNS X
Legend:
X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
# ./p2pBandwidthLatencyTest
...
Device: 0, NVIDIA RTX A5000, pciBusID: 8a, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA RTX A5000, pciBusID: c3, pciDeviceID: 0, pciDomainID:0
Device=0 CANNOT Access Peer Device=1
Device=1 CANNOT Access Peer Device=0
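For reference, the CAN/CANNOT lines above come from the CUDA peer-access query. A minimal sketch of the same check (assumes two visible GPUs, builds with nvcc; error handling omitted for brevity):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            // Ask the driver whether device i can map device j's memory;
            // this is the same query p2pBandwidthLatencyTest performs.
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("Device=%d %s Access Peer Device=%d\n",
                   i, canAccess ? "CAN" : "CANNOT", j);
        }
    }
    return 0;
}
```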
The devices can only access each other after we add "ForceP2P=0x11" to /etc/modprobe.d/nvidia-graphics-drivers-kms.conf.
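For reference, a registry key like this is typically passed through the driver's NVreg_RegistryDwords module parameter; a conf line might look like the following (the exact file contents are an assumption on our part, and ForceP2P itself appears to be an undocumented key):

```
# /etc/modprobe.d/nvidia-graphics-drivers-kms.conf
# Assumption: ForceP2P is an undocumented registry key passed via
# the documented NVreg_RegistryDwords module parameter.
options nvidia NVreg_RegistryDwords="ForceP2P=0x11"
```

After editing the file, the setting takes effect once the nvidia module is reloaded (or after a reboot and `update-initramfs -u` if the module is loaded from the initramfs).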
# nvidia-smi topo -m
GPU0 GPU1 CPU Affinity NUMA Affinity
GPU0 X SYS 0-31 N/A
GPU1 SYS X 0-31 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
# nvidia-smi topo -p2p rw
GPU0 GPU1
GPU0 X OK
GPU1 OK X
Legend:
X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
# ./p2pBandwidthLatencyTest
...
Device: 0, NVIDIA RTX A5000, pciBusID: 8a, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA RTX A5000, pciBusID: c3, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
Even though ForceP2P seems to work, a few questions still puzzle us:
- About "CNS = Chipset not supported": does this simply mean that our Intel processor does not support P2P, or is there further information behind this status?
- If our processor really is unsuitable for P2P, would you mind sharing what features a processor needs in order to support P2P?
- About "SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)": if our understanding is correct, our product has two processor sockets connected via UPI, but so far we have only validated P2P with a single processor installed; the other socket is empty. Why is the topology still reported as SYS? Is there any further hint? Does this also affect the P2P validation?
Best Regards,
MOMO Chen