Hello,
I have an HPC cluster with BeeGFS storage mounted. When I test the write speed with dd on the BeeGFS mount from nodes that have a ConnectX-5 card, I reach up to 6 GB/s.
When I run the same test on nodes with ConnectX-6 cards, I only reach about 2.7 GB/s.
I have already gone through the node tuning documentation, but none of the recommended tuning changes helped.
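For reference, the dd write test I run is along these lines (the mount path, block size, and count below are placeholders rather than my exact values; oflag=direct bypasses the client page cache):
[root@gpu1 ~]# dd if=/dev/zero of=/mnt/beegfs/ddtest bs=1M count=10240 oflag=direct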
Setup:
Firmware: all up to date
OS: Rocky Linux 8.10 on all nodes
InfiniBand switch: Mellanox QM8700
Fast nodes (6 GB/s, ConnectX-5 cards):
Node gpu0
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Gold 6438M
BIOS Model name: Intel(R) Xeon(R) Gold 6438M
Stepping: 8
CPU MHz: 3730.295
Node login (virtualized)
Vendor ID: AuthenticAMD
BIOS Vendor ID: QEMU
CPU family: 25
Model: 1
Model name: AMD EPYC 7453 28-Core Processor
BIOS Model name: pc-i440fx-8.1
Stepping: 1
CPU MHz: 2749.998
BogoMIPS: 5499.99
Virtualization: AMD-V
Slow nodes (~2.7 GB/s, ConnectX-6 cards):
Node gpu1
Model name: AMD EPYC 7662 64-Core Processor
BIOS Model name: AMD EPYC 7662 64-Core Processor
Stepping: 0
CPU MHz: 2000.000
CPU max MHz: 2154.2959
CPU min MHz: 1500.0000
BogoMIPS: 3999.74
Node gpu2
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
CPU family: 23
Model: 49
Model name: AMD EPYC 7662 64-Core Processor
BIOS Model name: AMD EPYC 7662 64-Core Processor
Stepping: 0
CPU MHz: 2000.000
CPU max MHz: 2154.2959
CPU min MHz: 1500.0000
Is there anything specific known about this AMD EPYC series in combination with ConnectX-6 adapters?
However, ib_write_bw gives roughly the same results on both node types:
[root@gpu2 ~]# ib_write_bw 192.172.1.13 -p 1815
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x0d QPN 0x0445 PSN 0xe330cc RKey 0x200c00 VAddr 0x007fbe59a6b000
remote address: LID 0x06 QPN 0x096e PSN 0x2b944a RKey 0x004d90 VAddr 0x007f0b27954000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MiB/sec] BW average[MiB/sec] MsgRate[Mpps]
Conflicting CPU frequency values detected: 2000.000000 != 3293.977000. CPU Frequency is not max.
65536 5000 11220.83 11220.21 0.179523
---------------------------------------------------------------------------------------
[root@gpu2 ~]# ssh gpu0
Last login: Wed Nov 6 10:45:02 2024 from 192.168.1.21
[root@gpu0 ~]# ib_write_bw 192.172.1.13 -p 1815
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x0a QPN 0x01a6 PSN 0xbcd64b RKey 0x21db00 VAddr 0x007f71c5bba000
remote address: LID 0x06 QPN 0x096f PSN 0xb70ee4 RKey 0x004d00 VAddr 0x007f5240565000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MiB/sec] BW average[MiB/sec] MsgRate[Mpps]
65536 5000 11508.22 11507.83 0.184125
---------------------------------------------------------------------------------------
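Side note on the gpu2 run above: ib_write_bw warns "Conflicting CPU frequency values detected ... CPU Frequency is not max." To rule out frequency scaling on the EPYC nodes, the scaling governor can be checked and pinned to performance during the test, roughly like this (cpupower comes from the kernel-tools package on Rocky 8, assuming it is installed):
[root@gpu2 ~]# cpupower frequency-info
[root@gpu2 ~]# cpupower frequency-set -g performance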
lspci shows a x16 PCIe link for the ConnectX-6 card:
[root@gpu1 ~]# lspci -vv -s a1:00.0
a1:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
Subsystem: Mellanox Technologies Device 0009
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 159
NUMA node: 1
IOMMU group: 112
Region 0: Memory at 6213e000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at b6400000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn+
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [48] Vital Product Data
Product Name: ConnectX-6 VPI adapter card, HDR IB (200Gb/s) and 200GbE, dual-port QSFP56
Read-only fields:
[PN] Part number: MCX653106A-HDAT
[EC] Engineering changes: AH
[V2] Vendor specific: MCX653106A-HDAT
[SN] Serial number: MT2244T00FCU
[V3] Vendor specific: 1a200186c255ed118000b83fd2a6c50c
[VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX653106A
[V0] Vendor specific: PCIeGen4 x16
[VU] Vendor specific: MT2244T00FCUMLNXS0D0F0
[RV] Reserved: checksum good, 1 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 08, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [1c0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [230 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [320 v1] Lane Margining at the Receiver <?>
Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [420 v1] Data Link Feature <?>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
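Since lspci reports the ConnectX-6 on NUMA node 1, a NUMA-pinned variant of the dd test could rule out cross-die traffic on the EPYC nodes; a sketch, again with a placeholder path and sizes and assuming numactl is installed:
[root@gpu1 ~]# numactl --cpunodebind=1 --membind=1 dd if=/dev/zero of=/mnt/beegfs/ddtest bs=1M count=10240 oflag=direct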
There is another post that mentions performance issues with ConnectX-6 adapters:
But I don't know if it is related, since that case involved an older OS …