We replaced an existing CX5 card with a new CX6 card. The CX5 card was connected with dual 100G SFP. After the swap, we could no longer connect at 100G, but swapping in 40G worked. We see that the card's PCIe link is coming up at Gen 3 (8 GT/s) and not Gen 4 (16 GT/s), and therefore isn’t providing enough bandwidth.
Some info below:
mlxlink
PCIe Operational (Enabled) Info
-------------------------------
Depth, pcie index, node : 0, 0, 0
Link Speed Active (Enabled) : 8G-Gen 3 (16G-Gen 4)
Link Width Active (Enabled) : 16X (16X)
EYE Opening Info (PCIe)
-----------------------
Physical Grade : 1888, 1426, 1800, 1740, 1833, 1566, 2016, 1767, 1711, 1740, 1800, 2442, 1664, 1458, 1624, 1736
Height Eye Opening [mV] : 151, 114, 144, 139, 146, 125, 161, 141, 136, 139, 144, 195, 133, 116, 129, 138
Phase Eye Opening [psec] : 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
Management PCIe Performance Counters Info
-----------------------------------------
RX Errors : 0
TX Errors : 19
CRC Error dllp : 0
CRC Error tlp : 0
Effective ber : 15E-255
dmesg/kernel
[ 3.675704] mlx_compat: loading out-of-tree module taints kernel.
[ 3.675815] mlx_compat: module verification failed: signature and/or required key missing - tainting kernel
[ 3.708709] mlx5_core 0000:5e:00.0: firmware version: 22.32.2004
[ 3.708742] mlx5_core 0000:5e:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:5d:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[ 4.005266] mlx5_core 0000:5e:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 4.005697] mlx5_core 0000:5e:00.0: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 4.012100] mlx5_core 0000:5e:00.0: Port module event: module 0, Cable plugged
[ 4.012465] mlx5_core 0000:5e:00.0: mlx5_pcie_event:299:(pid 1010): Detected insufficient power on the PCIe slot (27W).
[ 4.046062] mlx5_core 0000:5e:00.0: mlx5_fw_tracer_start:830:(pid 932): FWTracer: Ownership granted and active
[ 4.052631] mlx5_core 0000:5e:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 4.222914] mlx5_core 0000:5e:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 4.244545] mlx5_core 0000:5e:00.1: firmware version: 22.32.2004
[ 4.244602] mlx5_core 0000:5e:00.1: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:5d:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[ 4.559094] mlx5_core 0000:5e:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 4.559565] mlx5_core 0000:5e:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 4.566178] mlx5_core 0000:5e:00.1: Port module event: module 1, Cable plugged
[ 4.566616] mlx5_core 0000:5e:00.1: mlx5_pcie_event:299:(pid 9): Detected insufficient power on the PCIe slot (27W).
[ 4.608098] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 4.795364] mlx5_core 0000:5e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 4.820152] mlx5_core 0000:5e:00.0 enp94s0f0np0: renamed from eth1
[ 4.863814] mlx5_core 0000:5e:00.1 enp94s0f1np1: renamed from eth0
[ 12.508506] mlx5_core 0000:5e:00.0 enp94s0f0np0: Link up
[ 13.061623] mlx5_core 0000:5e:00.1 enp94s0f1np1: Link up
[ 13.480517] mlx5_core 0000:5e:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 14.307750] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
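For reference, the bandwidth figures in the mlx5_core lines above can be reproduced with a little arithmetic. This is only a sketch: it assumes 128b/130b encoding (which applies to Gen 3 and Gen 4) and truncating integer division, which happens to match the numbers the kernel printed. The function name pcie_bandwidth_mbps is made up for illustration.

```python
def pcie_bandwidth_mbps(gt_per_s: int, lanes: int) -> int:
    """Usable PCIe bandwidth in Mb/s, assuming 128b/130b encoding (Gen 3 / Gen 4)."""
    # Per-lane payload rate: raw rate scaled by 128/130, truncated to whole Mb/s.
    per_lane_mbps = gt_per_s * 1000 * 128 // 130
    return per_lane_mbps * lanes


gen3 = pcie_bandwidth_mbps(8, 16)    # what the slot negotiated
gen4 = pcie_bandwidth_mbps(16, 16)   # what the card is capable of

print(f"Gen 3 x16: {gen3 / 1000:.3f} Gb/s")
print(f"Gen 4 x16: {gen4 / 1000:.3f} Gb/s")
```

At Gen 3 x16 that is roughly 126 Gb/s shared by both ports, so two 100G ports cannot both run at line rate, while two 40G ports fit comfortably.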
lspci
5e:00.0 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
Subsystem: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:0016]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 35
NUMA node: 0
IOMMU group: 86
Region 0: Memory at c2000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at c5e00000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (downgraded), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
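In case it helps anyone compare slots or hosts, here is a rough sketch that pulls the LnkCap and LnkSta speed/width out of saved lspci -vv text and flags a downgrade. The helper name link_status and the regular expressions are my own, written against the output format shown above; other lspci versions may print these lines differently.

```python
import re


def link_status(lspci_text: str) -> dict:
    """Extract capable vs. negotiated PCIe speed/width from `lspci -vv` text."""
    cap = re.search(r"LnkCap:.*?Speed ([\d.]+)GT/s, Width x(\d+)", lspci_text)
    sta = re.search(r"LnkSta:\s*Speed ([\d.]+)GT/s[^,]*, Width x(\d+)", lspci_text)
    if not cap or not sta:
        raise ValueError("LnkCap/LnkSta not found in input")
    cap_speed, cap_width = float(cap.group(1)), int(cap.group(2))
    sta_speed, sta_width = float(sta.group(1)), int(sta.group(2))
    return {
        "capable": (cap_speed, cap_width),
        "negotiated": (sta_speed, sta_width),
        # Downgraded if either the speed or the width is below what the device supports.
        "downgraded": sta_speed < cap_speed or sta_width < cap_width,
    }


# Two lines copied from the lspci output in this post.
sample = """\
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
LnkSta: Speed 8GT/s (downgraded), Width x16 (ok)
"""

result = link_status(sample)
print(result)
```

For this card it reports a capable link of 16 GT/s x16 against a negotiated 8 GT/s x16, which is the same downgrade that lspci and mlxlink show.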