CX6 connecting at PCIe Gen3 when Gen4 is available

We replaced an existing CX5 card with a new CX6 card. The CX5 card was connected with dual 100G SFP. After the swap, we could no longer connect at 100G, but swapping in 40G worked. We see that the card's PCIe link is coming up at Gen 3 rather than Gen 4 and therefore isn’t providing enough bandwidth.

Some info below:

mlxlink

PCIe Operational (Enabled) Info
-------------------------------
Depth, pcie index, node         : 0, 0, 0
Link Speed Active (Enabled)     : 8G-Gen 3 (16G-Gen 4)
Link Width Active (Enabled)     : 16X (16X)

EYE Opening Info (PCIe)
-----------------------
Physical Grade                  :   1888,  1426,  1800,  1740,  1833,  1566,  2016,  1767,  1711,  1740,  1800,  2442,  1664,  1458,  1624,  1736
Height Eye Opening [mV]         :    151,   114,   144,   139,   146,   125,   161,   141,   136,   139,   144,   195,   133,   116,   129,   138
Phase  Eye Opening [psec]       :     16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16

Management PCIe Performance Counters Info
-----------------------------------------
RX Errors                       : 0
TX Errors                       : 19
CRC Error dllp                  : 0
CRC Error tlp                   : 0
Effective ber                   : 15E-255

dmesg/kernel

[    3.675704] mlx_compat: loading out-of-tree module taints kernel.
[    3.675815] mlx_compat: module verification failed: signature and/or required key missing - tainting kernel
[    3.708709] mlx5_core 0000:5e:00.0: firmware version: 22.32.2004
[    3.708742] mlx5_core 0000:5e:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:5d:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    4.005266] mlx5_core 0000:5e:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    4.005697] mlx5_core 0000:5e:00.0: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[    4.012100] mlx5_core 0000:5e:00.0: Port module event: module 0, Cable plugged
[    4.012465] mlx5_core 0000:5e:00.0: mlx5_pcie_event:299:(pid 1010): Detected insufficient power on the PCIe slot (27W).
[    4.046062] mlx5_core 0000:5e:00.0: mlx5_fw_tracer_start:830:(pid 932): FWTracer: Ownership granted and active
[    4.052631] mlx5_core 0000:5e:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[    4.222914] mlx5_core 0000:5e:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    4.244545] mlx5_core 0000:5e:00.1: firmware version: 22.32.2004
[    4.244602] mlx5_core 0000:5e:00.1: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:5d:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    4.559094] mlx5_core 0000:5e:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    4.559565] mlx5_core 0000:5e:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[    4.566178] mlx5_core 0000:5e:00.1: Port module event: module 1, Cable plugged
[    4.566616] mlx5_core 0000:5e:00.1: mlx5_pcie_event:299:(pid 9): Detected insufficient power on the PCIe slot (27W).
[    4.608098] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[    4.795364] mlx5_core 0000:5e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    4.820152] mlx5_core 0000:5e:00.0 enp94s0f0np0: renamed from eth1
[    4.863814] mlx5_core 0000:5e:00.1 enp94s0f1np1: renamed from eth0
[   12.508506] mlx5_core 0000:5e:00.0 enp94s0f0np0: Link up
[   13.061623] mlx5_core 0000:5e:00.1 enp94s0f1np1: Link up
[   13.480517] mlx5_core 0000:5e:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[   14.307750] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
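
For anyone wondering where the 126.016 Gb/s and 252.048 Gb/s figures in the log come from, here is a rough sketch of the arithmetic. It assumes 128b/130b line encoding (used by both Gen3 and Gen4) and integer rounding per lane, which I chose because it reproduces the logged numbers. The function name is made up for illustration.

```python
def pcie_bandwidth_mbps(gt_per_s: int, lanes: int) -> int:
    """Usable link bandwidth in Mb/s after 128b/130b encoding overhead."""
    # Per-lane rate: raw transfer rate scaled by 128/130, rounded down.
    per_lane_mbps = gt_per_s * 1000 * 128 // 130
    return per_lane_mbps * lanes

gen3_x16 = pcie_bandwidth_mbps(8, 16)    # link speed reported as active
gen4_x16 = pcie_bandwidth_mbps(16, 16)   # link speed the card is capable of
dual_100g = 2 * 100_000                  # both ports at line rate, in Mb/s

print(f"Gen3 x16: {gen3_x16 / 1000:.3f} Gb/s")
print(f"Gen4 x16: {gen4_x16 / 1000:.3f} Gb/s")
print(f"Gen3 x16 covers dual 100G: {gen3_x16 >= dual_100g}")
print(f"Gen4 x16 covers dual 100G: {gen4_x16 >= dual_100g}")
```

So a Gen3 x16 link is enough for one 100G port at line rate, but not for both at once.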

lspci

5e:00.0 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
	Subsystem: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:0016]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 35
	NUMA node: 0
	IOMMU group: 86
	Region 0: Memory at c2000000 (64-bit, prefetchable) [size=32M]
	Expansion ROM at c5e00000 [disabled] [size=1M]
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM not supported
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (downgraded), Width x16 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
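
The two lines that matter in that output are LnkCap (what the card can do) and LnkSta (what was actually negotiated). Here is a small sketch for pulling them out of saved `lspci -vv` text and comparing them. The sample string is copied from the output above; the helper name `link_fields` is just something I made up.

```python
import re

# Two lines copied from the lspci output above.
SAMPLE = (
    "LnkCap:\tPort #0, Speed 16GT/s, Width x16, ASPM not supported\n"
    "LnkSta:\tSpeed 8GT/s (downgraded), Width x16 (ok)\n"
)

def link_fields(text: str, tag: str) -> tuple[float, int]:
    """Return (speed in GT/s, lane width) from a LnkCap or LnkSta line."""
    # '.' does not match newlines, so the match stays on the tagged line.
    m = re.search(rf"{tag}:.*?Speed ([\d.]+)GT/s.*?Width x(\d+)", text)
    if m is None:
        raise ValueError(f"no {tag} line found")
    return float(m.group(1)), int(m.group(2))

cap_speed, cap_width = link_fields(SAMPLE, "LnkCap")
sta_speed, sta_width = link_fields(SAMPLE, "LnkSta")

if sta_speed < cap_speed:
    print(f"speed downgraded: {sta_speed:g} GT/s of {cap_speed:g} GT/s")
if sta_width < cap_width:
    print(f"width downgraded: x{sta_width} of x{cap_width}")
```

Here only the speed is reduced; the width negotiated at the full x16.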

Validate that the slot where the CX6 has been installed supports Gen4 and has been set to Gen4.

Validate that the latest GA BIOS version has been deployed.

Validate on a different slot that supports Gen4, or on a different server altogether.

The HCA's link capability supports it (LnkCap: Port #0, Speed 16GT/s, Width x16), so either something is not set properly on the server or the slot does not have that capability.

It is usually a server-related issue.
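
One way to check the slot side from the OS is to read the link attributes of the upstream port that the dmesg lines name (0000:5d:00.0). The sketch below assumes a Linux host that exposes the standard PCI sysfs attributes `current_link_speed`, `max_link_speed`, `current_link_width` and `max_link_width`. The exact speed strings vary by kernel version, and the helper names are made up.

```python
from pathlib import Path

ATTRS = ("current_link_speed", "max_link_speed",
         "current_link_width", "max_link_width")

def read_link(dev_dir) -> dict:
    """Return whichever link attributes exist under a PCI device's sysfs directory."""
    info = {}
    for name in ATTRS:
        attr = Path(dev_dir) / name
        if attr.is_file():
            info[name] = attr.read_text().strip()
    return info

def is_downgraded(info: dict) -> bool:
    """True when the current speed string differs from the port's maximum."""
    return info.get("current_link_speed") != info.get("max_link_speed")

# Upstream port taken from the dmesg output; skipped if it does not exist here.
port = Path("/sys/bus/pci/devices/0000:5d:00.0")
if port.is_dir():
    link = read_link(port)
    print(link)
    print("downgraded" if is_downgraded(link) else "running at the port's maximum")
```

If `max_link_speed` on the upstream port already reads 8 GT/s, the slot itself is Gen3 and no card setting will bring the link up at Gen4.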

Thank you. I do see that the server has a Gen3 slot, and the previous CX5 was a Gen3 variant. How can I force the CX6 to allow dual 100G connections, even though the PCIe bandwidth can’t carry both at line rate, and manage the bandwidth myself, just as I did before with the CX5?