CX6 connecting at PCIe Gen3 when Gen4 is available

We replaced an existing CX5 card with a new CX6 card. The CX5 card was connected with dual 100G SFP. After the swap, we could no longer connect at 100G, but swapping in 40G worked. We see that the card is connecting at Gen 3 and not Gen 4 and therefore isn’t providing enough bandwidth.
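
For reference, here is a rough sketch of the bandwidth arithmetic behind that statement. It assumes only the 128b/130b line encoding that Gen3 and Gen4 both use and ignores TLP/DLLP protocol overhead, so real throughput will be somewhat lower. The function name `pcie_gbps` is made up for illustration.

```python
# Approximate usable PCIe bandwidth per direction after line encoding.
# Assumption: only 128b/130b encoding overhead is counted (Gen3 and Gen4);
# packet/protocol overhead is ignored, so real throughput is lower.
ENCODING = 128 / 130


def pcie_gbps(gt_per_s: float, lanes: int) -> float:
    """Return approximate usable bandwidth in Gb/s for one link direction."""
    return gt_per_s * ENCODING * lanes


gen3 = pcie_gbps(8.0, 16)   # negotiated link: 8 GT/s x16
gen4 = pcie_gbps(16.0, 16)  # link the card is capable of: 16 GT/s x16
both_ports = 2 * 100        # two 100G ports, Gb/s

print(f"Gen3 x16: {gen3:.1f} Gb/s, covers both ports: {gen3 >= both_ports}")
print(f"Gen4 x16: {gen4:.1f} Gb/s, covers both ports: {gen4 >= both_ports}")
```

The results land very close to the 126.016 Gb/s and 252.048 Gb/s figures the kernel prints in the dmesg output in this post. By this estimate a Gen3 x16 link has room for one 100G port but not for both ports at line rate.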

Some info below:

mlxlink

PCIe Operational (Enabled) Info
-------------------------------
Depth, pcie index, node         : 0, 0, 0
Link Speed Active (Enabled)     : 8G-Gen 3 (16G-Gen 4)
Link Width Active (Enabled)     : 16X (16X)

EYE Opening Info (PCIe)
-----------------------
Physical Grade                  :   1888,  1426,  1800,  1740,  1833,  1566,  2016,  1767,  1711,  1740,  1800,  2442,  1664,  1458,  1624,  1736
Height Eye Opening [mV]         :    151,   114,   144,   139,   146,   125,   161,   141,   136,   139,   144,   195,   133,   116,   129,   138
Phase  Eye Opening [psec]       :     16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16,    16

Management PCIe Performance Counters Info
-----------------------------------------
RX Errors                       : 0
TX Errors                       : 19
CRC Error dllp                  : 0
CRC Error tlp                   : 0
Effective ber                   : 15E-255

dmesg/kernel

[    3.675704] mlx_compat: loading out-of-tree module taints kernel.
[    3.675815] mlx_compat: module verification failed: signature and/or required key missing - tainting kernel
[    3.708709] mlx5_core 0000:5e:00.0: firmware version: 22.32.2004
[    3.708742] mlx5_core 0000:5e:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:5d:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    4.005266] mlx5_core 0000:5e:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    4.005697] mlx5_core 0000:5e:00.0: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[    4.012100] mlx5_core 0000:5e:00.0: Port module event: module 0, Cable plugged
[    4.012465] mlx5_core 0000:5e:00.0: mlx5_pcie_event:299:(pid 1010): Detected insufficient power on the PCIe slot (27W).
[    4.046062] mlx5_core 0000:5e:00.0: mlx5_fw_tracer_start:830:(pid 932): FWTracer: Ownership granted and active
[    4.052631] mlx5_core 0000:5e:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[    4.222914] mlx5_core 0000:5e:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    4.244545] mlx5_core 0000:5e:00.1: firmware version: 22.32.2004
[    4.244602] mlx5_core 0000:5e:00.1: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:5d:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    4.559094] mlx5_core 0000:5e:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    4.559565] mlx5_core 0000:5e:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[    4.566178] mlx5_core 0000:5e:00.1: Port module event: module 1, Cable plugged
[    4.566616] mlx5_core 0000:5e:00.1: mlx5_pcie_event:299:(pid 9): Detected insufficient power on the PCIe slot (27W).
[    4.608098] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[    4.795364] mlx5_core 0000:5e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    4.820152] mlx5_core 0000:5e:00.0 enp94s0f0np0: renamed from eth1
[    4.863814] mlx5_core 0000:5e:00.1 enp94s0f1np1: renamed from eth0
[   12.508506] mlx5_core 0000:5e:00.0 enp94s0f0np0: Link up
[   13.061623] mlx5_core 0000:5e:00.1 enp94s0f1np1: Link up
[   13.480517] mlx5_core 0000:5e:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[   14.307750] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
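
In case it helps anyone reading along, the two messages in this log that look relevant (the PCIe bandwidth limit and the slot power warning) can be pulled out with a small script. This is only a sketch: the regular expressions are written against the exact wording of the lines above, and the names `PATTERNS` and `scan` are made up for illustration.

```python
import re

# Three lines copied from the dmesg output above.
SAMPLE = """\
[    3.708742] mlx5_core 0000:5e:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:5d:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    4.012100] mlx5_core 0000:5e:00.0: Port module event: module 0, Cable plugged
[    4.012465] mlx5_core 0000:5e:00.0: mlx5_pcie_event:299:(pid 1010): Detected insufficient power on the PCIe slot (27W).
"""

# Patterns assume the message wording shown in this post.
PATTERNS = {
    "pcie_limited": re.compile(r"limited by ([\d.]+) GT/s PCIe x(\d+) link at (\S+)"),
    "slot_power": re.compile(r"insufficient power on the PCIe slot \((\d+)W\)"),
}


def scan(log: str):
    """Return (pattern_name, captured_groups) for every matching line."""
    findings = []
    for line in log.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append((name, match.groups()))
    return findings


for name, groups in scan(SAMPLE):
    print(name, groups)
```

On the sample lines this reports the 8.0 GT/s x16 limit at 0000:5d:00.0 and the 27W slot power figure. To run it against a live system, the output of `dmesg` could be passed to `scan` instead of `SAMPLE`.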

lspci

5e:00.0 Ethernet controller [0200]: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:101d]
	Subsystem: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] [15b3:0016]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 35
	NUMA node: 0
	IOMMU group: 86
	Region 0: Memory at c2000000 (64-bit, prefetchable) [size=32M]
	Expansion ROM at c5e00000 [disabled] [size=1M]
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM not supported
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (downgraded), Width x16 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
			 10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis- LTR- OBFF Disabled,
			 AtomicOpsCtl: ReqEn+
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [48] Vital Product Data
		Product Name: ConnectX-6 Dx EN adapter card, 100GbE, Dual-port QSFP56, PCIe 4.0 x16, No Crypto                                                                                                      
		Read-only fields:
			[PN] Part number: MCX623106AN-CDAT         
			[EC] Engineering changes: AH
			[V2] Vendor specific: MCX623106AN-CDAT         
			[SN] Serial number: XXX   
			[V3] Vendor specific: XXX
			[VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX623106A      
			[V0] Vendor specific: PCIeGen4 x16 
			[VU] Vendor specific: XXX 
			[RV] Reserved: checksum good, 1 byte(s) reserved
		End
	Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 04, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 1
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy+
		IOVSta:	Migration-
		Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
		VF offset: 2, stride: 1, Device ID: 101e
		Supported Page Size: 000007ff, System Page Size: 00000001
		Region 0: Memory at 00000000c4800000 (64-bit, prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Capabilities: [1c0 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [230 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [320 v1] Lane Margining at the Receiver <?>
	Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [420 v1] Data Link Feature <?>
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core
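
A minimal sketch for comparing the capable, negotiated, and target link speeds from the lspci output above. It assumes the `lspci -vv` field layout shown in this post (LnkCap, LnkSta, LnkCtl2), and the function name `link_report` is made up for illustration.

```python
import re

# Three lines copied from the lspci output above (indentation simplified).
SAMPLE = """\
    LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
    LnkSta: Speed 8GT/s (downgraded), Width x16 (ok)
    LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
"""


def link_report(text: str) -> dict:
    """Extract capable, current, and target link speeds from lspci -vv text."""
    # "LnkCap:" with the colon does not match the separate "LnkCap2:" line.
    cap = re.search(r"LnkCap:\s+.*?Speed ([\d.]+)GT/s, Width x(\d+)", text)
    sta = re.search(r"LnkSta:\s+Speed ([\d.]+)GT/s.*?Width x(\d+)", text)
    tgt = re.search(r"LnkCtl2:\s+Target Link Speed: ([\d.]+)GT/s", text)
    if not (cap and sta and tgt):
        raise ValueError("expected LnkCap, LnkSta and LnkCtl2 lines")
    cap_speed, sta_speed = float(cap.group(1)), float(sta.group(1))
    return {
        "cap_speed": cap_speed,
        "cap_width": int(cap.group(2)),
        "sta_speed": sta_speed,
        "sta_width": int(sta.group(2)),
        "target_speed": float(tgt.group(1)),
        "downgraded": sta_speed < cap_speed,
    }


report = link_report(SAMPLE)
for key, value in report.items():
    print(f"{key}: {value}")
```

For this card the report shows a 16 GT/s capability, an 8 GT/s negotiated speed, and a target link speed of 8 GT/s, all at x16 width. Running the same check against the upstream port at 0000:5d:00.0 named in the dmesg output would show whether that side of the link advertises 16 GT/s at all.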