No link detected with ConnectX-4 Lx

Hi there!

I recently bought my first fiber network card, a Mellanox ConnectX-4 Lx EN, for my PC, along with two SFP28 modules from fs.com to connect it to my Mikrotik CCR2004 router. Not sure if it matters, but the cable is a 40 m Lightwin single-mode OS2 duplex patch cable (9/125 μm) with LC-LC plugs. After I plugged everything in, the network interface appeared immediately… but it tells me that the “Cable is unplugged”.

First I thought it was a cable issue, so I double-checked that everything was plugged in correctly (the transceivers, the network card, the cables). I also swapped in a second cable that I know works (it normally connects a Ubiquiti Aggregation Switch).

My next thought was that one of the SFP28 modules might be broken, but the Mikrotik router detects both of them when they are plugged in. I can also see them on my PC when using ethtool or mlxlink:

$ mlxlink -d 26:00.0 -m -e -c
Operational Info
----------------
State                              : Polling 
Physical state                     : ETH_AN_FSM_ABILITY_DETECT 
Speed                              : N/A 
Width                              : N/A 
FEC                                : N/A 
Loopback Mode                      : No Loopback 
Auto Negotiation                   : ON 

Supported Info
--------------
Enabled Link Speed                 : 0x38007013 (25G,10G,1G) 
Supported Cable Speed              : 0x20000000 (25G) 

Troubleshooting Info
--------------------
Status Opcode                      : 36 
Group Opcode                       : PHY FW 
Recommendation                     : Other issues 

Tool Information
----------------
Firmware Version                   : 14.32.1900 
MFT Version                        : mft 4.30.0-139 

Module Info
-----------
Temperature [C]                    : 45 [-50..95]
Voltage [mV]                       : 3262.8 [2970..3630]
Bias Current [mA]                  : 40.756 [10..100]
Rx Power Current [dBm]             : -2 [-16..6]
Tx Power Current [dBm]             : -2 [-10..6]
Identifier                         : SFP28/SFP+
Compliance                         : 100GBASE-LR4 or 25GBASE-LR
Cable Technology                   : N/A
Cable Type                         : Optical Module (separated)
OUI                                : Other
Vendor Name                        : FS
Vendor Part Number                 : SFP28-25GLR-31
Vendor Serial Number               : G2220463087
Rev                                : A2
Wavelength [nm]                    : 1310
Transfer Distance [m]              : 0
Attenuation (5g,7g,12g)[dB]        : N/A
FW Version                         : N/A
Digital Diagnostic Monitoring      : Yes
Power Class                        : 1.0 W max
CDR RX                             : ON
CDR TX                             : ON
LOS Alarm                          : N/A
SNR Media Lanes [dB]               : N/A
SNR Host Lanes [dB]                : N/A
IB Cable Width                     : N/A
Memory Map Revision                : 0
Linear Direct Drive                : 0
Cable Breakout                     : N/A
SMF Length                         : N/A
MAX Power                          : 0
Cable Rx AMP                       : 0
Cable Rx Emphasis                  : 16
Cable Rx Post Emphasis             : 0
Cable Tx Equalization              : 32
Wavelength Tolerance               : 0.0nm
Module State                       : N/A
DataPath state [per lane]          : N/A
Rx Output Valid [per lane]         : 0
Nominal bit rate                   : 25.750Gb/s
Rx Power Type                      : Average power
Manufacturing Date                 : 15_06_22
Active Set Host Compliance Code    : N/A
Active Set Media Compliance Code   : N/A
Error Code Response                : N/A
Module FW Fault                    : N/A
DataPath FW Fault                  : N/A
Tx Fault [per lane]                : 0
Tx LOS [per lane]                  : N/A
Tx CDR LOL [per lane]              : 0
Rx LOS [per lane]                  : 0
Rx CDR LOL [per lane]              : 0
Tx Adaptive EQ Fault [per lane]    : N/A

EYE Opening Info
----------------
Physical Grade                     :      0
Height Eye Opening [mV]            :      0
Phase  Eye Opening [psec]          :      0

Physical Counters and BER Info
------------------------------
Time Since Last Clear [Min]        : N/A
Effective Physical Errors          : N/A
Raw Physical Errors Per Lane       : N/A
Effective Physical BER             : N/A
Raw Physical BER                   : N/A
Link Down Counter                  : N/A
Link Error Recovery Counter        : N/A

Interestingly, the troubleshooting info shows that something is wrong, but I couldn’t find much information about what it means:

Status Opcode : 36
Group Opcode : PHY FW
Recommendation : Other issues
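
To make it easier to compare runs after each change, the handful of relevant fields can be filtered out of the mlxlink output. This is only a minimal sketch: the helper name link_summary is made up for illustration, and it assumes mlxlink prints plain text in the "Key : Value" layout shown above (no color codes).

```shell
# Hypothetical helper: reads mlxlink output on stdin and prints only the
# fields that matter for link bring-up, one "key=value" pair per line.
link_summary() {
    awk -F':' '
        {
            key = $1
            sub(/[ \t]+$/, "", key)                # strip padding before the colon
            val = substr($0, index($0, ":") + 1)   # everything after the first colon
            gsub(/^[ \t]+|[ \t]+$/, "", val)       # strip surrounding whitespace
        }
        key == "State" || key == "Physical state" ||
        key == "Status Opcode" || key == "Group Opcode" ||
        key == "Recommendation" ||
        key == "Rx Power Current [dBm]" || key == "Tx Power Current [dBm]" {
            print key "=" val
        }'
}

# Usage (device address taken from the post):
#   mlxlink -d 26:00.0 -m -e -c | link_summary
```

With the output above, this would keep the Polling state, the opcode 36 lines, and the Rx/Tx power readings, which are the values worth watching while changing settings.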

My next idea was that maybe something is wrong with the firmware so I tried to reset the config and firmware:

$ mlxconfig -d 26:00.0 reset

 Reset configuration for device 26:00.0? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

$ mlxfwreset -d 26:00.0 reset

The reset level for device, /dev/mst/mt4117_pciconf0 is:

3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw             -Done
-I- Stopping Driver                         -Done
-I- Resetting PCI                           -Done
-I- Starting Driver                         -Done
-I- Restarting MST                          -Done
-I- FW was loaded successfully.

After restarting my machine, still no connection. I then checked and updated the firmware using mlxfwmanager:

$ mlxfwmanager -d /dev/mst/mt4117_pciconf0 -u -i ./fw-ConnectX4Lx-rel-14_32_1900-MCX4121A-ACA_Ax-UEFI-14.25.17-FlexBoot-3.6.502.bin

$ mlxfwmanager -d /dev/mst/mt4117_pciconf0
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX4LX
  Part Number:      MCX4121A-ACA_Ax
  Description:      ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
  PSID:             MT_2420110034
  PCI Device Name:  /dev/mst/mt4117_pciconf0
  Base MAC:         98039b7930c1
  Versions:         Current        Available     
     FW             14.32.1900     14.32.1900    
     PXE            3.6.0502       3.6.0502      
     UEFI           14.25.0017     14.25.0017    

  Status:           Up to date

And verified the image with flint:

$ flint -d /dev/mst/mt4117_pciconf0 v

FS3 failsafe image

     /0x00800038-0x00801e8b (0x001e54)/ (BOOT2) - OK
     /0x00802000-0x0080201f (0x000020)/ (ITOC_HEADER) - OK
     /0x00804000-0x0081472b (0x01072c)/ (IRON_PREP_CODE) - OK
     /0x00815000-0x008150ff (0x000100)/ (RESET_INFO) - OK
     /0x00816000-0x00816bff (0x000c00)/ (FW_MAIN_CFG) - OK
     /0x00817000-0x008174bf (0x0004c0)/ (FW_BOOT_CFG) - OK
     /0x00818000-0x008195ff (0x001600)/ (HW_MAIN_CFG) - OK
     /0x0081a000-0x0081a13f (0x000140)/ (HW_BOOT_CFG) - OK
     /0x0081b000-0x0081dc7f (0x002c80)/ (PHY_UC_CONSTS) - OK
     /0x0081e000-0x0081e13f (0x000140)/ (IMAGE_SIGNATURE_256) - OK
     /0x0081f000-0x0081f8ff (0x000900)/ (PUBLIC_KEYS_2048) - OK
     /0x00820000-0x0082008f (0x000090)/ (FORBIDDEN_VERSIONS) - OK
     /0x00821000-0x0082123f (0x000240)/ (IMAGE_SIGNATURE_512) - OK
     /0x00822000-0x008230ff (0x001100)/ (PUBLIC_KEYS_4096) - OK
     /0x00824000-0x00873fff (0x050000)/ (PROGRAMMABLE_HW_FW) - OK
     /0x00874000-0x00925e27 (0x0b1e28)/ (ROM_CODE) - OK
     /0x00926000-0x00935fff (0x010000)/ (CRDUMP_MASK_DATA) - OK
     /0x00936000-0x009369ff (0x000a00)/ (PCIE_PHY_UC_CODE) - OK
     /0x00937000-0x0094091f (0x009920)/ (PHY_UC_CODE) - OK
     /0x00941000-0x0096c457 (0x02b458)/ (PCI_CODE) - OK
     /0x0096d000-0x00cbb8bf (0x34e8c0)/ (MAIN_CODE) - OK
     /0x00cbc000-0x00cc9cbf (0x00dcc0)/ (PCIE_LINK_CODE) - OK
     /0x00cca000-0x00ccae3f (0x000e40)/ (POST_IRON_BOOT_CODE) - OK
     /0x00ccb000-0x00ccce0f (0x001e10)/ (UPGRADE_CODE) - OK
     /0x00ccd000-0x00ccd3ff (0x000400)/ (IMAGE_INFO) - OK
     /0x00ccd400-0x00ccdbcb (0x0007cc)/ (DBG_FW_INI) - OK
     /0x00ccdbcc-0x00ccdbd3 (0x000008)/ (DBG_FW_PARAMS) - OK
     /0x00fa0000-0x00faffff (0x010000)/ (NV_DATA) - CRC IGNORED
     /0x00fb0000-0x00fbffff (0x010000)/ (NV_DATA) - CRC IGNORED
     /0x00fc0000-0x00fcffff (0x010000)/ (FW_NV_LOG) - CRC IGNORED
     /0x00fee000-0x00fee1ff (0x000200)/ (DEV_INFO) - OK
     /0x00ff8000-0x00ff813f (0x000140)/ (MFG_INFO) - OK
     /0x00ff8140-0x00ff81b7 (0x000078)/ (VPD_R0) - OK

-I- FW image verification succeeded. Image is bootable.

Nothing changed. Other people mentioned that playing with the autonegotiation and FEC settings might help, so I disabled autonegotiation on my Mikrotik router and tried different FEC settings there. I also tried to force the speed on the Mellanox card with ethtool:

$ ethtool -s enp38s0f0np0 speed 25000 autoneg off duplex full

But that didn’t make a difference, so I turned autonegotiation back on. I also read that SFP modules may not show up at all when they don’t receive enough power, so I am wondering whether my PC hardware doesn’t get along with the Mellanox card or the SFP modules.
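
For the FEC part on the Linux side, a sketch of how the modes could be cycled with ethtool while watching for a link. The function name try_fec_modes, the DRY_RUN switch, and the 5-second wait are assumptions made for illustration; `ethtool --set-fec` needs root, and the router side has to be set to a matching mode for any of this to help.

```shell
# Hypothetical helper: try each FEC encoding in turn and report the link state.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
try_fec_modes() {
    iface="$1"
    for mode in rs baser off auto; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ethtool --set-fec $iface encoding $mode"
            continue
        fi
        # Skip modes the driver rejects instead of aborting the whole loop.
        ethtool --set-fec "$iface" encoding "$mode" || continue
        sleep 5   # arbitrary wait to give the link time to retrain
        printf '%s: ' "$mode"
        ethtool "$iface" | grep 'Link detected'
    done
}

# Usage:
#   try_fec_modes enp38s0f0np0              # print the commands only
#   DRY_RUN=0 try_fec_modes enp38s0f0np0    # actually apply them (as root)
```

The loop ends on `auto`, so the interface is left at its default FEC setting afterwards.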

My PC has an older MSI B450-A PRO MAX mainboard (AM4, AMD B450) with an AMD Ryzen 7 3700X. I placed the network card in the PCIe 3.0 slot and moved the graphics card to the PCIe 2.0 slot for testing. I normally boot from an M.2 drive, but to rule out PCIe lane sharing I removed it and ran the OS from an old SATA III disk. I’m working with Fedora Workstation 41 and Windows 10 (dual-boot), but the Mellanox card doesn’t get a link in either of them.
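
To check whether the slot itself is a limiting factor, the negotiated PCIe link can be compared with what the card supports. A small sketch, assuming the usual `lspci -vv` layout with LnkCap and LnkSta lines; the helper name pcie_link_lines is made up.

```shell
# Hypothetical helper: reads `lspci -vv` output on stdin and prints only the
# link capability (LnkCap) and negotiated link status (LnkSta) lines.
pcie_link_lines() {
    grep -E 'LnkCap:|LnkSta:' | sed 's/^[[:space:]]*//'
}

# Usage (root is needed to read the full capability list):
#   sudo lspci -vv -s 26:00.0 | pcie_link_lines
```

If the speed or width in LnkSta is lower than in LnkCap, the card is running on a reduced PCIe link. That would limit throughput, but on its own it would not be expected to explain a missing optical link.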

I’m running out of ideas and don’t know what to test next. Any help is appreciated a lot :)