Hi there!
I recently bought my first fiber network card, a Mellanox ConnectX-4 Lx EN, for my PC, along with two SFP28 modules from fs.com to connect it to my Mikrotik CCR2004 router. Not sure if it matters, but the cable I used is a 40m Lightwin Singlemode OS2 Duplex Patch Cable 9/125μm with LC-LC plugs. After I plugged everything in, the network interface appeared immediately, but it tells me that the “Cable is unplugged”.
At first I thought it was a cable issue, so I double-checked that everything was plugged in correctly (the transceivers, the network card, the cables). I also swapped the cable for a second one that I know is working (it is normally used to connect a Ubiquiti Aggregation Switch).
My next thought was that one of the SFP28 modules might be broken, but the Mikrotik router detects both of them when they are plugged in. I can also see them on my PC when using ethtool or mlxlink:
$ mlxlink -d 26:00.0 -m -e -c
Operational Info
----------------
State : Polling
Physical state : ETH_AN_FSM_ABILITY_DETECT
Speed : N/A
Width : N/A
FEC : N/A
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
--------------
Enabled Link Speed : 0x38007013 (25G,10G,1G)
Supported Cable Speed : 0x20000000 (25G)
Troubleshooting Info
--------------------
Status Opcode : 36
Group Opcode : PHY FW
Recommendation : Other issues
Tool Information
----------------
Firmware Version : 14.32.1900
MFT Version : mft 4.30.0-139
Module Info
-----------
Temperature [C] : 45 [-50..95]
Voltage [mV] : 3262.8 [2970..3630]
Bias Current [mA] : 40.756 [10..100]
Rx Power Current [dBm] : -2 [-16..6]
Tx Power Current [dBm] : -2 [-10..6]
Identifier : SFP28/SFP+
Compliance : 100GBASE-LR4 or 25GBASE-LR
Cable Technology : N/A
Cable Type : Optical Module (separated)
OUI : Other
Vendor Name : FS
Vendor Part Number : SFP28-25GLR-31
Vendor Serial Number : G2220463087
Rev : A2
Wavelength [nm] : 1310
Transfer Distance [m] : 0
Attenuation (5g,7g,12g)[dB] : N/A
FW Version : N/A
Digital Diagnostic Monitoring : Yes
Power Class : 1.0 W max
CDR RX : ON
CDR TX : ON
LOS Alarm : N/A
SNR Media Lanes [dB] : N/A
SNR Host Lanes [dB] : N/A
IB Cable Width : N/A
Memory Map Revision : 0
Linear Direct Drive : 0
Cable Breakout : N/A
SMF Length : N/A
MAX Power : 0
Cable Rx AMP : 0
Cable Rx Emphasis : 16
Cable Rx Post Emphasis : 0
Cable Tx Equalization : 32
Wavelength Tolerance : 0.0nm
Module State : N/A
DataPath state [per lane] : N/A
Rx Output Valid [per lane] : 0
Nominal bit rate : 25.750Gb/s
Rx Power Type : Average power
Manufacturing Date : 15_06_22
Active Set Host Compliance Code : N/A
Active Set Media Compliance Code : N/A
Error Code Response : N/A
Module FW Fault : N/A
DataPath FW Fault : N/A
Tx Fault [per lane] : 0
Tx LOS [per lane] : N/A
Tx CDR LOL [per lane] : 0
Rx LOS [per lane] : 0
Rx CDR LOL [per lane] : 0
Tx Adaptive EQ Fault [per lane] : N/A
EYE Opening Info
----------------
Physical Grade : 0
Height Eye Opening [mV] : 0
Phase Eye Opening [psec] : 0
Physical Counters and BER Info
------------------------------
Time Since Last Clear [Min] : N/A
Effective Physical Errors : N/A
Raw Physical Errors Per Lane : N/A
Effective Physical BER : N/A
Raw Physical BER : N/A
Link Down Counter : N/A
Link Error Recovery Counter : N/A
Interestingly, the troubleshooting info shows that something seems to be wrong, but I couldn’t find much information about what it means:
Status Opcode : 36
Group Opcode : PHY FW
Recommendation : Other issues
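Since the full mlxlink dump is long, a small filter makes it easier to compare the link state between attempts. This is only a sketch: link_summary is a made-up helper name (not part of MFT), and it assumes the "Key : Value" layout shown in the output above.

```shell
# Print only the link-state fields from mlxlink output read on stdin.
# link_summary is a hypothetical helper name, not an MFT command.
link_summary() {
    awk '
        {
            # Split each line at the first colon into key and value.
            pos = index($0, ":")
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            # Trim surrounding whitespace from both parts.
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            # Keep only exact key matches, so "Module State" is not
            # mistaken for "State".
            if (key ~ /^(State|Physical state|Speed|FEC|Status Opcode|Group Opcode|Recommendation)$/)
                printf "%s=%s\n", key, val
        }
    '
}
```

It would be used as `mlxlink -d 26:00.0 -m -e -c | link_summary`, which reduces the output to the state, speed, FEC and troubleshooting opcode lines.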
My next idea was that maybe something is wrong with the firmware so I tried to reset the config and firmware:
$ mlxconfig -d 26:00.0 reset
Reset configuration for device 26:00.0? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
$ mlxfwreset -d 26:00.0 reset
The reset level for device, /dev/mst/mt4117_pciconf0 is:
3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw -Done
-I- Stopping Driver -Done
-I- Resetting PCI -Done
-I- Starting Driver -Done
-I- Restarting MST -Done
-I- FW was loaded successfully.
After restarting my machine, there was still no connection. I then checked and updated the firmware using mlxfwmanager:
$ mlxfwmanager -d /dev/mst/mt4117_pciconf0 -u -i ./fw-ConnectX4Lx-rel-14_32_1900-MCX4121A-ACA_Ax-UEFI-14.25.17-FlexBoot-3.6.502.bin
$ mlxfwmanager -d /dev/mst/mt4117_pciconf0
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX4LX
Part Number: MCX4121A-ACA_Ax
Description: ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
PSID: MT_2420110034
PCI Device Name: /dev/mst/mt4117_pciconf0
Base MAC: 98039b7930c1
Versions: Current Available
FW 14.32.1900 14.32.1900
PXE 3.6.0502 3.6.0502
UEFI 14.25.0017 14.25.0017
Status: Up to date
I also verified the image with flint:
$ flint -d /dev/mst/mt4117_pciconf0 v
FS3 failsafe image
/0x00800038-0x00801e8b (0x001e54)/ (BOOT2) - OK
/0x00802000-0x0080201f (0x000020)/ (ITOC_HEADER) - OK
/0x00804000-0x0081472b (0x01072c)/ (IRON_PREP_CODE) - OK
/0x00815000-0x008150ff (0x000100)/ (RESET_INFO) - OK
/0x00816000-0x00816bff (0x000c00)/ (FW_MAIN_CFG) - OK
/0x00817000-0x008174bf (0x0004c0)/ (FW_BOOT_CFG) - OK
/0x00818000-0x008195ff (0x001600)/ (HW_MAIN_CFG) - OK
/0x0081a000-0x0081a13f (0x000140)/ (HW_BOOT_CFG) - OK
/0x0081b000-0x0081dc7f (0x002c80)/ (PHY_UC_CONSTS) - OK
/0x0081e000-0x0081e13f (0x000140)/ (IMAGE_SIGNATURE_256) - OK
/0x0081f000-0x0081f8ff (0x000900)/ (PUBLIC_KEYS_2048) - OK
/0x00820000-0x0082008f (0x000090)/ (FORBIDDEN_VERSIONS) - OK
/0x00821000-0x0082123f (0x000240)/ (IMAGE_SIGNATURE_512) - OK
/0x00822000-0x008230ff (0x001100)/ (PUBLIC_KEYS_4096) - OK
/0x00824000-0x00873fff (0x050000)/ (PROGRAMMABLE_HW_FW) - OK
/0x00874000-0x00925e27 (0x0b1e28)/ (ROM_CODE) - OK
/0x00926000-0x00935fff (0x010000)/ (CRDUMP_MASK_DATA) - OK
/0x00936000-0x009369ff (0x000a00)/ (PCIE_PHY_UC_CODE) - OK
/0x00937000-0x0094091f (0x009920)/ (PHY_UC_CODE) - OK
/0x00941000-0x0096c457 (0x02b458)/ (PCI_CODE) - OK
/0x0096d000-0x00cbb8bf (0x34e8c0)/ (MAIN_CODE) - OK
/0x00cbc000-0x00cc9cbf (0x00dcc0)/ (PCIE_LINK_CODE) - OK
/0x00cca000-0x00ccae3f (0x000e40)/ (POST_IRON_BOOT_CODE) - OK
/0x00ccb000-0x00ccce0f (0x001e10)/ (UPGRADE_CODE) - OK
/0x00ccd000-0x00ccd3ff (0x000400)/ (IMAGE_INFO) - OK
/0x00ccd400-0x00ccdbcb (0x0007cc)/ (DBG_FW_INI) - OK
/0x00ccdbcc-0x00ccdbd3 (0x000008)/ (DBG_FW_PARAMS) - OK
/0x00fa0000-0x00faffff (0x010000)/ (NV_DATA) - CRC IGNORED
/0x00fb0000-0x00fbffff (0x010000)/ (NV_DATA) - CRC IGNORED
/0x00fc0000-0x00fcffff (0x010000)/ (FW_NV_LOG) - CRC IGNORED
/0x00fee000-0x00fee1ff (0x000200)/ (DEV_INFO) - OK
/0x00ff8000-0x00ff813f (0x000140)/ (MFG_INFO) - OK
/0x00ff8140-0x00ff81b7 (0x000078)/ (VPD_R0) - OK
-I- FW image verification succeeded. Image is bootable.
Nothing changed. Other people mentioned that playing with the autonegotiation and FEC settings might help, so I disabled autonegotiation on my Mikrotik router and tried the different FEC settings. I also tried to enforce those settings on the Mellanox card with ethtool:
$ ethtool -s enp38s0f0np0 speed 25000 autoneg off duplex full
But that didn’t make a difference, so I turned autonegotiation back on. I also read that SFP modules may not show up at all when they don’t receive enough power, so I am wondering whether my PC hardware doesn’t get along with the Mellanox card or the SFP modules.
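For the FEC side of things, the mode can also be forced on the card itself with ethtool's --set-fec option rather than only on the router. Below is a rough sketch of cycling through the modes and checking for link after each one. try_fec_modes, ETHTOOL and WAIT are names made up for this example, the 10-second wait is an arbitrary guess, and the real commands need root and a driver that supports FEC configuration.

```shell
# Try each FEC mode on one interface and report whether the link comes up.
# try_fec_modes is a hypothetical helper. ETHTOOL and WAIT can be overridden,
# e.g. ETHTOOL=echo WAIT=0 for a dry run that only prints the arguments.
try_fec_modes() {
    iface=$1
    ethtool_cmd=${ETHTOOL:-ethtool}
    wait_s=${WAIT:-10}    # assumed settle time in seconds
    for mode in rs baser off; do
        echo "== FEC mode: $mode"
        # Skip to the next mode if the driver rejects this one.
        "$ethtool_cmd" --set-fec "$iface" encoding "$mode" || continue
        sleep "$wait_s"
        if "$ethtool_cmd" "$iface" | grep -q 'Link detected: yes'; then
            echo "link up with FEC mode $mode"
            return 0
        fi
    done
    echo "no link with any FEC mode"
    return 1
}
```

On my machine the call would be `try_fec_modes enp38s0f0np0`, with the router port set to the matching FEC mode for each attempt. `ethtool --show-fec enp38s0f0np0` shows what is currently configured and active.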
My PC has an older MSI B450-A PRO MAX mainboard (AM4, AMD B450) with an AMD Ryzen 7 3700X. I placed the network card into the PCIe 3.0 slot and moved the graphics card to the PCIe 2.0 slot for testing. I usually boot from a drive in an M.2 slot, but to avoid PCIe lane switching I removed it and used an old SATA III disk to run the OS. I’m working with Fedora Workstation 41 and Windows 10 (dual-boot), but the Mellanox card doesn’t connect in either of them.
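To rule out the slot itself, the negotiated PCIe link can be compared against what the card supports using the LnkCap and LnkSta lines from `lspci -vv`. The filter below is only a sketch: pcie_link is a made-up helper name, and it assumes the usual "Speed ..., Width x..." wording of those lines.

```shell
# Reduce `lspci -vv` output (read on stdin) to the supported (LnkCap) and
# negotiated (LnkSta) PCIe speed and width. pcie_link is a hypothetical name.
pcie_link() {
    # LnkCap2/LnkSta2 lines are excluded because the colon must directly
    # follow the name.
    grep -E 'LnkCap:|LnkSta:' |
        sed -E 's/^[[:space:]]*(LnkCap|LnkSta):.*Speed ([^ ,]+).*Width (x[0-9]+).*/\1 speed=\2 width=\3/'
}
```

It would be run as `sudo lspci -s 26:00.0 -vv | pcie_link`. If LnkSta reports a lower speed or width than LnkCap, the card is running on a reduced PCIe link.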
I’m running out of ideas and don’t know what to test next. Any help is appreciated a lot :)