Multiple ConnectX-3 cards, no fiber light, no link light

I have an issue with my Mellanox ConnectX-3 cards where none of the transceivers output any light.
The card is detected in Windows 11 and Ubuntu 22.10. Both ports are visible on both operating systems, but I can never get the transceivers to work, and the link lights never turn on.

I’ve updated the firmware and drivers to the latest versions. I’ve tested different firmware versions, and I’ve reinstalled the firmware and drivers multiple times.
I’ve even tried two different ConnectX-3 cards, of the same model.

I’ve verified all the transceivers work and the fiber is good.
I can’t get any light from the transceivers. My light meter doesn’t detect anything.

Card Models: Mellanox MCX314A-BCBT
Driver Version: 5.50.14740.1
Firmware Version: 2.42.5000

Tested Transceivers: Cisco QSFP-40G-SR-BD, Cisco SFP-10G-SR
Fiber Type: LC-LC OM3 MMF

OS: Windows 11 Pro for Workstation 23493, Ubuntu 22.10
Tested multiple PCIe slots as well.

I have no idea where to troubleshoot at this point. The cards seem to be detected fine, and even updated fine, so I don’t think they’re broken. The firmware release notes also specify it’s compatible with my 40Gig transceivers.

ConnectX-3 cards are EOL.

You can try opening a support ticket and get the FW dumps extracted from the cards to try and explain why the modules aren’t being powered on (as it seems from your description).

I got 2 known-working Mellanox MFM1T02A-SR modules and tested them in both cards on both servers running Windows and Ubuntu, and they do the same thing: they appear not to power on at all.

How would I go about doing this? Thank you for the reply!

I think you need a support contract for opening a case with support.

What happens with copper cables? Do you get a link up?
If you do, it may be that the PCIe slots the cards are connected to aren’t providing sufficient power.

What is the platform (motherboard etc.) you are connecting the card/s to?

That’s unfortunate, but I understand. Thank you for helping me this far!

DAC cables come in Saturday, so I’ll be able to test those then.

The three platforms I’ve tried have been AMD X399, AMD B550, and Intel Z790
I’ve messed with the PCIe power settings, tried different slots, tried both PCIe 2.0 and 3.0, manually set v3.0 on the slot, and toggled SR-IOV. None of those appear to have any effect; the modules still don’t turn on.

The only likely reasons for them not turning on are power or compatibility.

Do you happen to have the Linux messages/dmesg output?

I’m now testing an Intel Z170 machine running TrueNAS-22.12.2 (Debian 11)

Here are some potentially helpful command outputs for the ConnectX-3 card’s status.
This card is running the 314A-BCB (ETH Only) firmware. I’ve also tested an MCX354A-FCB VPI firmware on the cards.

root@truenas[~]# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4099_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:07:00.0 bar=0xdf600000 size=0x100000
                                   Chip revision is: 01
/dev/mst/mt4099_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:07:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 01
root@truenas[~]# mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3
  Part Number:      MCX314A-BCB_Ax
  Description:      ConnectX-3 EN network interface card; 40GigE; dual-port QSFP; PCIe3.0 x8 8GT/s; RoHS R6
  PSID:             MT_1090110023
  PCI Device Name:  /dev/mst/mt4099_pci_cr0
  Port1 MAC:        e41d2d2cf340
  Port2 MAC:        e41d2d2cf341
  Versions:         Current        Available
     FW             2.42.5000      N/A
     PXE            3.4.0752       N/A

  Status:           No matching image found
root@truenas[~]# mlxlink -d /dev/mst/mt4099_pci_cr0

-E- Device is not supported

This is the dmesg output. (hit a character limit)
root@truenas[~]# dmesg
[ 0.000000] microcode: microcode updated early to rev (truncated; full output on Pastebin.com)

Edit: The two lines at the bottom, starting with MFT device name created: id: 4099, are after I ran the mst start command.

FYI - mlxlink won’t work on it (only CX4 onwards)
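
The "Device is not supported" error can be anticipated from the MST device name, which embeds the PCI device ID (4099 here). A minimal sketch in Python, assuming the small ID table below is correct for these adapters and taking "CX4 onwards" from the reply above as the rule; the function names `device_id` and `mlxlink_expected_to_work` are hypothetical, not part of MFT.

```python
import re

# Assumed mapping of PCI device ID -> ConnectX generation.
# Only a few IDs are listed; extend it for other adapters.
GENERATION = {
    4099: 3,  # ConnectX-3 (the mt4099 device in this thread)
    4103: 3,  # ConnectX-3 Pro
    4115: 4,  # ConnectX-4
    4117: 4,  # ConnectX-4 Lx
}


def device_id(mst_path):
    """Extract the numeric device ID from a name like /dev/mst/mt4099_pci_cr0."""
    match = re.search(r"mt(\d+)_", mst_path)
    if match is None:
        raise ValueError(f"no device ID found in {mst_path!r}")
    return int(match.group(1))


def mlxlink_expected_to_work(mst_path):
    """Return True/False per the 'CX4 onwards' rule, or None for unknown IDs."""
    generation = GENERATION.get(device_id(mst_path))
    if generation is None:
        return None
    return generation >= 4


print(mlxlink_expected_to_work("/dev/mst/mt4099_pci_cr0"))
```

For the card in this thread the check comes back negative, which matches the error mlxlink printed.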

Can you post the messages & dmesg files anywhere accessible when cables are connected?

Not sure if this NIC generation has it – but it should report if it fails to activate the cables.

You can also attach mstdump output from the device – I’ll try to review and see why the modules aren’t being turned on.

mstdump -full /dev/mst/mt4099_pci_cr0

Also ibstat and ethtool output from the machine may help.

ibstat

ethtool -m <interface_name>

ethtool <interface_name>
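
If it helps, the commands above can be gathered into one report to attach to a post. A minimal sketch in Python, assuming the tools are on the PATH; the function names `diagnostic_commands` and `collect` are hypothetical, and `eth0` is only a placeholder for the real interface name.

```python
import subprocess


def diagnostic_commands(mst_device, interfaces):
    """Build the argv lists for the commands requested in this thread."""
    commands = [["mstdump", "-full", mst_device], ["ibstat"]]
    for iface in interfaces:
        commands.append(["ethtool", "-m", iface])  # module EEPROM / DOM data
        commands.append(["ethtool", iface])        # link settings and state
    return commands


def collect(commands, run=subprocess.run):
    """Run each command and return one text report; missing tools are noted."""
    sections = []
    for argv in commands:
        try:
            result = run(argv, capture_output=True, text=True)
            output = result.stdout + result.stderr
        except FileNotFoundError:
            output = "(command not found)\n"
        sections.append("$ " + " ".join(argv) + "\n" + output)
    return "\n".join(sections)
```

Usage would look like `print(collect(diagnostic_commands("/dev/mst/mt4099_pci_cr0", ["eth0"])))`, run as root. A tool that is not installed (ibstat, for example) is recorded as "command not found" and the rest of the report is still produced.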

thanks,

Dan

Thanks! Someone on the L1Techs forum post for this also suggested I run it.

The previous dmesg was with two SFP+ 10Gig Mellanox modules plugged in with an sc-mmf cable connecting the two together.

mt4099.dmp (764.2 KB)

I’m unable to install the infiniband-diags package for the ibstat command, as TrueNAS doesn’t support adding packages, but I’m sure I can work around that if needed.

ethtool output for each interface: (The 10Gig Mellanox SFP+ modules appear to be detected)

root@truenas[/tmp]# ethtool -m enp7s0
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 10G Ethernet: 10G Base-SR
        Encoding                                  : 0x06 (64B/66B)
        BR, Nominal                               : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 0km
        Length (SMF)                              : 0m
        Length (50um)                             : 80m
        Length (62.5um)                           : 30m
        Length (Copper)                           : 0m
        Length (OM3)                              : 300m
        Laser wavelength                          : 850nm
        Vendor name                               : MELLANOX
        Vendor OUI                                : 00:02:c9
        Vendor PN                                 : AFBR-703SDZ-MX1
        Vendor rev                                : G2.3
        Option values                             : 0x00 0x1a
        Option                                    : RX_LOS implemented
        Option                                    : TX_FAULT implemented
        Option                                    : TX_DISABLE implemented
        BR margin, max                            : 0%
        BR margin, min                            : 0%
        Vendor SN                                 : AA1131A5WB0
        Date code                                 : 110805
        Optical diagnostics support               : Yes
        Laser bias current                        : 0.002 mA
        Laser output power                        : 0.0001 mW / -40.00 dBm
        Receiver signal average optical power     : 0.0001 mW / -40.00 dBm
        Module temperature                        : 37.14 degrees C / 98.86 degrees F
        Module voltage                            : 3.3153 V
        Alarm/warning flags implemented           : Yes
        Laser bias current high alarm             : Off
        Laser bias current low alarm              : Off
        Laser bias current high warning           : Off
        Laser bias current low warning            : Off
        Laser output power high alarm             : Off
        Laser output power low alarm              : Off
        Laser output power high warning           : Off
        Laser output power low warning            : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser rx power high alarm                 : Off
        Laser rx power low alarm                  : On
        Laser rx power high warning               : Off
        Laser rx power low warning                : On
        Laser bias current high alarm threshold   : 10.500 mA
        Laser bias current low alarm threshold    : 2.500 mA
        Laser bias current high warning threshold : 10.500 mA
        Laser bias current low warning threshold  : 2.500 mA
        Laser output power high alarm threshold   : 2.0000 mW / 3.01 dBm
        Laser output power low alarm threshold    : 0.1260 mW / -9.00 dBm
        Laser output power high warning threshold : 0.7900 mW / -1.02 dBm
        Laser output power low warning threshold  : 0.3170 mW / -4.99 dBm
        Module temperature high alarm threshold   : 85.00 degrees C / 185.00 degrees F
        Module temperature low alarm threshold    : -5.00 degrees C / 23.00 degrees F
        Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
        Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
        Module voltage high alarm threshold       : 3.6000 V
        Module voltage low alarm threshold        : 3.0000 V
        Module voltage high warning threshold     : 3.4600 V
        Module voltage low warning threshold      : 3.1300 V
        Laser rx power high alarm threshold       : 2.0000 mW / 3.01 dBm
        Laser rx power low alarm threshold        : 0.0315 mW / -15.02 dBm
        Laser rx power high warning threshold     : 0.7900 mW / -1.02 dBm
        Laser rx power low warning threshold      : 0.0315 mW / -15.02 dBm
root@truenas[/tmp]# ethtool -m enp7s0d1
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 10G Ethernet: 10G Base-SR
        Encoding                                  : 0x06 (64B/66B)
        BR, Nominal                               : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 0km
        Length (SMF)                              : 0m
        Length (50um)                             : 80m
        Length (62.5um)                           : 30m
        Length (Copper)                           : 0m
        Length (OM3)                              : 300m
        Laser wavelength                          : 850nm
        Vendor name                               : MELLANOX
        Vendor OUI                                : 00:02:c9
        Vendor PN                                 : AFBR-703SDZ-MX1
        Vendor rev                                : G2.3
        Option values                             : 0x00 0x1a
        Option                                    : RX_LOS implemented
        Option                                    : TX_FAULT implemented
        Option                                    : TX_DISABLE implemented
        BR margin, max                            : 0%
        BR margin, min                            : 0%
        Vendor SN                                 : AA1131A5WAV
        Date code                                 : 110805
        Optical diagnostics support               : Yes
        Laser bias current                        : 0.002 mA
        Laser output power                        : 0.0001 mW / -40.00 dBm
        Receiver signal average optical power     : 0.0001 mW / -40.00 dBm
        Module temperature                        : 36.30 degrees C / 97.33 degrees F
        Module voltage                            : 3.3128 V
        Alarm/warning flags implemented           : Yes
        Laser bias current high alarm             : Off
        Laser bias current low alarm              : Off
        Laser bias current high warning           : Off
        Laser bias current low warning            : Off
        Laser output power high alarm             : Off
        Laser output power low alarm              : Off
        Laser output power high warning           : Off
        Laser output power low warning            : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser rx power high alarm                 : Off
        Laser rx power low alarm                  : On
        Laser rx power high warning               : Off
        Laser rx power low warning                : On
        Laser bias current high alarm threshold   : 10.500 mA
        Laser bias current low alarm threshold    : 2.500 mA
        Laser bias current high warning threshold : 10.500 mA
        Laser bias current low warning threshold  : 2.500 mA
        Laser output power high alarm threshold   : 2.0000 mW / 3.01 dBm
        Laser output power low alarm threshold    : 0.1260 mW / -9.00 dBm
        Laser output power high warning threshold : 0.7900 mW / -1.02 dBm
        Laser output power low warning threshold  : 0.3170 mW / -4.99 dBm
        Module temperature high alarm threshold   : 85.00 degrees C / 185.00 degrees F
        Module temperature low alarm threshold    : -5.00 degrees C / 23.00 degrees F
        Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
        Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
        Module voltage high alarm threshold       : 3.6000 V
        Module voltage low alarm threshold        : 3.0000 V
        Module voltage high warning threshold     : 3.4600 V
        Module voltage low warning threshold      : 3.1300 V
        Laser rx power high alarm threshold       : 2.0000 mW / 3.01 dBm
        Laser rx power low alarm threshold        : 0.0315 mW / -15.02 dBm
        Laser rx power high warning threshold     : 0.7900 mW / -1.02 dBm
        Laser rx power low warning threshold      : 0.0315 mW / -15.02 dBm
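
The readings above can be checked mechanically. Below is a minimal sketch in Python, assuming the `ethtool -m` text has been captured as a string; the function names `parse_dom`, `mw_to_dbm`, and `tx_is_dark` are hypothetical. It compares the laser bias current and output power against the module's own low-alarm thresholds, using four lines copied from the output above. The bias current (0.002 mA against a 2.500 mA threshold) and output power (0.0001 mW against 0.1260 mW) are both far below the thresholds, even though the matching low-alarm flags read Off.

```python
import math


def parse_dom(text):
    """Map each 'name : value' line to its leading numeric value, if any."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        tokens = value.split()
        try:
            fields[name.strip()] = float(tokens[0])
        except (IndexError, ValueError):
            continue  # no value, or not a plain number (e.g. 'Off', '0x03')
    return fields


def mw_to_dbm(milliwatts):
    """Convert optical power in mW to dBm: 10 * log10(P / 1 mW)."""
    return 10 * math.log10(milliwatts)


def tx_is_dark(dom):
    """True when bias current and TX power are both below their low alarms."""
    bias_low = dom["Laser bias current"] < dom["Laser bias current low alarm threshold"]
    power_low = dom["Laser output power"] < dom["Laser output power low alarm threshold"]
    return bias_low and power_low


# Four lines copied from the enp7s0 output above.
SAMPLE = """\
        Laser bias current                        : 0.002 mA
        Laser output power                        : 0.0001 mW / -40.00 dBm
        Laser bias current low alarm threshold    : 2.500 mA
        Laser output power low alarm threshold    : 0.1260 mW / -9.00 dBm
"""

dom = parse_dom(SAMPLE)
print("transmitter dark:", tx_is_dark(dom))
print("TX power in dBm:", round(mw_to_dbm(dom["Laser output power"]), 2))
```

The same functions should work on the full output, since lines without a leading plain number are skipped.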

As always, thank you so much for helping!

Good news! Maybe…

My SFP+ 10gig DAC cables just arrived (10GTek CAB-10GSFP-P3M). I plugged them into my Mellanox ConnectX-3 QSFP+ 40Gig card using the QSFP to SFP adapters and link came right up.

So…
The card works. It can recognize and read out the details of any transceiver I put in it, but the transceiver never emits any light. They stay off. That happens with BOTH Cisco and Mellanox modules.

I’m at a complete loss as to what would be causing this and what the next troubleshooting step would be.
This is happening to BOTH of my Mellanox 314A 40Gig cards on any motherboard and any operating system.

Can you please share the motherboard details?

  • As mentioned – might be an issue with power from the motherboard itself…
  • It would be good to know which slot the cards are connected to on each motherboard

Currently using an NZXT N7 B550.
It’s currently in the PCI-E 16x slot, with the version manually set to 3.0. The slot should be able to supply the full 75 watts.
I’ve tried its 3.0 4x slot as well with the same results.

I’ve also tried an ASUS Z170 Pro

I don’t have my ASUS ROG STRIX X399-E GAMING motherboard available anymore, but it was doing the same thing as well in the top PCIe 16x slot.

Did you get a chance to do this? I really appreciate all the time you’ve spent helping me thus far.

I’m looking at purchasing ConnectX-3 354A cards to replace these 314A cards to see if it makes a difference.

The ConnectX-3 354A cards came in.
They’re doing the EXACT same thing as the 314A cards.
They work totally fine with DAC cables, but in any system at all, any fiber module plugged in never turns on its LED/laser, and there’s never a link light.

I’m at a complete loss as to what to do from here.

I’m currently at the point where I think these cards only support 1.5 watts per port, and I think all these optics have been 2.5 watts and 3.5 watts.

I don’t know if there is any way to override the max power per port?
I have very active cooling on the card.
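
To make the power hypothesis concrete, here is a minimal sketch in Python. It assumes the QSFP+ power class limits from SFF-8636 as I understand them (classes 1 to 4: 1.5 W, 2.0 W, 2.5 W, 3.5 W), and it treats the 1.5 W per-port budget as an unconfirmed guess, not a documented figure for these cards. The names `power_class` and `port_can_power` are hypothetical.

```python
# Maximum module power per QSFP+ power class, in watts (assumed from SFF-8636).
QSFP_POWER_CLASS_MAX_W = {1: 1.5, 2: 2.0, 3: 2.5, 4: 3.5}

# Unconfirmed guess from this thread: the card supplies at most 1.5 W per port.
ASSUMED_PORT_BUDGET_W = 1.5


def power_class(module_watts):
    """Return the lowest power class whose limit covers the module's draw."""
    for cls, limit in sorted(QSFP_POWER_CLASS_MAX_W.items()):
        if module_watts <= limit:
            return cls
    raise ValueError(f"{module_watts} W exceeds the classes listed here")


def port_can_power(module_watts, port_budget_watts=ASSUMED_PORT_BUDGET_W):
    """True when the module's draw fits within the port's power budget."""
    return module_watts <= port_budget_watts


for watts in (1.5, 2.5, 3.5):
    print(watts, "W -> class", power_class(watts),
          "| fits assumed budget:", port_can_power(watts))
```

Under that assumption, only a module drawing 1.5 W or less would fit the port budget, so the 2.5 W and 3.5 W optics would not.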

Tried the CX314 and CX354 cards in a Supermicro MBDH12SSLNTO Motherboard with an EPYC 7763 and still had the same exact issue.

Went ahead and returned all the CX3 cards and bought some CX4 cards. Going to see if they make a difference, especially since I’m using BIDI transceivers.