Link not being established on one ConnectX-4 LX 40/10GbE card

I tested using two ConnectX-4 LX 40GbE cards, and the link was still not established between them. Both cards are installed in i7 PCs, and both PCs run Ubuntu 16.04 with kernel version 4.15.0-50-generic. The 40G/10G cable used to connect the cards is working fine, as I verified using an Intel 40/10G card. This board was working fine earlier, and I don’t know what has happened to it now.

I have performed the following tests:

[Screenshot: mlx_issue (4)]

I am attaching the lshw, ifconfig, lspci, ethtool, and dmesg output from the board with the issue. From the outputs, the card is detected by the machine, but the link is not being established. Could anyone please go through the observations and let us know whether this is a HW or SW issue and how we can fix it?

Hi Ananthu,

Have you tried installing the latest Mellanox OFED driver, version 4.6 (which embeds newer firmware), instead of the outdated version 4.3.1?

If not, I would suggest upgrading to the latest version and validating whether the issue reproduces. What type of server is the HCA card installed in (vendor/model)?

Even though the card is detected by the HW/driver, I would also suggest power cycling the server to re-initialize the HCA card.

Sophie.

Hi Sophie,

Thank you for your reply. I don’t suspect the setup, because the link was established when I used another ConnectX-4 LX 40/10GbE card in the same setup. The card with the issue was also working earlier in the same setup, but I don’t know what has happened to it now.

The card is detected properly, as you can see from my screenshots; the only problem is that the link is not being established.

Could you please mention any registers to check whether the link is detected, or any other debugging method to confirm whether it is a HW issue?
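A minimal sketch of one software-side check, assuming a Linux host where the Mellanox interface appears under the standard sysfs path `/sys/class/net` (the function name `link_state` is hypothetical, not a Mellanox tool). It reads the kernel's `operstate` and `carrier` attributes, which reflect whether the driver currently sees a physical link. It does not read device registers; it only shows what the driver reports.

```python
from pathlib import Path


def link_state(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Read the kernel's view of an interface's link from sysfs.

    operstate is the RFC 2863 state ("up", "down", "unknown", ...).
    carrier is "1" when the driver reports a physical link, "0" otherwise.
    Reading carrier on an administratively down interface fails with an
    OSError (EINVAL), so that case is reported as None.
    """
    base = Path(sysfs_root) / iface
    operstate = (base / "operstate").read_text().strip()
    try:
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        carrier = None
    return {"operstate": operstate, "carrier": carrier}
```

For example, `link_state("enp1s0")` on the affected PC would be expected to report `carrier` as False (or None if the interface is administratively down) while the link stays down; the `Link detected:` line of `ethtool enp1s0` reflects the same kernel view.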

The ConnectX-4 card is installed in an Intel i7 PC with the CPU model “Intel(R) Core™ i7-8700K CPU @ 3.70GHz”. We have tried several reboots and power cycles, but the issue was not resolved.

Hi Ananthu,

Does the other ConnectX-4 LX HCA card have the same FW 14.22.1002? (The latest is 14.25.1020.)

Have you tried swapping slots between the working and non-working ConnectX-4 cards?

Are any errors reported in the messages/dmesg files?

Have you tried this HCA card in a different server (same and different model)?

Did you consult with the vendor on whether our ConnectX-4 LX cards have been qualified with these servers (desktops)?

Sophie.

Hi Sophie,

Please see my answers to your queries.

Does the other ConnectX-4 LX HCA card have the same FW 14.22.1002? (The latest is 14.25.1020.)

No, the other board has firmware version 14.25.1020.

Have you tried swapping slots between the working and non-working ConnectX-4 cards?

Yes

Any errors reported in the messages/dmesg files?

enp1s0 is the Mellanox device. As you can see, the only message I am getting is “link is not ready”.
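If the dmesg output has been saved to a file (for example with `dmesg > dmesg.txt`), the link-related lines for the Mellanox interface can be pulled out with a small filter like the one below. This is a sketch: `link_events` is a hypothetical helper, and the keyword list is an assumption based on common kernel messages such as `link is not ready` and `link becomes ready`, so it may need adjusting to what this kernel and driver actually print.

```python
def link_events(dmesg_text: str, iface: str) -> list:
    """Return the dmesg lines that mention `iface` together with a link keyword."""
    keywords = ("link is not ready", "link becomes ready", "link up", "link down")
    events = []
    for line in dmesg_text.splitlines():
        lowered = line.lower()
        if iface in line and any(k in lowered for k in keywords):
            events.append(line.strip())
    return events


# Illustrative input only; this is not the dmesg captured in this thread.
sample = """[    5.100000] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[    5.200000] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex"""
for event in link_events(sample, "enp1s0"):
    print(event)
```

A card whose link never comes up would be expected to show only the `link is not ready` line for `enp1s0`, with no later `link becomes ready` line, which matches what is described here.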

Have you tried this HCA card in a different server (same and different model)?

Yes

Did you consult with the vendor on whether our ConnectX-4 LX cards have been qualified with these servers (desktops)?

No, but this HCA card (the one with the issue) worked earlier in this setup, and another HCA card is still working in this setup with no issues.

Hi Ananthu,

Does the other ConnectX-4 LX HCA card have the same FW 14.22.1002? (The latest is 14.25.1020.)

No, the other board has firmware version 14.25.1020.

#Make sure the problematic ConnectX-4 LX HCA is aligned with the latest FW (14.25.1020) and Mellanox OFED driver version 4.6 (latest).
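To confirm which firmware each card is actually running before and after the upgrade, the `firmware-version` line from `ethtool -i <iface>` can be parsed and compared against the target release. This is a sketch: the sample text below is illustrative (the driver version and the PSID in parentheses are placeholders, not output captured from these cards), and `firmware_version` and `needs_upgrade` are hypothetical helper names.

```python
import re

# Illustrative `ethtool -i enp1s0` text; driver version and PSID are placeholders.
SAMPLE_ETHTOOL_I = """driver: mlx5_core
version: 4.3-1.0.1
firmware-version: 14.22.1002 (MT_0000000000)
bus-info: 0000:01:00.0"""


def firmware_version(ethtool_i_output: str) -> tuple:
    """Return the dotted firmware version as a tuple of ints, e.g. (14, 22, 1002)."""
    for line in ethtool_i_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "firmware-version":
            # Keep only the leading numeric part; drop the "(PSID)" suffix.
            match = re.match(r"\s*(\d+(?:\.\d+)*)", value)
            if match:
                return tuple(int(part) for part in match.group(1).split("."))
    raise ValueError("no firmware-version line found")


def needs_upgrade(current: tuple, target: tuple) -> bool:
    """Tuples compare element by element, so (14, 22, 1002) < (14, 25, 1020)."""
    return current < target


current = firmware_version(SAMPLE_ETHTOOL_I)
print(current, needs_upgrade(current, (14, 25, 1020)))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "14.9" as newer than "14.25".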

Have you tried swapping slots between the working and non-working ConnectX-4 cards?

Yes

#Does the issue follow the slot or the HCA card?

Have you tried this HCA card in a different server (same and different model)?

Yes

#Does the issue remain the same?

Sophie.
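The slot-swap and server-swap questions amount to a simple elimination: if the failures always involve the same card and that card never passes, the fault follows the card; if they track a slot instead, the fault follows the slot. Below is a sketch of that reasoning with hypothetical card and slot labels (the thread does not report the individual swap results); `localize_fault` is an illustrative helper, not a Mellanox tool.

```python
def localize_fault(observations):
    """Decide whether link failures follow the card or the slot.

    observations: list of (card, slot, link_up) tuples from swap tests.
    Returns "card", "slot", "none" (no failures), or "inconclusive".
    """
    failing = [(card, slot) for card, slot, up in observations if not up]
    passing = [(card, slot) for card, slot, up in observations if up]
    if not failing:
        return "none"
    # A variable "explains" the failures if nothing that failed ever passed.
    card_explains = not ({c for c, _ in failing} & {c for c, _ in passing})
    slot_explains = not ({s for _, s in failing} & {s for _, s in passing})
    if card_explains and not slot_explains:
        return "card"
    if slot_explains and not card_explains:
        return "slot"
    return "inconclusive"


# Hypothetical swap results: card "A" fails in both slots, card "B" passes in both.
swaps = [("A", "slot1", False), ("B", "slot1", True),
         ("A", "slot2", False), ("B", "slot2", True)]
print(localize_fault(swaps))
```

If both explanations still fit (for example after a single test), the result is inconclusive, which is why swapping both the slot and the card is needed to separate the two.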