My ConnectX no longer sends or receives Ethernet packets

Hello,

I have a Mellanox ConnectX EN 10G (MT26448) that I use in a Proxmox (Debian) server. It had firmware version 2.7, and with that firmware it logged a lot of “command 0x54 failed” errors. I solved that by upgrading the firmware to 2.9.1000.

After the reboot I could see the enp7s0 and enp7s0d1 devices, but I was unable to ping other servers.

I ran the command service mst start and, surprisingly, I was then able to ping.

Now I have upgraded the kernel and this trick no longer works, not even after going back to the older kernel.

I suppose the card is not broken and this is only a software configuration problem.

Unfortunately I cannot easily replace the card because the server is hosted in a data center far away.

I hope I can re-enable the ports via software.

Can you help me?

Thanks,

Mario

Looking at ethtool output I see this:

Settings for enp7s0:

Supported ports: [ FIBRE ]

Supported link modes: 10000baseT/Full

Supported pause frame use: No

Supports auto-negotiation: No

Advertised link modes: 10000baseT/Full

Advertised pause frame use: No

Advertised auto-negotiation: No

Speed: 10000Mb/s

Duplex: Full

Port: FIBRE

PHYAD: 0

Transceiver: internal

Auto-negotiation: off

Supports Wake-on: d

Wake-on: d

Current message level: 0x00000014 (20)

link ifdown

Link detected: yes

The card reports “link ifdown”. It is connected with a DAC cable, not fibre.
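A note on reading this output: the "link ifdown" line is not a link state. ethtool prints the "Current message level" bitmask on one line and the names of the flags set in it on the next, so 0x00000014 decodes to the "link" (0x04) and "ifdown" (0x10) driver-logging categories. The actual link state is the "Link detected" line. Below is a minimal POSIX shell sketch of that decoding. The function name decode_msglvl is hypothetical, and the flag order is taken from the kernel's NETIF_MSG_* definitions as I understand them.

```shell
# Decode an ethtool "Current message level" bitmask into flag names.
# decode_msglvl is a hypothetical helper, not part of ethtool.
# Plain POSIX sh: variables are global because "local" is not portable.
decode_msglvl() {
    level=$(( $1 ))    # accepts hex (0x14) or decimal (20)
    bit=1
    out=""
    # Names are listed in bit order: drv is bit 0, probe is bit 1, and so on.
    for name in drv probe link timer ifdown ifup rx_err tx_err \
                tx_queued intr tx_done rx_status pktdata hw wol; do
        if [ $(( level & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit * 2 ))
    done
    echo "$out"
}
```

For example, decode_msglvl 0x00000014 prints the two flag names that ethtool shows under the message level.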

Hello Mario -

lspci | grep Mellanox # get bus:dev.func

NOTE: Use your device (bus:dev.func) in place of “85:00.0”

lspci -s 85:00.0 -xxxvvv

cat /etc/os-release

uname -r

mlxlink -d 85:00.0 -c -e --show_fec -m --show_serdes_tx --show_device

mlxlink -d 85:00.0

mlxlink -d 85:00.0 --show_counters

mlxlink -d mlx5_0 -p 1 -emc

ethtool -i <interface>

get fw version from the nic:

ethtool <interface>

ethtool -m <interface>

Thanks

~Steve
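To save copying the bus:dev.func by hand, the first step in the list above can be scripted. This is only a sketch: mlx_bdf is a hypothetical helper name, and it assumes the vendor string "Mellanox" appears in the lspci line, as it does for this card.

```shell
# Print the bus:dev.func of every Mellanox device in lspci output
# read from stdin. mlx_bdf is a hypothetical helper name.
mlx_bdf() {
    # Case-insensitive match on the vendor string; field 1 is bus:dev.func.
    awk 'tolower($0) ~ /mellanox/ { print $1 }'
}

# Usage (not run here):
#   lspci | mlx_bdf
#   lspci -s "$(lspci | mlx_bdf | head -n 1)" -xxxvvv
```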

Many thanks!

Here it is:

07:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev a0)

Subsystem: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]

Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-

Latency: 0, Cache Line Size: 64 bytes

Interrupt: pin A routed to IRQ 37

NUMA node: 0

Region 0: Memory at df300000 (64-bit, non-prefetchable) [size=1M]

Region 2: Memory at d5000000 (64-bit, prefetchable) [size=8M]

Expansion ROM at df200000 [disabled] [size=1M]

Capabilities: [40] Power Management version 3

Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)

Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

Capabilities: [48] Vital Product Data

Product Name: Hawk Dual Port

Read-only fields:

[PN] Part number: 59Y1905

[EC] Engineering changes: A1

[SN] Serial number: YK50200000EB

[V0] Vendor specific: PCIe Gen2 x8

[RV] Reserved: checksum good, 0 byte(s) reserved

Read/write fields:

[V1] Vendor specific: N/A

[YA] Asset tag: N/A

[RW] Read-write area: 106 byte(s) free

End

Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-

Vector table: BAR=0 offset=0007c000

PBA: BAR=0 offset=0007d000

Capabilities: [60] Express (v2) Endpoint, MSI 00

DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited

ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W

DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+

RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

MaxPayload 256 bytes, MaxReadReq 512 bytes

DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-

LnkCap: Port #8, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited

ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-

LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-

ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled

LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-

Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

Compliance De-emphasis: -6dB

LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-

EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-

Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)

ARICap: MFVC- ACS-, Next Function: 1

ARICtl: MFVC- ACS-, Function Group: 0

Kernel driver in use: mlx4_core

Kernel modules: mlx4_core

00: b3a0

10: 04 00 30 df0c 00 00 d5

20:00b3 15 50 67

30: 00 00 20 df00 0f 01 00 00

40:00 03 9c ff 7f

50:00 14 00 0f 00 90 01 a0 00

60:8e 64 10 2ef4 03 08

70:

80:1f

90:007f 80

a0: 00 c0 07 00 00 d08a

b0:0000

c0:0000

d0:0000

e0:0000

f0:0000


PRETTY_NAME="Debian GNU/Linux 9 (stretch)"

NAME="Debian GNU/Linux"

VERSION_ID="9"

VERSION="9 (stretch)"

ID=debian

HOME_URL="https://www.debian.org/"

SUPPORT_URL="https://www.debian.org/support"

BUG_REPORT_URL="https://bugs.debian.org/"


4.15.18-11-pve


Please note it is a Proxmox 5.3 (Debian based)

Hello Mario -

the ethtool outputs are missing:

ethtool -i <interface>

get fw version from the nic:

ethtool <interface>

ethtool -m <interface>

Thanks - steve

Sorry, I missed part of your reply.

The problem is that I had to install MFT 3.8.0, because it is the only version that still compiles with recent kernels and recognizes my card, and MFT 3.8.0 does not include the mlxlink command.

Here is the data:

flint -d /dev/mst/mt26448_pciconf0 query

Image type: FS2

FW Version: 2.9.1000

Rom Info: type=PXE version=1.5.5 devid=26448 proto=ETH

Device ID: 26448

Description: Port1 Port2

MACs: 0002c90827d8 0002c90827d9

VSD:

PSID: IBM0050000010


ethtool -i enp7s0

driver: mlx4_en

version: 4.5-1.0.1

firmware-version: 2.9.1000

expansion-rom-version:

bus-info: 0000:07:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: no

supports-register-dump: no

supports-priv-flags: yes


ethtool enp7s0

Settings for enp7s0:

Supported ports: [ FIBRE ]

Supported link modes: 10000baseT/Full

Supported pause frame use: No

Supports auto-negotiation: No

Advertised link modes: 10000baseT/Full

Advertised pause frame use: No

Advertised auto-negotiation: No

Speed: 10000Mb/s

Duplex: Full

Port: FIBRE

PHYAD: 0

Transceiver: internal

Auto-negotiation: off

Supports Wake-on: d

Wake-on: d

Current message level: 0x00000014 (20)

link ifdown

Link detected: yes


ethtool -m enp7s0

Cannot get module EEPROM information: Input/output error

Thanks again,

Mario

Hello Mario -

You would have to take this up with IBM support, as the NIC is flashed with IBM firmware:

PSID: IBM0050000010

I hope this helps…

~Steve
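For anyone checking their own card, the PSID test Steve describes can be sketched in shell. The helper names psid_from_query and psid_kind are hypothetical, and the rule used here is an assumption: stock Mellanox PSIDs are taken to start with "MT_", and anything else (such as the IBM0050000010 above) is treated as OEM firmware.

```shell
# Read "flint -d <device> query" output on stdin and print the PSID value.
# psid_from_query is a hypothetical helper name.
psid_from_query() {
    awk '$1 == "PSID:" { print $2 }'
}

# Classify a PSID string.
# ASSUMPTION: stock Mellanox PSIDs start with "MT_"; any other
# non-empty value is reported as OEM-branded firmware.
psid_kind() {
    case "$1" in
        MT_*) echo "mellanox" ;;
        "")   echo "unknown" ;;
        *)    echo "oem" ;;
    esac
}

# Usage (not run here):
#   psid_kind "$(flint -d /dev/mst/mt26448_pciconf0 query | psid_from_query)"
```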

I understand that I cannot get support from you because it is an IBM OEM card. But I want to share what I have discovered: if, after a reboot, I detach and reattach the cables, it starts working.

Do you have any hint you could share with me about this?
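Since replugging the cable brings the link up, one thing that might be worth trying is bouncing the interface in software. This is an untested assumption, not a confirmed fix: whether an administrative down/up makes this card redo what a physical replug triggers is unknown. The sketch below uses the standard ip link set commands; bounce_link and the DRY_RUN variable are hypothetical names introduced for illustration.

```shell
# Take an interface down and bring it back up after a short pause.
# bounce_link is a hypothetical helper. With DRY_RUN set to a non-empty
# value it only prints the commands instead of running them.
bounce_link() {
    ifname="$1"
    run="${DRY_RUN:+echo}"    # "echo" in dry-run mode, empty otherwise
    $run ip link set dev "$ifname" down
    $run sleep 2
    $run ip link set dev "$ifname" up
}

# Usage (not run here):
#   DRY_RUN=1; bounce_link enp7s0     # print the commands only
#   unset DRY_RUN; bounce_link enp7s0 # as root, actually toggle the port
```

If that port is the only path to the server, it is safer to run this from a console or over another interface, because the connection drops while the link is down.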