CX4 links at 40G only - was "Interoperability of CX4 with SONiC switch"

Hi,

I have a whitebox switch running SONiC.

I have a CX4 (MCX456A-ECA) attached to it that does not get a link.

I also have a CX5 attached to the switch, and that one has a link.

I wonder why the CX4 is not working as expected. I updated it to the latest firmware, made sure the port is set to ETH, and swapped the cable with the CX5 card, but no go. The card is known good.

Any idea what else I could try?

Thanks

Device #1:


Device Type: ConnectX4

Part Number: MCX456A-ECA_Ax

Description: ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe3.0 x16; ROHS R6

PSID: MT_2190110032

PCI Device Name: mt4115_pciconf0

Base MAC: 248a07b5973a

Versions: Current Available

FW 12.24.1000 N/A

PXE 3.5.0603 N/A

UEFI 14.17.0011 N/A

Device #2:


Device Type: ConnectX5

Part Number: MCX516A-CDA_Ax

Description: ConnectX-5 Ex EN network interface card; 100GbE dual-port QSFP28; PCIe4.0 x16; tall bracket; ROHS R6

PSID: MT_0000000013

PCI Device Name: mt4121_pciconf1

Base GUID: 248a070300a529c8

Base MAC: 248a07a529c8

Versions: Current Available

FW 16.18.0160 N/A

Hi,

OK, the optical cable does not specify whether it's an E or a C.

Edit: That's because it's an HP part, and it is an E, so still in line with your statement ;)

It still surprises me that the IB cable works fine with the CX5 but not the CX4, but I will try to get an Ethernet-specific one for testing.

If I were to connect to an IB switch, my cables should yield 100G for the CX4 then, correct? I will give that a try as well.

Thanks,

regards,

Thomas

Hi Thomas,

Welcome to the Mellanox Community.

Can you please provide the following information regarding the ConnectX-4:

  • If MFT (Mellanox Firmware Tools) is installed → # mlxconfig -d /dev/mst/mt4115_pciconf0 q | grep -i link
  • If the mstflint package is installed from the Linux distro → # mstconfig -d <device> q | grep -i link
  • Cable part-number used

Many thanks,

~Mellanox Technical Support

Hi Martijn,

thanks for picking this up.

/opt/mellanox/bin/mlxconfig -d mt4115_pciconf0 q |grep -i link

LINK_TYPE_P1 ETH(2)

LINK_TYPE_P2 ETH(2)

KEEP_ETH_LINK_UP_P1 True(1)

KEEP_IB_LINK_UP_P1 False(0)

KEEP_LINK_UP_ON_BOOT_P1 False(0)

KEEP_LINK_UP_ON_STANDBY_P1 False(0)

KEEP_ETH_LINK_UP_P2 True(1)

KEEP_IB_LINK_UP_P2 False(0)

KEEP_LINK_UP_ON_BOOT_P2 False(0)

KEEP_LINK_UP_ON_STANDBY_P2 False(0)
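Purely as an illustration of what the grep above is checking, here is a minimal Python sketch that parses query output in the `NAME LABEL(value)` layout shown and reports which ports have LINK_TYPE set to ETH. It assumes the output has already been captured as text; `parse_mlxconfig` and `ports_in_eth_mode` are hypothetical helpers, not part of MFT or mstflint.

```python
import re

# Matches lines such as "LINK_TYPE_P1 ETH(2)" or "KEEP_IB_LINK_UP_P1 False(0)".
# Assumption: the captured query output uses the layout shown in this post.
LINE_RE = re.compile(r"^\s*(\w+)\s+(\w+)\((\d+)\)\s*$")


def parse_mlxconfig(text):
    """Map each parameter name to a (label, numeric value) tuple."""
    params = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            params[m.group(1)] = (m.group(2), int(m.group(3)))
    return params


def ports_in_eth_mode(params):
    """Return the port suffixes (P1, P2, ...) whose LINK_TYPE is ETH."""
    return sorted(
        name.rsplit("_", 1)[1]
        for name, (label, _value) in params.items()
        if name.startswith("LINK_TYPE_") and label == "ETH"
    )


sample = """LINK_TYPE_P1 ETH(2)
LINK_TYPE_P2 ETH(2)
KEEP_ETH_LINK_UP_P1 True(1)"""
print(ports_in_eth_mode(parse_mlxconfig(sample)))  # ['P1', 'P2']
```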

I am currently on ESXi, so unfortunately no mstconfig. Would it be beneficial to install a Linux distro on it for debugging?

The cables used are FRD16 U44 21/FRD19 U46 IB P35 and FRD16 U36 20/FRD 18 U48 IB P36 (I also have some 40/42 available).

I swapped them between the CX4 and CX5 to rule out a cable issue, but might there be interoperability issues with those?

Thanks.

regards,

Thomas

Edit:

Just as a side note: I updated to the most recent SONiC build, no change.

SONiC Software Version: SONiC.HEAD.24-fbdb256

Distribution: Debian 9.8

Kernel: 4.9.0-8-amd64

Build commit: fbdb256

Build date: Sat Feb 23 07:59:10 UTC 2019

Built by: johnar@jenkins-worker-3

Any particular reason the associated case was closed without any comment?

As an update: the same issue happens on another whitebox OS (OcNOS).

Hi Thomas,

Can you please check whether you can use a supported cable/transceiver listed in the release notes (RN) of the ConnectX-4 VPI, MCX456A-ECA_Ax → http://www.mellanox.com/pdf/firmware/ConnectX4-FW-12_24_1000-release_notes.pdf

It looks like the cable you are using is not supported by our f/w on the ConnectX-4 VPI.

The cables you are using are also not listed in the f/w RN of the ConnectX-5 EN.

Many thanks,

~Martijn

Ah the cable is not supported? Didn’t realize that - let me check whether I can find one of the supported ones.

Also, the OcNOS folks found that the card links at 40G, just not at 100G. Could this be attributed to the cable?

So I rechecked the cables and found I had provided the wrong information; they are actually

MCP1600-E003s (happy to provide SNs for verification).

I also tested with an optical one, an MFA1A00:

same behavior as before, it links at 40G but does not link at 100G.

@Martijn van Breugel Any other ideas? Anything I can do to get better logs?

Hi Thomas,

The cable you are using is an EDR IB cable that we use for our VPI cards (InfiniBand). Could you test with a Mellanox 100GbE cable, for example an MCP1600-C003? (It looks the same, but it is not.)

The reason I ask is that the whitebox you are using is not a Mellanox switch, so there may be compatibility issues between the switch and the cable.

The optical version should be MFA1A00-C003, not the EDR IB (MFA1A00-E003) version.

When you run a loopback test with the cables on the adapters, everything works normally, but connecting to a 3rd party switch can be problematic.

In some cases we recommend using a Mellanox transceiver on the adapter side and a certified 3rd party transceiver on the switch side. The Mellanox transceiver for 100GbE is MMA1B00-C100D (https://www.mellanox.com/related-docs/prod_cables/PB_MMA1B00-C100D_100GbE_QSFP28_MMF_Transceiver.pdf)
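To make the E versus C distinction above concrete, here is a minimal Python sketch that classifies a part number by the letter after the dash. It only encodes the convention used in this thread (E = EDR IB, C = 100GbE, as in MCP1600-E003/-C003 and MFA1A00-E003/-C003); it is a hypothetical helper, not an official Mellanox naming rule, and anything else is reported as unknown.

```python
def cable_flavor(part_number):
    """Classify a cable/transceiver part number by the letter after the dash.

    Assumption: only the E/C convention discussed in this thread is encoded,
    e.g. 'MCP1600-E003' -> 'EDR IB', 'MCP1600-C003' -> '100GbE'.
    """
    _base, _sep, suffix = part_number.partition("-")
    if not suffix:
        # No dash suffix (e.g. a bare 'MFA1A00'): the flavor cannot be told.
        return "unknown"
    return {"E": "EDR IB", "C": "100GbE"}.get(suffix[0].upper(), "unknown")


for pn in ("MCP1600-E003", "MCP1600-C003", "MFA1A00-E003", "MMA1B00-C100D", "MFA1A00"):
    print(pn, "->", cable_flavor(pn))
```

A part with no suffix maps to "unknown" on purpose: as with the HP-branded optic earlier in the thread, the flavor has to be confirmed some other way.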

Let me know, how it goes.

Cheers,

~Martijn

So, the IB test was successful; it linked up just fine to an SB7800.

Ordered a C type cable.

Still weird that the CX5 works fine with the E type, though…

OK, the -C002 cable finally arrived, but unfortunately it did not help. Even with the C cable, the card only links up at 40G and not at 100G.

@Martijn van Breugel Any other ideas?

Hi Thomas,

When you connect the ConnectX-4 and ConnectX-5 back-to-back, does the cable link up at 100GbE?

Thanks,

~Martijn

Sorry for the delay; I had been looking for a response for days but never saw this :(

Coincidentally, I checked that today with CX4 port A to port B, and that links at 100G even with the MFA1A00-E003.

See below; I used the wrong link.

Hi Thomas,

What is the switch h/w you are using?

Also, in a couple of days new f/w will be released for the ConnectX-4 and ConnectX-5. The new f/w will add compatibility support for cables and other third-party vendors.

Let's see if that improves the behavior.

Any chance you are able to open a support case with the whitebox h/w vendor? We do not see many issues with multi-vendor connectivity.

Many thanks,

~Martijn

Hi Martijn,

It's a Celestica Seastone DX010.

That sounds good; any idea of the ETA? I'm also happy to test a beta :)

Thanks,

regards,

Thomas

Also,

Unfortunately I don't have h/w support, and I am fairly sure they'd blame the OS instead.

What I will test is whether interconnecting two switch ports yields 100G; if it does, that would indicate an interoperability issue with the card (which is the most likely scenario, given that the CX5 works fine with any cable).

So I did a port-to-port connection with both the -C DAC (5<->7) and the -E optical (1<->3), and both link up fine at 100G:

Ethernet   Type  PVID  Mode    Status  Reason  Speed  Port   Ctl  Br/Bu
Interface                                             Ch #
ce1/1      ETH   –     routed  up      none    100g   –      Br   No
ce1/2      ETH   –     routed  down    IA      –      –      No   No
ce1/3      ETH   –     routed  down    IA      –      –      No   No
ce1/4      ETH   –     routed  down    IA      –      –      No   No
ce2/1      ETH   –     routed  down    PD      100g   –      Br   No
ce2/2      ETH   –     routed  down    IA      –      –      No   No
ce2/3      ETH   –     routed  down    IA      –      –      No   No
ce2/4      ETH   –     routed  down    IA      –      –      No   No
ce3/1      ETH   –     routed  up      none    100g   –      Br   No
ce3/2      ETH   –     routed  down    IA      –      –      No   No
ce3/3      ETH   –     routed  down    IA      –      –      No   No
ce3/4      ETH   –     routed  down    IA      –      –      No   No
ce4/1      ETH   –     routed  down    PD      100g   –      Br   No
ce4/2      ETH   –     routed  down    IA      –      –      No   No
ce4/3      ETH   –     routed  down    IA      –      –      No   No
ce4/4      ETH   –     routed  down    IA      –      –      No   No
ce5/1      ETH   –     routed  up      none    100g   –      Br   No
ce5/2      ETH   –     routed  down    IA      –      –      No   No
ce5/3      ETH   –     routed  down    IA      –      –      No   No
ce5/4      ETH   –     routed  down    IA      –      –      No   No
ce6/1      ETH   –     routed  down    PD      100g   –      Br   No
ce6/2      ETH   –     routed  down    IA      –      –      No   No
ce6/3      ETH   –     routed  down    IA      –      –      No   No
ce6/4      ETH   –     routed  down    IA      –      –      No   No
ce7/1      ETH   –     routed  up      none    100g   –      Br   No
ce7/2      ETH   –     routed  down    IA      –      –      No   No
ce7/3      ETH   –     routed  down    IA      –      –      No   No
ce7/4      ETH   –     routed  down    IA      –      –      No   No
ce8/1      ETH   –     routed  down    PD      100g   –      Br   No
ce8/2      ETH   –     routed  down    IA      –      –      No   No
ce8/3      ETH   –     routed  down    IA      –      –      No   No
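For longer captures, picking out the linked ports by eye gets tedious. Here is a minimal Python sketch that extracts the rows whose Status column is "up", assuming whitespace-separated columns in the order shown above (interface, type, PVID, mode, status, reason, speed, port-channel, ctl, br/bu). `ports_up` is a hypothetical helper, and the sample is a trimmed copy of the table above.

```python
# Trimmed copy of the table above; columns are assumed to be
# interface, type, PVID, mode, status, reason, speed, port-ch, ctl, br/bu.
SAMPLE = """\
Ethernet   Type  PVID  Mode    Status  Reason  Speed  Port   Ctl  Br/Bu
Interface                                             Ch #
ce1/1      ETH   –     routed  up      none    100g   –      Br   No
ce1/2      ETH   –     routed  down    IA      –      –      No   No
ce2/1      ETH   –     routed  down    PD      100g   –      Br   No
ce3/1      ETH   –     routed  up      none    100g   –      Br   No
"""


def ports_up(table_text):
    """Return (interface, speed) for every data row whose status is 'up'."""
    result = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have 10 columns and start with an interface name such as
        # "ce1/1"; header and blank lines are skipped.
        if len(fields) != 10 or not fields[0].startswith("ce"):
            continue
        name, status, speed = fields[0], fields[4], fields[6]
        if status == "up":
            result.append((name, speed))
    return result


print(ports_up(SAMPLE))  # [('ce1/1', '100g'), ('ce3/1', '100g')]
```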

So, to sum up the current test results:

Celestica DX010 (Broadcom-based) 3rd party switch:

  • CX5 EN → switch: works with any cable (IB and ETH)

  • CX4 VPI in EN config → switch: does not work with any cable

  • CX4 to CX4 EN: works with any cable

  • Switch to switch: works with any cable

Looks like a CX4 issue to me. Waiting for the new FW to see if the improved compatibility will help.

I am quite surprised that the CX4s don't work with Broadcom-based switches, or maybe it's just this specific one; it's the only 100G Ethernet switch I have. Could somebody else chime in regarding CX4 compatibility?
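The test matrix summarized above can be reasoned about by elimination: a pairing that fails while each of its endpoints links fine in some other test points at an interoperability problem between those two devices rather than a dead component. Here is a minimal Python sketch of that reasoning. The cable dimension is dropped because every result held "with any cable", and `interop_suspects` is a hypothetical helper.

```python
# Results from the summary above: (endpoint A, endpoint B, linked at 100G?).
# Cables are omitted because each result held regardless of cable.
TESTS = [
    ("CX5", "DX010", True),
    ("CX4", "DX010", False),
    ("CX4", "CX4", True),
    ("DX010", "DX010", True),
]


def interop_suspects(tests):
    """Return failing pairings whose endpoints each link up in a passing test."""
    passing = [(a, b) for a, b, ok in tests if ok]

    def works_elsewhere(device):
        return any(device in pair for pair in passing)

    return [
        (a, b)
        for a, b, ok in tests
        if not ok and works_elsewhere(a) and works_elsewhere(b)
    ]


print(interop_suspects(TESTS))  # [('CX4', 'DX010')]
```

If a device never appeared in a passing test, its failing pairings would not be flagged, since that device on its own would then be the more likely suspect.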