100GbE cable not communicating with Cisco 9508

We have a Cisco 9508 switch with 100Gb line cards, Mellanox MCX515A-CCAT interface cards, and MFA1A00-C010 cables.

The cable fails to link up, and the issue follows the cable: it stays with the cable when we swap it between two hosts at the host NIC port, and when we move it between two switch ports on the 9508.

On the switch, "show int brief" shows “link not connected”.

Hosts are running CentOS 7.6 with the MLNX_OFED_LINUX-4.6-1.0.1.1-rhel7.6-x86_64 drivers installed.

We had a collection of 4-5 cables that were “bad” and we were ready to return them, but then we reloaded the Cisco 9508 switch, and several of the cables started working again. But then two other cables that were working correctly started giving errors after the Cisco 9508 switch reload.

Why would a cable behave this way?

And why am I asking here? Because Cisco asked me to contact Mellanox.

Hi James,

Please note that the Release Notes for the latest FW version, 16.25.1020, list the following Cisco switches as supported/tested at 100GbE. The 9508 is not among them, so we cannot guarantee that this combination is expected to work.

http://www.mellanox.com/pdf/firmware/ConnectX5-FW-16_25_1020-release_notes.pdf (Section 1.3.3, Tested 100GbE Switches)

Speed | Switch Silicon | OPN # / Name | Description | Vendor
100GbE | N/A | 93180YC-EX | 48 x 10/25-Gbps fiber ports and 6 x 40/100-Gbps Quad Small Form-Factor Pluggable 28 (QSFP28) ports | Cisco
100GbE | N/A | C3232C | High-Density, 100 Gigabit Ethernet Switch | Cisco

However, in order to assist in the best possible way, please provide the following information:

A. To clarify: are the two cables that are not working now among the same 4-5 cables that were not working before and started working after the switch was reloaded?

B. Was there any other change after the switch reload?

C. Have you tested by connecting two HCA ports directly, eliminating the switch? That is, either connect HCA P1 on one host to HCA P1 on the other host or, if you have another vendor's card in a host, connect the Mellanox HCA to the other vendor's HCA.

D. Where were the cables and cards purchased?

E. Please provide the following output as I would like to validate the FW version of the card:

#mst start

#mst status (To get MST device)

#flint -d q

F. mlxlink -d -emc

G. ethtool

Thanks,

Namrata.
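
The firmware and link checks requested in items E to G can be gathered in one pass. The following is only a minimal sketch, not an official Mellanox procedure: the function name print_diag_commands, the MST device path /dev/mst/mt4119_pciconf0, and the interface name ens1f0 are placeholders assumed for illustration. Take the real device from "mst status" and the real interface name from the host. The function only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print the diagnostic commands from items E-G for one adapter port.
# Both arguments are placeholders (assumptions), not values from this thread:
#   $1 = MST device as listed by `mst status`
#   $2 = Linux network interface name of the same port
print_diag_commands() {
    diag_dev=$1
    diag_if=$2
    echo "mst start"
    echo "mst status"
    # Query the firmware currently burned on the card.
    echo "flint -d $diag_dev q"
    # -e, -m and -c are the expanded form of the -emc shorthand used above:
    # eye opening, module/cable info, and link counters.
    echo "mlxlink -d $diag_dev -e -m -c"
    echo "ethtool $diag_if"
}

# Example: review the list first, then pipe it to `sh` as root to execute it.
# print_diag_commands /dev/mst/mt4119_pciconf0 ens1f0
```

Printing rather than executing keeps the sketch safe to try on any machine; the output can be pasted into a post together with the results of each command.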

Hello. Answers below

A. Yes

B. No

C. No

D. A vendor in SD that I am unwilling to reveal here. We were told we have no warranty coverage on the ConnectX-5 cards or the cables.

E, F, G. I cannot provide this output, but I am working to update the firmware on all our cards using the MFT tools. I will report back more later.

Updated Answers as of Oct 10th:

C: Yes. In all 3 cases tested, connecting one host directly to another host with a Mellanox cable worked, and ping and ssh succeeded. The results from “mlxlink”, “mlxcables”, and “ethtool” were positive.

E: The firmware has been updated to the latest revision from the RHEL 7.7 software update package on at least the three hosts reporting problems so far. All other hosts have been updated to the firmware found in the RHEL 7.6 software update package.

I will prepare some more detailed results from those commands (E,F,G) and post them here when I can.

I am enclosing the command results from E,F,G. Thank you for your help. The two hosts in these command results are running RHEL 7.7 installed from DVD plus the MLX RHEL 7.7 drivers freshly downloaded from MLX website. I have some results from my “host-to-host” connection testing, which works, and some other results from “host-to-switch” testing, which still does not work for these two hosts. Thanks for your help.

I have attached several files to my last post, but only one file appears. sigh.

File #3: host to switch ethtool results: not working

File #4: host command mst status does not produce expected results

File #5: host-to-host connections work fine: mlxlink results

I posted some command results below on Oct 15th. I welcome your feedback and recommended steps to resolve these issues with our Mellanox 100Gb Cables.

Hello. I appreciate your help and I look forward to your recommendations to resolve this issue that we face with our Mellanox Cables.

Hi James,

Correct me if I am wrong: when you connect the same cable back-to-back between hosts there is no issue, but when you connect the same cable via the switch, you face an issue, correct? If yes, then the issue seems to point to the switch.

To check further, could you please provide the outputs again, separated by scenario, and include the complete output along with the command executed so that I can understand more clearly? You may paste the outputs here if that is easier than attaching files.

To summarize, please provide the following in each of these situations:

Situation 1: Connect the ports back-to back without switch and provide:

  1. #mlxlink -d -emc
  2. #ethtool -i
  3. #ethtool

Situation 2: Connect the HCA via the switch and provide:

  1. #mlxlink -d -emc
  2. #ethtool -i
  3. #ethtool

Also, has the speed been set correctly on the switch? Is the switch configured with FEC? If yes, which FEC?

Thanks,

Namrata.
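
One way to keep the two situations apart, with each command shown next to its output, is a small capture helper. This is a hedged sketch only: the function names scenario_commands and capture_to_file, the device path /dev/mst/mt4119_pciconf0, the interface name ens1f0, and the output file names are all assumptions made for illustration. The "ethtool --show-fec" line is an addition to the requested list that reports the host-side FEC setting; older ethtool builds may not support it, in which case the failure is simply recorded in the file.

```shell
#!/bin/sh
# Commands requested for one situation. Arguments are placeholders (assumptions):
#   $1 = MST device from `mst status`, $2 = network interface name
scenario_commands() {
    echo "mlxlink -d $1 -e -m -c"
    echo "ethtool -i $2"
    echo "ethtool $2"
    # Not part of the original request: host-side FEC setting,
    # available only if the installed ethtool supports --show-fec.
    echo "ethtool --show-fec $2"
}

# Read commands from stdin, run each one, and write "$ command" followed by
# its output to the file named in $1, so every output keeps its command line.
capture_to_file() {
    cap_out=$1
    : > "$cap_out"
    while IFS= read -r cap_cmd; do
        printf '$ %s\n' "$cap_cmd" >> "$cap_out"
        # stdin is redirected so the command cannot swallow the command list.
        sh -c "$cap_cmd" >> "$cap_out" 2>&1 < /dev/null \
            || echo "(exit status $?)" >> "$cap_out"
        printf '\n' >> "$cap_out"
    done
}

# Example (run as root, once per cabling situation):
# scenario_commands /dev/mst/mt4119_pciconf0 ens1f0 | capture_to_file back-to-back.txt
# scenario_commands /dev/mst/mt4119_pciconf0 ens1f0 | capture_to_file via-switch.txt
```

Running it once with the cable connected host-to-host and once via the switch yields two text files that can be pasted into the thread and compared side by side.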

Hi James,

I see you have a support contract with us, so I have converted this community ticket to a support ticket and will re-send my update from the case.

Thanks,

Namrata.