Multiple issues with ConnectX-3 CX311A-XCAT firmware.

I recently acquired three CX311A-XCAT adapters and I’m trying to get CoreOS (Container Linux) iPXE boot to work on these cards.

They came with FlexBoot 3.3.950. When I attempt to boot CoreOS using the configuration on their site, I get:

iPXE> chain http://redacted

http://redacted … ok

DHCP (net0 redacted)… ok

http://stable.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz… ok

http://stable.release.core-os.net/amd64-usr/current/coreos_production_pxe.image.cpio.gz… No space left on device (http://ipxe.org/34012006)

Could not boot: No space left on device (http://ipxe.org/34012006)

iPXE>

This error seems to indicate that there is no memory left to download the image, which is strange because the machine is equipped with >32GB and the images themselves combined are less than 1GB.

It appears that the error no longer exists in the iPXE source code and that this is an ancient version of iPXE.

I decided that it would be best to update the cards to the latest firmware (FlexBoot v3.4.752).

I backed up the old firmware and made sure it could be successfully restored before flashing the cards.

Now, when I enter the iPXE shell (CTRL+B → ESC → Discard changes → Exit to shell) and run dhcp, I get the following obscure error message:

FlexBoot> dhcp

ConnectX3 0x31ae4 command 0x11 failed with status 01:

ConnectX3 0x31ae4 could not write MTT at 200

ConnectX3 0x31ae4 port 1 could not create completion queue

Could not open net0: Not enough space (http://ipxe.org/31724006)

FlexBoot>

According to the iPXE site’s page for this error, the possible sources are all in drivers/infiniband/hermon.c. InfiniBand? This is an Ethernet-only card. Is it misconfigured?

Then I tried to obtain the configuration using

mlxconfig -d /dev/mst/mt4099_pciconf0 query

which yields:

Device #1:

----------

Device type: ConnectX3

Device: /dev/mst/mt4099_pciconf0

Configurations: Next Boot

-E- Failed to query device current configuration
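Since all three cards need the same check, the query can be scripted. The following is only a minimal Python sketch: it assumes that error lines are the ones prefixed with `-E-`, as in the output above, and the helper names `mft_error_lines` and `query_device` are made up for illustration. The command line is the one quoted above; nothing else about mlxconfig is assumed.

```python
import subprocess


def mft_error_lines(output):
    """Return the messages of all lines flagged with the '-E-' prefix."""
    errors = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("-E-"):
            errors.append(stripped[3:].strip())
    return errors


def query_device(device):
    """Run 'mlxconfig -d <device> query' (needs mlxconfig installed, usually as root)."""
    result = subprocess.run(
        ["mlxconfig", "-d", device, "query"],
        capture_output=True, text=True, check=False,
    )
    combined = result.stdout + result.stderr
    return combined, mft_error_lines(combined)


# Output copied from this post, used here as sample input.
SAMPLE = """Device #1:
----------
Device type: ConnectX3
Device: /dev/mst/mt4099_pciconf0
Configurations: Next Boot
-E- Failed to query device current configuration
"""

if __name__ == "__main__":
    print(mft_error_lines(SAMPLE))
```

Calling `query_device("/dev/mst/mt4099_pciconf0")` on each machine would return the raw output together with any `-E-` messages, which makes it easy to confirm that every card fails in the same way.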

Then I found this unresolved GitHub issue describing the same problem:

https://github.com/Mellanox/mstflint/issues/33

I have tried the solutions in the linked posts, but none of them worked.

Back in the FlexBoot config menu I tried to edit some settings, even though I couldn’t find anything related to memory or InfiniBand there (the diagnostics menu entry is also missing). Saving fails with the following error:

Saving settings … Setting boot_protocol couldn’t be saved - Permission denied (http://ipxe.org/0221203c)
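The hex number at the end of each ipxe.org URL is a packed error code, so the three errors quoted so far can be compared field by field. Below is a minimal Python sketch, assuming the bit layout described in the comments of iPXE's include/errno.h (POSIX code in bits 30-24, file identifier in bits 23-13, per-file disambiguator in bits 12-8, platform error code in bits 7-0). The function name `decode_ipxe_error` is hypothetical, and the message table contains only the strings that appear in this thread.

```python
# Messages copied from the errors quoted in this thread; not a complete table.
POSIX_MESSAGES = {
    0x02: "Permission denied",
    0x31: "Not enough space",
    0x34: "No space left on device",
}


def decode_ipxe_error(url_or_hex):
    """Split an iPXE error number (or ipxe.org error URL) into its fields."""
    # Take whatever follows the last '/' and parse it as hexadecimal.
    hex_digits = url_or_hex.rstrip("/").rsplit("/", 1)[-1]
    code = int(hex_digits, 16)
    posix = (code >> 24) & 0x7F
    return {
        "posix": posix,                       # bits 30-24
        "message": POSIX_MESSAGES.get(posix, "unknown"),
        "file_id": (code >> 13) & 0x7FF,      # bits 23-13
        "disambiguator": (code >> 8) & 0x1F,  # bits 12-8
        "platform": code & 0xFF,              # bits 7-0
    }


if __name__ == "__main__":
    for url in (
        "http://ipxe.org/34012006",
        "http://ipxe.org/31724006",
        "http://ipxe.org/0221203c",
    ):
        print(url, decode_ipxe_error(url))
```

Under that assumption the download failure (0x34) and the `dhcp` failure (0x31) carry different POSIX codes and different file identifiers, so they are probably raised in different parts of the code, while the save failure is a plain permission error (0x02). The file identifier is presumably what the ipxe.org page uses to point at drivers/infiniband/hermon.c.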

Things I’ve tried:

  • Testing all three cards; they all behave the same

  • Reflashing the firmware

  • Using a different OS to flash the firmware (CentOS, Ubuntu Server 18.04)

  • Using the alternative available firmware (FlexBoot v3.4.746)

  • Trying four different AM3+ and AM4 mainboards

Is there something obvious I’m doing wrong here? Or are the cards or the firmware broken?

I downloaded the latest available FlexBoot source code (v3.4.521) and compiled it with

python2 pxebuild.py -d 4099 --debug=hermon

Then I flashed it alongside firmware 2_42_5000 with

flint -d /dev/mst/mt4099_pci_cr0 --allow_rom_change brom debug1.mrom

This results in the following error messages after pressing CTRL+B (see pictures attached)

It looks like the card can’t read its own configuration. Are older versions of the source code available? It seems that 3.3.950 does not have this issue.

Hello Erin,

Hope you are doing well.

Unfortunately, the issue you are experiencing is due to certain limitations of this adapter model (CX311A-XCAT). It is as designed.

By design, this adapter has limited flash capacity. As a result, the configuration section of the flash is locked and cannot be changed by the user. The limitation surfaces as the error returned when you save the configuration.

As this is by design, the issue cannot be resolved, and the capabilities and functionality of this adapter will remain limited. In addition, this adapter is EOL and no new firmware releases are on the roadmap for it.

The ConnectX-4 Lx 10GbE adapter (MCX4111A-XCAT) does not exhibit this behavior and is fully functional.

Many thanks,

~Mellanox Technical Support

Thanks. Is the entire ConnectX-3 series EOL? What about the ConnectX-3 VPI MCX354A-FCBT?