ConnectX-6 adapter on Mellanox SB7790 (unmanaged EDR switch)

Greetings,
We have new nodes with Mellanox Technologies MT2892 Family [ConnectX-6 Dx] adapters and are trying to connect them to a Mellanox SB7790 EDR switch.
We are running Rocky Linux 8.8 with MLNX_OFED_LINUX-5.8-3.0.7.0-rhel8.8-x86 installed.
The devices remain in the Polling state.
Any hints?

# ibv_devinfo 
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         22.35.3006
        node_guid:                      e8eb:d303:0010:3e9a
        sys_image_guid:                 e8eb:d303:0010:3e9a
        vendor_id:                      0x02c9
        vendor_part_id:                 4125
        hw_ver:                         0x0
        board_id:                       MT_0000000903
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               65535
                        port_lmc:               0x00
                        link_layer:             InfiniBand
# mlxcables
Querying Cables ....

Cable #1:
---------
Cable name    : 41:00.0_cable_0
>> No FW data to show
-------- Cable EEPROM --------
Identifier                     : QSFP+ (0dh)
Technology                     : Copper cable unequalized (a0h)
Compliance                     : 40GBASE-CR4, 100GBASE-CR4, 25GBASE-CR CA-25G-L or 50GBASE-CR2 with RS (Clause91) FEC, EDR,FDR,QDR,DDR,SDR
Attenuation: 2.5GHz            : 4dB
             5.0GHz            : 6dB
             7.0GHz            : 8dB
             12.9GHz           : 12dB
             25.78GHz          : 0dB
OUI                            : 0x0002c9
Vendor                         : Mellanox        
Serial number                  : MT1952VS08208   
Part number                    : MCP1600-E003E26 
Revision                       : A2
Temperature [c]                : N/A
Digital Diagnostic Monitoring  : NO
Length [m]                     : 3 m

Best regards
Joe

Hi Joe,

A device that stays in the Polling state has not established a link to another card or switch.
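
The physical port state can also be read directly from sysfs, which may be handy when checking many nodes. This is a minimal sketch, assuming the usual Linux layout /sys/class/infiniband/<device>/ports/<port>/phys_state. The function names and the optional base-directory argument are hypothetical, added only so the logic can be tried without hardware:

```shell
#!/bin/sh
# Print the physical state name (e.g. "Polling", "LinkUp") of an IB port.
# Usage: ib_phys_state <device> <port> [sysfs_base]
ib_phys_state() {
    dev=$1
    port=$2
    base=${3:-/sys/class/infiniband}   # third argument exists only for testing
    file="$base/$dev/ports/$port/phys_state"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # The file holds a line such as "2: Polling"; strip the numeric prefix.
    sed 's/^[0-9]*: *//' "$file"
}

# Succeed only if the port reports a physical link (same arguments as above).
ib_link_ok() {
    state=$(ib_phys_state "$@") || return 2
    [ "$state" = "LinkUp" ]
}
```

For example, `ib_link_ok mlx5_0 1 || echo "mlx5_0 port 1 has no link"` would print the message for a port that is still polling.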

Some general troubleshooting steps you can try:

  1. Verify that the firmware is up to date and compatible with your switch firmware by reviewing the release notes for each IB component.

  2. Force the link speed and set auto-negotiation to OFF:

mlxlink -d <device> --link_mode_force --speeds <speeds>

For example,

mlxlink -d /dev/mst/mt4123_pciconf0 --link_mode_force --speeds 200G

Once the speed is set, you must toggle the link for the change to take effect:

mlxlink -d <device> -a TG

If you still encounter issues after the above steps and you have a valid support contract, please open a support ticket with us so that it can be investigated in further detail.
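
The force-then-toggle sequence from step 2 could be wrapped in a small helper. This is only a sketch: the function name and the DRY_RUN switch are hypothetical, and it uses only the mlxlink options quoted above. Which speed strings mlxlink accepts depends on the device and link protocol, so check `mlxlink --help` on your system first.

```shell
#!/bin/sh
# Force a link speed with mlxlink, then toggle the port so it takes effect.
# Set DRY_RUN=1 to print the commands instead of running them.
# Usage: force_link_speed <mst_device> <speed>
force_link_speed() {
    dev=$1
    speed=$2
    if [ -z "$dev" ] || [ -z "$speed" ]; then
        echo "usage: force_link_speed <mst_device> <speed>" >&2
        return 1
    fi
    run=
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo                       # print instead of execute
    fi
    # Step 1: force the requested speed.
    $run mlxlink -d "$dev" --link_mode_force --speeds "$speed" || return 1
    # Step 2: toggle the port so the new setting is applied.
    $run mlxlink -d "$dev" -a TG
}
```

Running it with DRY_RUN=1 first, for example `DRY_RUN=1; force_link_speed /dev/mst/mt4123_pciconf0 200G`, shows the two commands without touching the link.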


Hello jasowu,

Thank you. I had already tried that, but the speed was not set, even though I got the message “Setting new speed”.

Meanwhile I’ve found the solution.
After some debugging and looking at the boot messages, I started wondering about this setting:

$ sudo mlxconfig -d /dev/mst/mt4125_pciconf0 query| grep IB_LINK

     KEEP_IB_LINK_UP_P1                          False(0)

And indeed, that was it:

$ sudo mlxconfig -d /dev/mst/mt4125_pciconf0 set KEEP_IB_LINK_UP_P1=1

Device #1:

Device type:    ConnectX6DX
Name:           MCX683105AN-HDA_Ax
Description:    Nvidia ConnectX-6 DE InfiniBand adapter; HDR; single-port QSFP56; PCIe 4.0 x16; No Crypto
Device:         /dev/mst/mt4125_pciconf0

Configurations:                                  Next Boot       New
     KEEP_IB_LINK_UP_P1                          False(0)        True(1)

Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

$ sudo reboot

$ sudo ibstatus

Infiniband device 'mlx5_0' port 1 status:
    default gid:     fe80:0000:0000:0000:e8eb:d303:0010:492e
    base lid:        0xb
    sm lid:          0x6
    state:           4: ACTIVE
    phys state:      5: LinkUp
    rate:            100 Gb/sec (4X EDR)
    link_layer:      InfiniBand
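
For anyone who needs to apply this on several nodes, the check-and-set steps above could be scripted roughly like this. It is a sketch only: the function names are hypothetical, and the `-y` option (answer yes to the confirmation prompt) is my addition, not part of the commands I ran above. It needs root, and a reboot afterwards.

```shell
#!/bin/sh
# Extract the value of one parameter from `mlxconfig query` output on stdin.
# Prints e.g. "False(0)"; prints nothing if the parameter is not listed.
mlxconfig_value() {
    awk -v p="$1" '$1 == p { print $2; exit }'
}

# Enable KEEP_IB_LINK_UP_P1 on the given device unless it is already set.
# Usage: ensure_keep_ib_link_up <mst_device>
ensure_keep_ib_link_up() {
    dev=$1
    cur=$(mlxconfig -d "$dev" query | mlxconfig_value KEEP_IB_LINK_UP_P1)
    case $cur in
        "True(1)")
            echo "already enabled on $dev" ;;
        "False(0)")
            # -y skips the interactive "Apply new Configuration?" prompt.
            mlxconfig -y -d "$dev" set KEEP_IB_LINK_UP_P1=1 &&
                echo "reboot to apply" ;;
        *)
            echo "KEEP_IB_LINK_UP_P1 not found on $dev" >&2
            return 1 ;;
    esac
}
```

Usage would be `sudo sh -c '. ./keep_ib_link_up.sh; ensure_keep_ib_link_up /dev/mst/mt4125_pciconf0'`, where keep_ib_link_up.sh is whatever file name you save the functions under.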
