I would like to set up ShareIO with a ConnectX-5 MT27800 across two servers, but the second server, which has the auxiliary card installed, cannot be pinged from other servers.

Information for the first server:

[root@node305 ~]# lspci | grep -i mellanox

06:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

06:00.1 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

[root@node305 ~]# ibstatus

Infiniband device 'mlx5_0' port 1 status:

default gid: fe80:0000:0000:0000:9803:9b03:0056:2e3a

base lid: 0x1ef

sm lid: 0xa7

state: 4: ACTIVE

phys state: 5: LinkUp

rate: 100 Gb/sec (4X EDR)

link_layer: InfiniBand

Infiniband device 'mlx5_1' port 1 status:

default gid: fe80:0000:0000:0000:9803:9b03:0056:2e3b

base lid: 0xffff

sm lid: 0x0

state: 1: DOWN

phys state: 3: Disabled

rate: 10 Gb/sec (4X)

link_layer: InfiniBand

[root@node305 ~]# mlxconfig -d /dev/mst/mt4119_pciconf0 q

Device #1:


Device type: ConnectX5

Name: STA7A37060_Ax

Description: Mellanox ConnectX-5 Shared IO EDR IB/100GbE Adapter Kit

Device: /dev/mst/mt4119_pciconf0

Information for the second server:

lspci | grep -i mella

06:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

06:00.1 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

ibstatus

Infiniband device 'mlx5_0' port 1 status:

default gid: fe80:0000:0000:0000:9803:9b03:0056:2e3e

base lid: 0xffff

sm lid: 0xa7

state: 1: DOWN

phys state: 5: LinkUp

rate: 100 Gb/sec (4X EDR)

link_layer: InfiniBand

Infiniband device 'mlx5_1' port 1 status:

default gid: fe80:0000:0000:0000:9803:9b03:0056:2e3f

base lid: 0xffff

sm lid: 0x0

state: 1: DOWN

phys state: 3: Disabled

rate: 10 Gb/sec (4X)

link_layer: InfiniBand

[root@node307 ~]# mlxconfig -d /dev/mst/mt4119_pciconf0 q

Device #1:


Device type: ConnectX5

Name: STA7A37060_Ax

Description: Mellanox ConnectX-5 Shared IO EDR IB/100GbE Adapter Kit

Device: /dev/mst/mt4119_pciconf0

I may be missing a parameter needed to activate ShareIO.

Can you help, please?

Thank you!
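The two ibstatus listings above differ in one notable way: on the second server, mlx5_0 reports phys state LinkUp while its logical state is DOWN and its base lid is 0xffff. A minimal sketch for spotting that combination in ibstatus output follows. The function name find_unactivated_ports is hypothetical, and the parsing assumes the field layout shown in this thread (a "state:" line followed by a "phys state:" line for each port).

```shell
# Flag InfiniBand ports whose physical link is up but whose logical
# state is not ACTIVE. Reads ibstatus-style text on standard input.
# Assumption: "state:" appears before "phys state:" for each port,
# as in the listings in this thread.
find_unactivated_ports() {
  awk '
    /^Infiniband device/ {
      dev = $3
      gsub(/[^A-Za-z0-9_]/, "", dev)   # strip the quotes around the name
      port = $5
    }
    $1 == "state:" { state = $3 }
    $1 == "phys" && $2 == "state:" {
      if ($4 == "LinkUp" && state != "ACTIVE")
        printf "%s port %s: phys LinkUp, state %s\n", dev, port, state
    }
  '
}
```

On a live system this could be used as: ibstatus | find_unactivated_ports. A port that is simply disabled (phys state Disabled) is not reported, since only the LinkUp-but-not-ACTIVE case is of interest here.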

Hi Giang,

It would be great if you could provide the following information:

  1. What are the PSID and firmware version of the card? You can run the following commands to get this information. Please provide the entire output:

#mst start

#mst status (To get MST device name)

#flint -d <mst_device> q

  2. What is the P/N of the cable?
  3. Have you tried connecting it to another switch port? (It looks like the link itself is not up.)
  4. Please provide the output of: #mlxlink -d <mst_device> -emc
  5. Please provide the output of “#ip addr” from both servers.

Thanks,

Namrata.
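The commands requested above can be gathered in one place so the same set is run on both servers. The following is only a sketch: diag_commands is a hypothetical helper that prints the commands named in this post for a given MST device and executes nothing by itself. The device path has to come from the "mst status" output.

```shell
# Print the diagnostic commands requested in this post for one MST device.
# Nothing is executed here; the caller decides whether to run the output.
# diag_commands is a hypothetical helper name, not part of any Mellanox tool.
diag_commands() {
  local dev="$1"
  if [ -z "$dev" ]; then
    echo "usage: diag_commands <mst_device>" >&2
    return 1
  fi
  printf '%s\n' \
    "mst start" \
    "mst status" \
    "flint -d $dev q" \
    "mlxlink -d $dev -emc" \
    "ip addr"
}
```

For example, diag_commands /dev/mst/mt4119_pciconf0 | sh -x would run the five commands as root and trace each one as it executes, which makes the combined output easier to read.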

Hi Namrata,

Thank you for your answer.

Please find the requested information below:

[root@node305 ~]# flint -d /dev/mst/mt4119_pciconf0 q

Image type: FS4

FW Version: 16.25.6000

FW Release Date: 26.6.2019

Product Version: 16.25.6000

Rom Info: type=UEFI version=14.18.20 cpu=AMD64

type=PXE version=3.5.702 cpu=AMD64

Description: UID GuidsNumber

Base GUID: 98039b0300562e3a 12

Base MAC: 98039b562e3a 12

Image VSD: N/A

Device VSD: N/A

PSID: LNV0000000012

Security Attributes: secure-fw


[root@node307 ~]# flint -d /dev/mst/mt4119_pciconf0 q

Image type: FS4

FW Version: 16.25.6000

FW Release Date: 26.6.2019

Product Version: 16.25.6000

Rom Info: type=UEFI version=14.18.20 cpu=AMD64

type=PXE version=3.5.702 cpu=AMD64

Description: UID GuidsNumber

Base GUID: 98039b0300562e3a 12

Base MAC: 98039b562e3a 12

Image VSD: N/A

Device VSD: N/A

PSID: LNV0000000012

Security Attributes: secure-fw


continue…

[root@node305 ~]# mlxlink -d /dev/mst/mt4119_pciconf0 -emc

Operational Info


State : Active

Physical state : LinkUp

Speed : IB-EDR

Width : 4x

FEC : Standard LL RS-FEC - RS(271,257)

Loopback Mode : No Loopback

Auto Negotiation : ON

Supported Info


Enabled Link Speed : 0x0000003f (EDR,FDR,FDR10,QDR,DDR,SDR)

Supported Cable Speed : 0x0000003f (EDR,FDR,FDR10,QDR,DDR,SDR)

Troubleshooting Info


Status Opcode : 0

Group Opcode : N/A

Recommendation : No issue was observed.

Physical Counters and BER Info


Time Since Last Clear [Min] : 25.6

Effective Physical Errors : 0

Raw Physical Errors Per Lane : 0,0,0,0

Effective Physical BER : 15E-255

Raw Physical BER : 15E-255

EYE Opening Info


Physical Grade : 39730, 43882, 36409, 37563

Height Eye Opening [mV] : 128, 131, 130, 126

Phase Eye Opening [psec] : 12, 13, 12, 12

Module Info


Identifier : QSFP+

Compliance : N/A

Cable Technology : Copper cable unequalized

Cable Type : Passive copper cable

OUI : Mellanox

Vendor Name : Mellanox

Vendor Part Number : 00MP561

Vendor Serial Number : 2P56196LZ52

Rev : B1

Attenuation (5g,7g,12g) [dB] : 6,8,12

FW Version : N/A

Wavelength [nm] : N/A

Transfer Distance [m] : 3

Digital Diagnostic Monitoring : No

Power Class : 1.5 W max

CDR RX : OFF,OFF,OFF,OFF

CDR TX : OFF,OFF,OFF,OFF

LOS Alarm : N/A

Temperature [C] : N/A

Voltage [mV] : N/A

Bias Current [mA] : N/A

Rx Power Current [dBm] : N/A

Tx Power Current [dBm] : N/A


[root@node307 ~]# mlxlink -d /dev/mst/mt4119_pciconf0 -emc

Operational Info


State : Active

Physical state : LinkUp

Speed : IB-EDR

Width : 4x

FEC : Standard LL RS-FEC - RS(271,257)

Loopback Mode : No Loopback

Auto Negotiation : ON

Supported Info


Enabled Link Speed : 0x0000003f (EDR,FDR,FDR10,QDR,DDR,SDR)

Supported Cable Speed : 0x0000003f (EDR,FDR,FDR10,QDR,DDR,SDR)

Troubleshooting Info


Status Opcode : 0

Group Opcode : N/A

Recommendation : No issue was observed.

Physical Counters and BER Info


Time Since Last Clear [Min] : 26.8

Effective Physical Errors : 0

Raw Physical Errors Per Lane : 0,0,0,0

Effective Physical BER : 15E-255

Raw Physical BER : 15E-255

EYE Opening Info


Physical Grade : 39730, 43882, 36409, 37563

Height Eye Opening [mV] : 128, 131, 130, 126

Phase Eye Opening [psec] : 12, 13, 12, 12

Module Info


Identifier : QSFP+

Compliance : N/A

Cable Technology : Copper cable unequalized

Cable Type : Passive copper cable

OUI : Mellanox

Vendor Name : Mellanox

Vendor Part Number : 00MP561

Vendor Serial Number : 2P56196LZ52

Rev : B1

Attenuation (5g,7g,12g) [dB] : 6,8,12

FW Version : N/A

Wavelength [nm] : N/A

Transfer Distance [m] : 3

Digital Diagnostic Monitoring : No

Power Class : 1.5 W max

CDR RX : OFF,OFF,OFF,OFF

CDR TX : OFF,OFF,OFF,OFF

LOS Alarm : N/A

Temperature [C] : N/A

Voltage [mV] : N/A

Bias Current [mA] : N/A

Rx Power Current [dBm] : N/A

Tx Power Current [dBm] : N/A

continue…

[root@node305 ~]# ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000

link/ether 08:94:ef:90:aa:da brd ff:ff:ff:ff:ff:ff

inet 10.120.37.41/22 brd 10.120.39.255 scope global eth0

valid_lft forever preferred_lft forever

3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

link/ether 08:94:ef:90:aa:db brd ff:ff:ff:ff:ff:ff

4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256

link/infiniband 20:00:10:86:fe:80:00:00:00:00:00:00:98:03:9b:03:00:56:2e:3a brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

inet 10.135.37.41/22 brd 10.135.39.255 scope global ib0

valid_lft forever preferred_lft forever

5: ib1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256

link/infiniband 20:00:18:86:fe:80:00:00:00:00:00:00:98:03:9b:03:00:56:2e:3b brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

6: eth0.2705@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000

link/ether 08:94:ef:90:aa:da brd ff:ff:ff:ff:ff:ff

inet 10.120.41.41/22 brd 10.120.43.255 scope global eth0.2705

valid_lft forever preferred_lft forever


[root@node307 ~]# ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000

link/ether 08:94:ef:90:a4:de brd ff:ff:ff:ff:ff:ff

inet 10.120.37.43/22 brd 10.120.39.255 scope global eth0

valid_lft forever preferred_lft forever

3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

link/ether 08:94:ef:90:a4:df brd ff:ff:ff:ff:ff:ff

4: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256

link/infiniband 20:00:14:86:fe:80:00:00:00:00:00:00:98:03:9b:03:00:56:2e:3e brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

inet 10.135.37.43/22 brd 10.135.39.255 scope global ib0

valid_lft forever preferred_lft forever

5: ib1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256

link/infiniband 20:00:1c:86:fe:80:00:00:00:00:00:00:98:03:9b:03:00:56:2e:3f brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

6: eth0.2705@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000

link/ether 08:94:ef:90:a4:de brd ff:ff:ff:ff:ff:ff

inet 10.120.41.43/22 brd 10.120.43.255 scope global eth0.2705

valid_lft forever preferred_lft forever

Regarding the switch port: it seems to be OK. A ping from another server to the primary Mellanox card, which is connected to node305, works.

But the same ping to the auxiliary card, which is connected to node307, does not work.

Thanks,

Giang

Hi Giang,

Based on the “ip addr” output from node307, ib0 is down:

4: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256

link/infiniband 20:00:14:86:fe:80:00:00:00:00:00:00:98:03:9b:03:00:56:2e:3e brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

inet 10.135.37.43/22 brd 10.135.39.255 scope global ib0

valid_lft forever preferred_lft forever

whereas on node305 it is up:

4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256

link/infiniband 20:00:10:86:fe:80:00:00:00:00:00:00:98:03:9b:03:00:56:2e:3a brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

inet 10.135.37.41/22 brd 10.135.39.255 scope global ib0

valid_lft forever preferred_lft forever
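The difference between the two ib0 entries above is in the interface flags: node307 shows NO-CARRIER, node305 shows LOWER_UP. As a rough sketch, the check can be scripted against "ip addr" output. The function name link_carrier is hypothetical, and the parsing assumes the one-line-per-interface header format shown in this thread.

```shell
# Report the carrier status of one interface from ip-addr-style text on stdin.
# link_carrier is a hypothetical helper; it only inspects the flag list
# (the <...> field) on the interface header line.
link_carrier() {
  local ifname="$1"
  awk -v ifn="$ifname" '
    $2 == (ifn ":") {
      if ($3 ~ /NO-CARRIER/)     print ifn ": no carrier"
      else if ($3 ~ /LOWER_UP/)  print ifn ": carrier up"
      else                       print ifn ": not up"
    }
  '
}
```

On a live system this could be used as: ip addr | link_carrier ib0. With the output posted in this thread it would report "no carrier" on node307 and "carrier up" on node305.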

Are you using the Mellanox driver (MLNX_OFED)? You can verify by running "#ofed_info -s".

In addition, I would like to clarify that, based on the “flint” output, the cards are Lenovo-branded (PSID LNV0000000012). When OEM cards (in this case Lenovo) are involved, our general policy is that the end customer needs to open a case with the OEM, and the OEM will open a case with us if required. We do not communicate with end customers directly when OEM cards are involved.

Thanks,

Namrata.
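The OEM determination above rests on the PSID line of the flint output. A small sketch of that check follows. The function name psid_vendor is hypothetical. The LNV prefix for Lenovo comes from this thread; treating an MT_ prefix as a Mellanox-branded board is an assumption based on the usual PSID naming convention, and any other prefix is reported as unknown rather than guessed.

```shell
# Read "flint -d <device> q" style text on stdin and classify the PSID.
# psid_vendor is a hypothetical helper.
# Assumptions: "MT_" marks a Mellanox-branded board (naming convention);
# "LNV" marks a Lenovo OEM board (as stated in this thread).
psid_vendor() {
  awk '
    $1 == "PSID:" {
      psid = $2
      if (psid ~ /^MT_/)       kind = "Mellanox-branded"
      else if (psid ~ /^LNV/)  kind = "Lenovo OEM"
      else                     kind = "other OEM or unknown"
      print psid ": " kind
    }
  '
}
```

On a live system this could be used as: flint -d /dev/mst/mt4119_pciconf0 q | psid_vendor. For the cards in this thread it would print "LNV0000000012: Lenovo OEM".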