Back-to-back ConnectX-4 Lx: link state is UP but no IP communication on Windows Server 2019. What to do?

Hi,

I am trying to set up a 2-node switchless configuration with two ConnectX-4 Lx cards. The link comes up, but there is no IP communication. I set static IP addresses in different subnets like this:

Host A Port 1 192.168.0.1 <—> Host B Port 2 192.168.0.2

Host A Port 1 192.168.1.1 <—> Host B Port 2 192.168.1.2

I cannot ping the other side and I don’t get any ARP entries. I also tried static ARP entries, without luck.

I’m using a supported Mellanox DAC cable, as shown below:

C:\Users\administrator.HYPERVISOR>mlxcables

Cable #1:


Cable name : mt4117_pciconf0_cable_0

No FW data to show

-------- Cable EEPROM --------

Identifier : SFP/SFP+/SFP28 (03h)

Technology : Passive Copper Cable

Compliance : 40GBASE-SR4, 25GBASE-CR CA-N

OUI : 0x0002c9

Vendor : Mellanox

Serial number : MT1916VS05172

Part number : MCP2M00-A001E30N

Revision : A4

Temperature : N/A

Length : 1 m

Cable #2:


Cable name : mt4117_pciconf0_cable_1

No FW data to show

-------- Cable EEPROM --------

Identifier : SFP/SFP+/SFP28 (03h)

Technology : Passive Copper Cable

Compliance : 40GBASE-SR4, 25GBASE-CR CA-N

OUI : 0x0002c9

Vendor : Mellanox

Serial number : MT1916VS05172

Part number : MCP2M00-A001E30N

Revision : A4

Temperature : N/A

Length : 1 m

C:\Users\administrator.HYPERVISOR>mlx5cmd -stat

NIC 1:

physical_location=Bus 99, Device 0, Function 1

state=ENABLED

uplink:

BUS=PCI-E Gen3

SPEED=8.0 GT/s

WIDTH=x8

CAPS=8.0*x8

MaxPayloadSize=512 Bytes

MaxReadReqSize=512 Bytes

vendor_id=0x15b3

vendor_part_id=4117

hw_ver=0x0

fw_ver=14.27.1016

driver_ver=2.40.22511.0

PSID=MT_2420110034

system_image_guid=b859:9f03:00af:c738

Adapter 1:

name=mlnx2

interface_description=Mellanox ConnectX-4 Lx Ethernet Adapter #2

ndis_mode=RSS

physical_location:

Bus=99

Device=0

Function=0

state=ENABLED

port_guid=b859:9f03:00af:c738

node_guid=b859:9f03:00af:c738

port_state=PORT_UP

port_phys_state=LINK_UP

DevX=False

DevxFsRules=0x0000

link_speed=25.00 Gbps

active_mtu=1500

default_roce_version=2.0

roce_mtu=1024

link_layer=Ethernet

RDMA GIDs table:

GID[0]:

GID=fe80:0000:0000:0000:ba59:9fff:feaf:c738

RoCE_version=1.0

vlan=no vlan

GID[1]:

GID=fe80:0000:0000:0000:ba59:9fff:feaf:c738

RoCE_version=1.0

vlan=0

GID[2]:

GID=0000:0000:0000:0000:0000:ffff:c0a8:0201

RoCE_version=2.0

vlan=no vlan

GID[3]:

GID=0000:0000:0000:0000:0000:ffff:c0a8:0201

RoCE_version=2.0

vlan=0

GID[4]:

GID=fe80:0000:0000:0000:6187:a72c:db3e:2b1b

RoCE_version=2.0

vlan=no vlan

GID[5]:

GID=fe80:0000:0000:0000:6187:a72c:db3e:2b1b

RoCE_version=2.0

vlan=0

Adapter 2:

name=mlnx1

interface_description=Mellanox ConnectX-4 Lx Ethernet Adapter

ndis_mode=RSS

physical_location:

Bus=99

Device=0

Function=1

state=ENABLED

port_guid=b859:9f03:00af:c739

node_guid=b859:9f03:00af:c739

port_state=PORT_UP

port_phys_state=LINK_UP

DevX=False

DevxFsRules=0x0000

link_speed=25.00 Gbps

active_mtu=1500

default_roce_version=2.0

roce_mtu=1024

link_layer=Ethernet

RDMA GIDs table:

GID[0]:

GID=fe80:0000:0000:0000:ba59:9fff:feaf:c739

RoCE_version=1.0

vlan=no vlan

GID[1]:

GID=fe80:0000:0000:0000:ba59:9fff:feaf:c739

RoCE_version=1.0

vlan=0

GID[2]:

GID=0000:0000:0000:0000:0000:ffff:c0a8:0101

RoCE_version=2.0

vlan=no vlan

GID[3]:

GID=0000:0000:0000:0000:0000:ffff:c0a8:0101

RoCE_version=2.0

vlan=0

GID[4]:

GID=fe80:0000:0000:0000:6dfd:df12:1a9d:b936

RoCE_version=2.0

vlan=no vlan

GID[5]:

GID=fe80:0000:0000:0000:6dfd:df12:1a9d:b936

RoCE_version=2.0

vlan=0
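As a side note on reading the output above: the RoCE v2 entries GID[2] and GID[3] have the form of IPv4-mapped IPv6 addresses, so they can be decoded to the IPv4 address the driver associates with each adapter. The following is a minimal Python sketch of that decoding; the helper name gid_to_ipv4 is made up for illustration and is not part of any Mellanox tool.

```python
import ipaddress

def gid_to_ipv4(gid: str):
    """Return the IPv4 address embedded in an IPv4-mapped GID, or None.

    Hypothetical helper for illustration; it only uses the standard
    library and does not talk to the adapter.
    """
    return ipaddress.IPv6Address(gid).ipv4_mapped

# GID values copied from the mlx5cmd -stat output above.
gids = {
    "mlnx2 GID[2]": "0000:0000:0000:0000:0000:ffff:c0a8:0201",
    "mlnx1 GID[2]": "0000:0000:0000:0000:0000:ffff:c0a8:0101",
    "mlnx2 GID[0]": "fe80:0000:0000:0000:ba59:9fff:feaf:c738",  # link-local, not IPv4-mapped
}

for label, gid in gids.items():
    print(label, "->", gid_to_ipv4(gid))
```

If this decoding is right, the two adapters in this output carry 192.168.2.1 and 192.168.1.1, which may be worth comparing against the address plan given in the post.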

Best regards

Yosh

Here is the output from mlxconfig

I made a mistake in my explanation; here is the correction:

Host A Port 1 192.168.0.1 <—> Host B Port 2 192.168.0.2

Host A Port 2 192.168.1.1 <—> Host B Port 1 192.168.1.2

From the mlxconfig output, everything looks good: the cables are fine and the links are UP.

Generally speaking, this looks like a basic network configuration issue.

I do see that you are running Hyper-V on both hosts, each using SR-IOV with 8 VFs.

As I don’t have the full picture of your system, I assume you created vSwitches on top of Hyper-V and assigned them to the VMs.

If this is the case, open “Network Connection Details” for the ConnectX-4 “bare-metal” interfaces: you will see that they have NO IP address, because each interface is “enslaved” by a vSwitch.

So you are probably trying to ping “enslaved” IPs…

I also assume you have set the subnet mask to 255.255.0.0 for all IPs.

If ping is still not working, I suggest:

  1. Detach and delete all the vSwitches, and go back to the “bare-metal” network level with only the ConnectX-4 adapters.
  2. Check in “Network Connection Details” that each ConnectX-4 interface now has a proper IP address with a /16 mask (X.X.X.X/16).
  3. Ping again to see if this sorted out the problem.
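To sanity-check the addresses before step 3, the subnet arithmetic can be sketched in Python. This is only an illustration using the corrected address plan from the thread; the function name same_subnet is an assumption, and the script does not query Windows, so the addresses and masks still have to be read from “Network Connection Details” by hand.

```python
import ipaddress

def same_subnet(a: str, b: str, prefix: int) -> bool:
    """True if addresses a and b fall in the same network for this prefix length."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{b}/{prefix}").network
    return net_a == net_b

# Corrected address plan from the thread: (Host A address, Host B address) per cable.
links = [
    ("192.168.0.1", "192.168.0.2"),
    ("192.168.1.1", "192.168.1.2"),
]

for prefix in (24, 16):  # 255.255.255.0 and 255.255.0.0
    print(f"prefix /{prefix}")
    for a, b in links:
        # Both ends of a cable must share a subnet to reach each other without a gateway.
        print(f"  {a} <-> {b}: same subnet = {same_subnet(a, b, prefix)}")
    # Also report whether the two ports on Host A land in one subnet.
    overlap = same_subnet(links[0][0], links[1][0], prefix)
    print(f"  Host A ports share a subnet = {overlap}")
```

With either mask, both ends of each cable share a subnet. The difference is that with 255.255.0.0 the two ports on the same host also end up in one subnet, which is worth keeping in mind when reading the routing table.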