Mellanox Technologies MT2894 Family [ConnectX-6 Lx] - VF Packet Drops

Hello Folks,

I have a strange issue in my environment.

I have this card

4: enp69s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 58:a2:e1:28:38:a8 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 2 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 3 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 4 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 5 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 6 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 7 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
altname enp69s0f0np0

VF 0 is used by a pod, which uses DPDK to establish a peerless BFD session with a gateway.

The problem is that, at times, the BFD Echo is lost: it is not seen coming back from the DCGW.

Application => POD => BFD Echo Send => Physical Compute => Network => DCGW

DCGW => Returns the packets (or loops them) => Network => Physical Compute => POD

In traces taken on the network port that connects to the physical compute node (where the SR-IOV NIC is attached), I can see the BFD Echo being returned from the DCGW.

However, the pod using VF 0 does not see the BFD Echo. It looks like it is dropped in the compute node.

The SR-IOV node is configured in eSwitch legacy mode.

I am looking for ways to use mlxtrace to check the FDB, or to find out whether packets are being dropped by the NIC. Alternatively, I would like to mirror the VF at the NIC level so that I can see its traffic on another VF for testing, because I do not have access to mirror from the pod or to use DPDK.

In short, I suspect the packet is dropped in the NIC, and I need a debugging capability, such as mlxtrace or VF mirroring, to understand more.

mst status

MST modules:

MST PCI module is not loaded
MST PCI configuration module loaded

MST devices:

/dev/mst/mt4127_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:45:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00
/dev/mst/mt4127_pciconf1 - PCI configuration cycles access.
domain:bus:dev.fn=0000:85:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00
/dev/mst/mt4127_pciconf2 - PCI configuration cycles access.
domain:bus:dev.fn=0000:86:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00

Querying Mellanox devices firmware …

Device #1:
----------
  Device Type:      ConnectX6LX
  Part Number:      06XJXK_0R5WK9_Ax
  Description:      NVIDIA ConnectX-6 LX Dual Port 25 GbE SFP Network Adapter
  PSID:             DEL0000000031
  PCI Device Name:  0000:86:00.0
  Base GUID:        58a2e10300e35a5c
  Base MAC:         58a2e1e35a5c
  Versions:         Current        Available
     FW             26.39.1002     N/A
     PXE            3.7.0201       N/A
     UEFI           14.32.0012     N/A
  Status:           No matching image found

Device #2:
----------
  Device Type:      ConnectX6LX
  Part Number:      06XJXK_0R5WK9_Ax
  Description:      NVIDIA ConnectX-6 LX Dual Port 25 GbE SFP Network Adapter
  PSID:             DEL0000000031
  PCI Device Name:  0000:45:00.0
  Base GUID:        58a2e103002838a8
  Base MAC:         58a2e12838a8
  Versions:         Current        Available
     FW             26.39.1002     N/A
     PXE            3.7.0201       N/A
     UEFI           14.32.0012     N/A
  Status:           No matching image found

Device #3:
----------
  Device Type:      ConnectX6LX
  Part Number:      06XJXK_0R5WK9_Ax
  Description:      NVIDIA ConnectX-6 LX Dual Port 25 GbE SFP Network Adapter
  PSID:             DEL0000000031
  PCI Device Name:  0000:85:00.0
  Base GUID:        58a2e10300ef6514
  Base MAC:         58a2e1ef6514
  Versions:         Current        Available
     FW             26.39.1002     N/A
     PXE            3.7.0201       N/A
     UEFI           14.32.0012     N/A
  Status:           No matching image found

Hi duraivelanc,

Thank you for posting your query on NVIDIA Community Forum.

Based on the information shared, the card in use is an OEM card (Dell-branded), so any issues encountered should first be raised with the OEM.

In general, to avoid any networking issues, please ensure you are using a supported OS/kernel/firmware/driver/switch/switch software/cable/transceiver. The support matrix is published in the Release Notes of the components in use.

Next, check the networking configuration. This includes avoiding multiple interfaces in the same IP subnet, so that you do not run into ARP cache resolution issues. If the network configuration requires you to have IPs in the same subnet, ensure advanced routing is configured; this falls under the OS vendor's scope.
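As a rough illustration of the subnet check, here is a minimal Python sketch that groups interfaces by the IP network their address falls in and reports any network shared by more than one interface. The function name `interfaces_sharing_subnet`, the second port name `enp69s0f1`, and all addresses are hypothetical and not taken from this thread. It only detects identical networks, not a smaller subnet nested inside a larger one.

```python
import ipaddress
from collections import defaultdict


def interfaces_sharing_subnet(addresses):
    """Return {network: [interfaces]} for networks used by more than one interface.

    `addresses` maps an interface name to its address in CIDR form,
    for example "192.0.2.10/24".
    """
    by_network = defaultdict(list)
    for ifname, cidr in addresses.items():
        # ip_interface() keeps the host bits; .network masks them off.
        network = ipaddress.ip_interface(cidr).network
        by_network[network].append(ifname)
    return {
        str(network): sorted(names)
        for network, names in by_network.items()
        if len(names) > 1
    }


# Hypothetical addressing, for illustration only.
example = {
    "enp69s0f0": "192.0.2.10/24",
    "enp69s0f1": "192.0.2.11/24",
    "eno1": "198.51.100.7/24",
}
print(interfaces_sharing_subnet(example))
```

The input mapping could be filled in by hand from the output of `ip -br addr`; an empty result means no two listed interfaces share a network.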

To validate packet drops at the NIC level, ethtool -S <interface> provides the relevant counters. To check whether packets are received at the wire level, mlnx_perf -i <interface> can be used.
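One way to make the ethtool -S suggestion practical is to capture the counters before and after a lost BFD Echo and look only at loss-related counters that moved. The sketch below is a minimal Python illustration. The helper names and the `DROP_HINTS` substring list are assumptions, counter names differ by driver and firmware version, and the sample values are invented.

```python
import re

# Substrings that often appear in loss-related counter names.
# This list is an assumption; adjust it to the counters your driver exposes.
DROP_HINTS = ("drop", "discard", "out_of_buffer", "missed", "err")


def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into {counter_name: int}."""
    counters = {}
    for line in text.splitlines():
        # Matches lines such as "     rx_discards_phy: 12".
        # The "NIC statistics:" header has no value and is skipped.
        match = re.match(r"^\s*(\S+):\s*(-?\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def changed_drop_counters(before, after, hints=DROP_HINTS):
    """Return {name: delta} for loss-related counters that changed."""
    deltas = {}
    for name, value in after.items():
        delta = value - before.get(name, 0)
        if delta != 0 and any(hint in name for hint in hints):
            deltas[name] = delta
    return deltas


# Invented sample output, for illustration only.
sample_before = """NIC statistics:
     rx_packets_phy: 1000
     rx_discards_phy: 0
     rx_out_of_buffer: 3
"""
sample_after = """NIC statistics:
     rx_packets_phy: 1500
     rx_discards_phy: 0
     rx_out_of_buffer: 7
"""
print(changed_drop_counters(parse_ethtool_stats(sample_before),
                            parse_ethtool_stats(sample_after)))
```

On a live system the two text snapshots could come from `subprocess.run(["ethtool", "-S", "enp69s0f0"], capture_output=True, text=True).stdout`, taken on the PF just before and after a missed echo. Whether the PF counters account for traffic destined to a DPDK-bound VF is not confirmed in this thread, so treat a clean result as inconclusive.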

Prior to running traffic, ensure the system and NIC have been tuned for optimal performance (see EnterpriseSupport).

Thanks,

NVEX Networking Technical Support Team.