We have a problem on a FreeNAS iSCSI server (release FreeNAS-11.2-U5).
It has a dual-port Mellanox ConnectX-3 InfiniBand card which is connected to an InfiniBand switch (Grid Director 4036).
Three Proxmox cluster nodes running Proxmox VE 5.4-13 are connected to the same switch.
The InfiniBand cards on both ends are configured for connected mode with an MTU of 40950.
We are using a multipath setup with two subnets for IP over Infiniband.
This is working and we get a throughput of ~1-1.1 Gigabyte per second on each cluster node in parallel.
Sporadically we get a kernel trap on the FreeNAS server, which then reboots.
The time between crashes ranges from about an hour up to 4 days.
The VMs do not crash; they stall until the FreeNAS server is online again.
Nevertheless we have to fix it.
The trap is triggered by a packet over IPoIB with a length >2044 bytes.
We are wondering where it comes from.
In the Linux kernel documentation for IPoIB we found this:
In datagram mode, the IB UD (Unreliable Datagram) transport is used and so the interface MTU is equal to the IB L2 MTU minus the IPoIB encapsulation header (4 bytes). For example, in a typical IB fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes. In connected mode, the IB RC (Reliable Connected) transport is used.
Connected mode takes advantage of the connected nature of the IB transport and allows an MTU up to the maximal IP packet size of 64K, which reduces the number of IP packets needed for handling large UDP datagrams, TCP segments, etc and increases the performance for large messages.
In connected mode, the interface’s UD QP is still used for multicast and communication with peers that don’t support connected mode. In this case, RX emulation of ICMP PMTU packets is used to cause the networking stack to use the smaller UD MTU for these neighbours.
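To make the arithmetic from the quoted documentation concrete, here is a minimal sketch that computes the datagram-mode MTU and checks whether a packet of a given length would fit. The function names are made up for illustration, and it assumes the 4188 bytes we see are the IP packet length as counted against the IPoIB MTU, which we have not verified.

```python
# Sketch only: helper names are hypothetical, not part of any driver or tool.
IPOIB_ENCAP_HEADER = 4  # bytes, per the quoted kernel documentation


def datagram_mtu(ib_l2_mtu: int) -> int:
    """IPoIB MTU in datagram (UD) mode: IB L2 MTU minus the encapsulation header."""
    return ib_l2_mtu - IPOIB_ENCAP_HEADER


def fits_ud(packet_len: int, ib_l2_mtu: int = 2048) -> bool:
    """True if a packet of packet_len bytes fits into the UD-mode IPoIB MTU."""
    return packet_len <= datagram_mtu(ib_l2_mtu)


if __name__ == "__main__":
    # Assumption: 4188 is the IP packet length seen on the FreeNAS side.
    for ib_mtu in (2048, 4096):
        print(ib_mtu, datagram_mtu(ib_mtu), fits_ud(4188, ib_mtu))
```

Under that assumption, a 4188-byte packet fits neither a 2K fabric MTU (2044 bytes) nor a 4K one (4092 bytes), so it could only have been sent over an RC connection or by a sender that ignored the UD limit.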
Since overall performance is fine, we assume that we have a multicast packet problem here?
Does anybody have a hint where we could look for the root cause?
Does anybody have an idea where the 4188-byte packet comes from?
Thank you in advance.