The documentation (https://docs.nvidia.com/networking/pages/viewpage.action?pageId=25155266) states: “Enhanced IPoIB feature enables offloading ULP basic capabilities to a lower vendor specific driver, in order to optimize IPoIB data path”.
My first question is why I am stuck at MTU 2044 when I switch to enhanced mode (at which point the ibX mode automatically switches to datagram), while the documentation talks about a 4k MTU, but only in the context of defining it in a “partitions” configuration. I do not even have opensm set up to define a partition; the subnet manager is running on the fabric switch. All ports of my IB switch are set to MTU 4k.
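As far as I understand, the IPoIB MTU in datagram mode is the IB link MTU minus the 4-byte IPoIB encapsulation header, so 2044 would correspond to a 2048-byte IB MTU, and a 4k IB MTU should give 4092. A minimal sketch of that arithmetic (the function name `ipoib_mtu` is only for illustration, it is not a real tool):

```shell
#!/bin/sh
# Hypothetical helper: datagram-mode IPoIB MTU for a given IB link MTU.
# Assumption: the only overhead subtracted is the 4-byte IPoIB encapsulation header.
ipoib_mtu() {
    echo $(( $1 - 4 ))
}

ipoib_mtu 2048   # prints 2044, the value I am seeing
ipoib_mtu 4096   # prints 4092, the value I would expect with a 4k IB MTU
```

If this is right, my interface is behaving as if the multicast group or partition MTU were 2k, even though the switch ports are at 4k.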
My second question is why I am getting worse CPU behaviour (higher system CPU usage) and lower bandwidth in datagram/enhanced mode compared to connected/non-enhanced mode. I would guess the low bandwidth is caused by the low MTU, but I do not have much experience with this topic and am still learning.
My hardware is an Nvidia ConnectX-5 on one node and a ConnectX-6 on the other, with OFED 5.8-18.104.22.168, RHEL 7.9, kernel 3.10.0-1160.49.1, FDR cables, and an EDR Nvidia fabric switch.
To check that I am in enhanced mode I do:
[root@sf-daq-8 ~]# cat /sys/class/net/ib0/mode
datagram
[root@sf-daq-8 ~]# cat /etc/modprobe.d/ib_ipoib.conf
alias netdev-ib* ib_ipoib
options ib_ipoib ipoib_enhanced=1
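To collect the same information in one go, something like the sketch below could be used. The function name `show_ipoib_state` and the `SYSFS_ROOT` variable are my own inventions, and I am assuming (not certain) that the driver exposes the module parameter under /sys/module/ib_ipoib/parameters/ipoib_enhanced; if that file is not readable, the function just reports "unknown".

```shell
#!/bin/sh
# Root of sysfs; overridable only so the function can be tried without IB hardware.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Hypothetical helper: print mode, MTU and the enhanced-mode parameter for one interface.
show_ipoib_state() {
    iface="$1"
    netdir="$SYSFS_ROOT/class/net/$iface"
    # Assumed location of the module parameter; may not exist on every driver build.
    param="$SYSFS_ROOT/module/ib_ipoib/parameters/ipoib_enhanced"

    echo "mode=$(cat "$netdir/mode")"
    echo "mtu=$(cat "$netdir/mtu")"
    if [ -r "$param" ]; then
        echo "ipoib_enhanced=$(cat "$param")"
    else
        echo "ipoib_enhanced=unknown"
    fi
}

# Usage: show_ipoib_state ib0
```

Reading the parameter from the loaded module, rather than from the modprobe.d file, would show what the driver is actually using and not just what I configured.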