SX6036 proxy-ARP MTU and RoCE/IB pass-through

Shalom!

I’ve got this funny 6036G switch and I’m trying to set up a gateway between IPoIB and Ethernet hosts. I’m a bit perplexed about which MTU to pick: going through the KB articles and the MLNX-OS guide, I’m seeing contradictory information. First, the proxy-ARP interface on the switch can have an MTU of at most 4092. That’s totally fine with me (I’ve decided to use 4090), but here comes the hard part. The manual mentions the command **interface ib 1/10 mtu 4000**, which can’t be executed on the switch because IB interfaces only accept fixed MTUs: 1K, 2K and 4K.

Second, the Ethernet interface on the host won’t select an active_mtu of 4096 unless you set a very high MTU on the interface; it’s currently set to 4200 here. On the switch side, however, the Ethernet port MTU is set to 4090. When I set the host MTU with **ip li set enp5s0 mtu 4090**, the active_mtu reported by **ibv_devinfo |grep active_mtu** drops to 2048.
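The drop from 4096 to 2048 is consistent with how the Linux RDMA stack derives a RoCE port’s active_mtu: it subtracts the RoCE/IB transport headers from the Ethernet netdev MTU and picks the largest IB MTU that still fits. The sketch below is modeled on the kernel helper iboe_get_mtu() (include/rdma/ib_addr.h). The 96-byte overhead is an assumption based on recent upstream kernels; older kernels or MLNX_OFED builds may subtract a slightly different amount.

```python
from typing import Optional

# Valid InfiniBand MTUs, largest first.
IB_MTUS = (4096, 2048, 1024, 512, 256)

# ASSUMPTION: per-packet RoCE overhead as subtracted by recent upstream kernels:
# GRH 40 + UDP 8 + BTH 12 + XRC ext 4 + AtomicETH 28 + ICRC 4 = 96 bytes.
# Older kernels / vendor drivers may use a slightly different total.
ROCE_OVERHEAD = 40 + 8 + 12 + 4 + 28 + 4


def roce_active_mtu(netdev_mtu: int, overhead: int = ROCE_OVERHEAD) -> Optional[int]:
    """Largest IB MTU that still fits in netdev_mtu once RoCE headers are removed."""
    usable = netdev_mtu - overhead
    for ib_mtu in IB_MTUS:
        if usable >= ib_mtu:
            return ib_mtu
    return None  # netdev MTU too small even for a 256-byte IB MTU


if __name__ == "__main__":
    # Netdev MTU values that appear in this thread.
    for mtu in (9200, 4200, 4096, 4090):
        print(f"netdev MTU {mtu}: active_mtu {roce_active_mtu(mtu)}")
```

Under this assumption 9200 and 4200 give 4096 while 4096 and 4090 give 2048, matching what ibv_devinfo reported in this thread; the smallest netdev MTU that still yields a 4096 active_mtu would be 4096 + 96 = 4192.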

As you can see, I’m completely confused. Which MTU should I actually set so that the whole thing works as it should? Obviously, I’m aiming for the highest possible MTU with this switch, since I’d like to push large I/O over it.

My current MTU mess is as follows:

4200 (Linux) - 4096 (driver/active_mtu) HCA ~~~ 4090 (EN port) switch - 4090 (switch proxy-ARP interface) - 4096 (IB port) switch ~~~ HCA 4096 (driver/active_mtu) - IPoIB 4090 (Linux)
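For plain IP traffic crossing the gateway, the usable packet size is bounded by the smallest MTU on the path. The sketch below treats every number in the chain above as an IP-layer MTU (a simplification: whether the switch counts Ethernet L2 headers in its port MTU is not stated here) and assumes IPoIB runs in datagram mode, where a 4-byte IPoIB header sits inside each IB packet. A 4K IB MTU therefore leaves 4092 bytes for IP, which is presumably why the proxy-ARP interface tops out at 4092. The hop names are illustrative labels only.

```python
from typing import Dict, List, Tuple

# IPoIB prepends a 4-byte encapsulation header inside each IB packet, so in
# datagram (UD) mode the IP MTU is the IB MTU minus 4 (4096 -> 4092).
IPOIB_HEADER_BYTES = 4


def ipoib_datagram_mtu(ib_mtu: int) -> int:
    """IP MTU usable by IPoIB in datagram mode for a given IB link MTU."""
    return ib_mtu - IPOIB_HEADER_BYTES


# IP-layer MTUs from the chain above (ASSUMPTION: all values are IP MTUs).
PATH: Dict[str, int] = {
    "Ethernet host netdev": 4200,
    "switch EN port": 4090,
    "switch proxy-ARP interface": 4090,
    "IPoIB host netdev": 4090,
}


def bottleneck(path: Dict[str, int]) -> Tuple[str, int]:
    """First hop with the smallest MTU; larger IP packets cannot cross unfragmented."""
    name = min(path, key=lambda hop: path[hop])
    return name, path[name]


def oversized_hops(path: Dict[str, int]) -> List[str]:
    """Hops configured above the bottleneck, i.e. able to emit packets the path cannot carry."""
    _, limit = bottleneck(path)
    return [hop for hop, mtu in path.items() if mtu > limit]


if __name__ == "__main__":
    print(bottleneck(PATH))
    print(oversized_hops(PATH))
    # The IPoIB host stays within what a 4K IB MTU can carry in datagram mode.
    print(PATH["IPoIB host netdev"] <= ipoib_datagram_mtu(4096))
```

With these values the bottleneck is the switch Ethernet port at 4090, and the only hop configured above it is the Ethernet host’s 4200-byte netdev, so IP packets between 4091 and 4200 bytes from that host would not fit through the gateway as configured.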

My second question is about the gateway functionality, which I’m not sure I completely understand. We’ve got some workloads running over InfiniBand here that I’m trying to gradually move to RoCE, but I’m a bit lost, as nowhere does it say that the switch gateway can “commutate” (switch) RDMA traffic across the gateway/proxy interface. Do I understand correctly that RC (Reliable Connected) traffic won’t be passed through proxy-ARP and that only plain, dumb IP traffic can traverse it?

Currently, the ib_write_bw and rping tools print funny messages when run between IPoIB and RoCE hosts.

Thanks!

Hi,

The article below has examples of how to change the MTU to 4K on the switch.

https://community.mellanox.com/s/article/howto-configure-infiniband-gateway-ha--proxy-arp-x

To change the MTU on the host, use “ifconfig” instead of “ip li set”:

ifconfig enp5s0 mtu 4096

What funny messages do you get when running ib_write_bw?

The SX6036G doesn’t support transporting RDMA between IB and Ethernet.

Only IPoIB <-> IPoEth is supported.

This is incorrect. Both tools end up in the same kernel MTU-change path, and ifconfig has been deprecated for ages.

```
ip li sh dev eth5 |grep mtu
23: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9200 qdisc mqprio state UP mode DEFAULT group default qlen 1000

ibv_devinfo |grep mtu
max_mtu: 4096 (5)
active_mtu: 4096 (5)
max_mtu: 4096 (5)
active_mtu: 4096 (5)

ifconfig eth5 mtu 4096

ibv_devinfo |grep mtu
max_mtu: 4096 (5)
active_mtu: 2048 (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)

ip li sh dev eth5 |grep mtu
23: eth5: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4096 qdisc mqprio state DOWN mode DEFAULT group default qlen 1000
```

As you can see, active_mtu was lowered to 2048 even though ifconfig was used.
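To compare a netdev MTU against the active_mtu values without eyeballing, a small parser can pull both numbers out of the command output quoted above. This is a sketch that assumes the output formats shown in this post (the `mtu N` field of `ip link show` and the `active_mtu:` lines of `ibv_devinfo`); the helper names are hypothetical, and which `ibv_devinfo` port corresponds to eth5 is not stated here.

```python
import re
from typing import List


def netdev_mtu(ip_link_line: str) -> int:
    """Extract the MTU from one line of `ip link show` output."""
    match = re.search(r"\bmtu (\d+)", ip_link_line)
    if match is None:
        raise ValueError("no 'mtu N' field found")
    return int(match.group(1))


def active_mtus(ibv_devinfo_output: str) -> List[int]:
    """Extract every active_mtu value (one per port) from `ibv_devinfo` output."""
    return [int(v) for v in re.findall(r"active_mtu:\s*(\d+)", ibv_devinfo_output)]


# Sample input: the post-ifconfig snapshot from this post.
AFTER_IFCONFIG_LINK = (
    "23: eth5: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4096 qdisc mqprio "
    "state DOWN mode DEFAULT group default qlen 1000"
)
AFTER_IFCONFIG_DEVINFO = """
max_mtu: 4096 (5)
active_mtu: 2048 (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
"""

if __name__ == "__main__":
    print(netdev_mtu(AFTER_IFCONFIG_LINK), active_mtus(AFTER_IFCONFIG_DEVINFO))
```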

Thanks. Exactly what I presumed.