SX6036G Proxy-ARP Setup (Telnet YES / Ping NO)

I’m kinda new here, so feel free to teach me, masters.

So here’s the situation:

I’ve got a network set up as 10.10.100.0/16 with a gateway of 10.10.255.1, as shown in the picture above.
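A quick note on the addressing: with a /16 prefix, “10.10.100.0/16” actually describes the network 10.10.0.0/16, and the gateway 10.10.255.1 sits inside it. A minimal sanity check using Python's standard ipaddress module (just an illustration on a PC, not something run on the switch):

```python
import ipaddress

# Values taken from the post; strict=False masks off the host bits of "10.10.100.0".
network = ipaddress.ip_network("10.10.100.0/16", strict=False)
gateway = ipaddress.ip_address("10.10.255.1")

print(network)                # the real network address of this /16: 10.10.0.0/16
print(gateway in network)     # True: the gateway is on the same subnet
print(network.num_addresses)  # 65536 addresses in a /16
```

With strict=True (the default), ip_network would raise ValueError for “10.10.100.0/16” because host bits are set, which is a handy reminder of what the mask really covers.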

I have also configured Proxy-ARP following the documentation “Configuring Mellanox Hardware for VPI Operation Application Note Rev 1.2”, so that the servers can use IPoIB over InfiniBand cables. I use QSFP copper cables to connect the 6036 to the 6036E or to the servers, and an MAM1Q00A-QSA-SP adapter with an SFP-10GB-SR transceiver and LC-SR multimode cables to connect the Catalyst 4948 to the 6036.

Once that was done, I set up the servers (Server01, Server02) with OpenSM and IP addresses so they could do their job. But for some reason, the servers can’t reach the 10.10.100.0/16 network. Stranger still, I can telnet into the 6036 from a computer connected to the Catalyst 4948 (or from the network above), but the SX6036G itself can’t ping any node on 10.10.100.0/16, not even the gateway (10.10.255.1).

In one sentence: “I can telnet into 6036-01, but from the 6036-01 console I can’t ping anything.”

On the InfiniBand side, nothing seems to work (no ping, no telnet, in any direction), even though the ports on the 6036G and the 4036E show a green link light.

I can think of three possible causes:

  1. Wrong Proxy-ARP setup → When I tested the same setup without the 4036E (just a direct connection between the 6036 and the servers, e.g. Server01) and with a 2960S instead of the Catalyst 4948, Proxy-ARP happily did its job. I could ping 8.8.8.8, the gateway, and the other server nodes, and after adding “ip route 10.10.100.0 /16 <Other 6036G address>” I could even telnet into a server behind the other 6036G. So I can’t see how it would be a setup issue, yet in this situation everything points to it.
  2. 10GbE or MTU problem → The previous test was done on the 2960S with 1GbE connections only, so the new transceiver may have introduced a new problem.
  3. ARP loop? → The 4036E has an Ethernet port on the right side and an IO port on the left. I’ve connected all of those ports to 10.10.100.0/16 so that server nodes on the 4036E can reach the 10.10.100.0/16 network (before this test, I was using the 4036E as the main InfiniBand switch). That made me think those connections could be bridging VLAN 10 and pkey 0x7fff, causing the 6036 to malfunction. In the web interface of the 6036G, I noticed that the Address-Resolution table keeps adding entries, showing the IP addresses I want.
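One way to probe the ARP-loop theory in case 3 is to check whether the same IP address is being learned on both the Ethernet side and the InfiniBand side, which could hint at the two segments being bridged. Below is a minimal sketch, assuming you have copied the Address-Resolution entries out as (IP, interface) pairs; the sample entries and interface names are hypothetical, not real output from the 6036G.

```python
from collections import defaultdict

# Hypothetical (ip, interface) pairs copied out of an address-resolution table.
# These sample values are made up for illustration only.
entries = [
    ("10.10.255.1", "eth1/39"),
    ("10.10.100.11", "ib0"),
    ("10.10.100.12", "ib0"),
    ("10.10.100.12", "eth1/39"),  # same IP seen on both sides
]

def ips_on_multiple_interfaces(pairs):
    """Return {ip: sorted interfaces} for IPs learned on more than one interface."""
    seen = defaultdict(set)
    for ip, iface in pairs:
        seen[ip].add(iface)
    return {ip: sorted(ifaces) for ip, ifaces in seen.items() if len(ifaces) > 1}

print(ips_on_multiple_interfaces(entries))
# {'10.10.100.12': ['eth1/39', 'ib0']}
```

An empty result doesn't rule out a loop, but an IP showing up on both sides would be worth chasing before anything else.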

That’s about it. I’ve been working through this problem since… somewhere around March now. I appreciate any help from you guys. Thank you in advance, and thanks for reading this whole question!


Hi,

The topology is unusual. Typically, the two 6036 VPI gateways would be configured in “HA” mode to load-balance traffic, and there would be at least one IB switch between these VPI gateways and the IB servers.

That way, server01 and server02 would be in the same cluster, managed by a single master SM, and traffic would be load-balanced between the two VPI gateways.

Also unusual is having a 4036E in the mix; it is a very old “Voltaire” version of a VPI gateway. If you are ONLY using the IB ports on it (i.e., using it as an IB switch), it’s fine as long as the cables on it link UP.

This is the most common, simplest topology:

https://community.mellanox.com/s/article/howto-configure-infiniband-gateway-ha–proxy-arp-x

I recommend reviewing that document.

I recommend disabling ib0 and configuring the management port mgmt0 instead, since that port is isolated from production traffic. It should be in a separate management IP subnet.
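To make “a separate management IP subnet” concrete: the mgmt0 range must not overlap the production 10.10.0.0/16. A minimal check with Python's ipaddress module, where 192.168.50.0/24 is a hypothetical management subnet picked only for illustration:

```python
import ipaddress

# Production subnet from the thread; the management subnet is a hypothetical example.
production = ipaddress.ip_network("10.10.0.0/16")
management = ipaddress.ip_network("192.168.50.0/24")  # hypothetical mgmt0 subnet

if production.overlaps(management):
    print("mgmt0 subnet overlaps production traffic: pick another range")
else:
    print("mgmt0 subnet is separate from production")

# Counter-example: a range carved out of 10.10.x.x would NOT be isolated.
print(production.overlaps(ipaddress.ip_network("10.10.200.0/24")))  # True
```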

The IPoIB interfaces on your IB servers should also be in the 10.10.x.x/16 subnet. If the router interface is in that same 10.10.x.x subnet, there is no need for a default gateway in the VPI gateway config pointing to it.
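Being “in the 10.10.x.x/16 subnet” means both the address and the mask have to match; a server configured as 10.10.100.12/24 has the right-looking address but a different subnet. A minimal sketch of that check, where the two server configs are hypothetical (server02 gets a wrong mask on purpose):

```python
import ipaddress

# The production /16 and router address from the thread.
subnet = ipaddress.ip_network("10.10.0.0/16")
router = ipaddress.ip_address("10.10.255.1")

# Hypothetical IPoIB interface configs (address/prefix) for the two servers.
ipoib = {
    "server01": ipaddress.ip_interface("10.10.100.11/16"),
    "server02": ipaddress.ip_interface("10.10.100.12/24"),  # wrong mask on purpose
}

for name, iface in ipoib.items():
    ok = iface.network == subnet
    print(f"{name}: {iface} -> {'OK' if ok else 'not in ' + str(subnet)}")

# If the router is on the same subnet it is directly reachable (on-link),
# so no separate default-gateway entry is needed just to reach it.
print("router on-link:", router in subnet)
```

Expected output: server01 reports OK, server02 reports “not in 10.10.0.0/16”, and the router is on-link.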

Make sure you have a physical link, checking with “show int eth 1/39” and “show int ib 1/x”.

If you expect server01 and server02 to be in the same IB cluster, you need the IB switch between them and the 6036 VPI gateways for full redundancy, communication, and load balancing through the VPI gateways.

For HA mode on the gateways, you’ll need the mgmt0 interfaces UP on an out-of-band management network, so that each VPI gateway can ping the other’s mgmt0 interface (required for HA).

Ultimately, you want to see both gateways’ PRA interfaces UP and Active, as shown by “show interface proxy-arp 1 ha” when SSH’d into the proxy-arp “VIP” address. After that, you should be able to ping between the IB servers and the Ethernet servers to confirm proper operation.