I’m fairly new here, so feel free to teach me, masters.
Here’s the situation.
I’ve got a network set up as 10.10.100.0/16 with a gateway of 10.10.255.1, just like the picture above. I have also configured Proxy-ARP following the documentation in “Configuring Mellanox Hardware for VPI Operation Application Note Rev 1.2”, so that I could use IPoIB for the servers over InfiniBand cables. The cabling is QSFP copper cable connecting the 6036 to the 6036E or to the servers, and MAM1Q00A-QSA-SP with SFP-10GB-SR and LC-SR multi-mode cables connecting the Catalyst 4948 to the 6036.
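As a quick sanity check on the addressing above, here is a minimal Python sketch using only the standard `ipaddress` module. It shows which network a /16 mask actually implies for the address written as 10.10.100.0/16, and whether the gateway 10.10.255.1 falls inside it. The variable and function names are made up for illustration, and the sketch says nothing about how the switches themselves treat the address as entered.

```python
import ipaddress

# Values exactly as written in the post.
stated_network = "10.10.100.0/16"
gateway = ipaddress.ip_address("10.10.255.1")


def normalize(cidr: str) -> ipaddress.IPv4Network:
    """Return the network a CIDR string denotes, masking off any host bits."""
    return ipaddress.IPv4Network(cidr, strict=False)


# A strict parse rejects the string because 10.10.100.0 has host bits set
# under a /16 mask; some tools refuse such input, others silently mask it.
try:
    ipaddress.IPv4Network(stated_network)
except ValueError as exc:
    print("strict parse rejected:", exc)

net = normalize(stated_network)
print(net)             # 10.10.0.0/16
print(net.netmask)     # 255.255.0.0
print(gateway in net)  # True
```

So the gateway is inside the /16, and the network address that routing entries and ACLs would normally carry is 10.10.0.0/16 rather than 10.10.100.0/16.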
Once that was done, I went on and set up the servers (Server01, Server02) with opensm and IP addresses so they could do their job. But for some reason the servers can’t reach the 10.10.100.0/16 network. Stranger still, I can telnet to the 6036 from a computer connected to the Catalyst 4948 or to the network above, but the SX6036G can’t ping out to the 10.10.100.0/16 network nodes or to the gateway (10.10.255.1).
In a sentence: “I can telnet into 6036-01, but from the 6036-01 console I can’t ping anywhere.”
On the InfiniBand side, nothing seems to be working (no pings, no telnet, to anything or anywhere). Still, I get a green light on the ports of the 6036G and the 4036E.
I’m considering three possible causes:
- Wrong setup with Proxy-ARP → When I tested the same setup without the 4036E (so just a direct connection between the 6036 and the servers, like Server01) and with a 2960S instead of the Catalyst 4948, Proxy-ARP happily did its job. I was able to ping 8.8.8.8, the gateway, and other server nodes, and if I added “ip route 10.10.100.0 /16 <Other 6036G address>” I was even able to telnet to other servers under the other 6036G. I don’t think it is a setup issue, but in this situation everything points to it being the reason.
- 10GbE or MTU problem → The previous test was done on the 2960S with a 1GbE connection only, so maybe using the new transceiver has triggered a new problem.
- ARP loop? → The 4036E has an Ethernet port on the right side and an IO port on the left. I’ve got all of those ports connected to 10.10.100.0/16 so that the server nodes on the 4036E can reach the 10.10.100.0/16 network (before this test, I was using the 4036E as the main InfiniBand switch). That made me think those connections could be causing VLAN 10 and pkey 0x7fff to be bridged, causing the 6036 to malfunction. Looking at the web interface on the 6036G, I noticed that the Address-Resolution table keeps adding entries, showing the IP addresses that I want.
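For the MTU suspicion in the second item, one way to narrow it down is to send pings with the don't-fragment bit set at a few candidate sizes. The Python sketch below only builds the commands. It assumes a Linux server with the iputils `ping` (the `-M do`, `-s`, `-c` and `-W` options), which the post does not confirm. The candidate MTUs are also assumptions: 1500 for standard Ethernet, 2044 for IPoIB datagram mode on a 2048-byte IB MTU, and 9000 as a common jumbo-frame value. The function names are hypothetical.

```python
import subprocess
from typing import List

# 20-byte IPv4 header (no options) + 8-byte ICMP echo header.
IP_ICMP_OVERHEAD = 28


def icmp_payload_for_mtu(mtu: int) -> int:
    """ICMP payload size that produces an IP packet of exactly `mtu` bytes."""
    if mtu <= IP_ICMP_OVERHEAD:
        raise ValueError("MTU too small to carry an ICMP echo")
    return mtu - IP_ICMP_OVERHEAD


def build_ping_command(target: str, mtu: int) -> List[str]:
    """Build an iputils ping command that forbids fragmentation (-M do)."""
    return ["ping", "-c", "3", "-W", "2", "-M", "do",
            "-s", str(icmp_payload_for_mtu(mtu)), target]


def probe(target: str, mtu: int) -> bool:
    """Run the ping; True means a packet of that size got a reply."""
    result = subprocess.run(build_ping_command(target, mtu),
                            capture_output=True, text=True)
    return result.returncode == 0


# Print the commands for the gateway from the post; call probe() to run them.
for candidate_mtu in (1500, 2044, 9000):
    print(" ".join(build_ping_command("10.10.255.1", candidate_mtu)))
```

If small pings work but the 1500-byte probe fails only across the 10GbE link, that would point toward the MTU item. If nothing gets through at any size, the Proxy-ARP or loop items look more likely.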
That’s about it. I’ve been working through this problem since… somewhere around March. I appreciate all the help from you guys. Thank you in advance, and for reading this whole question.
6036-01 question.txt (3.15 KB)