I notice that only ConnectX-2 and ConnectX-3 are listed as supported under the 1.8 drivers. Will support be expanded to the ConnectX adapters?
I also noticed something weird when I switched from my ESXi 4.1 test install to a 5.1 install. After I got the drivers installed, it connected to the InfiniBand network at 4x; however, when I enabled the 4k MTU and restarted, it set itself to 1x. Things seem to be working now, but I haven’t run any tests yet to see what my data transfer performance is like.
Thanks, that’s exactly the info I was looking for.
Interesting. I had a chance to do some testing and I found that if I connect my 2 nodes direct, I can get an active_mtu of 4096, but if I go through my switch (Voltaire ISR9024D), it drops to active_mtu of 2048.
In addition, the default MTU from ifconfig ib0 shows as 65520, and I get better speeds in iperf if I leave it alone.
Indeed, looks like your opensm is not listening to the config file (the test you’ve done using saquery reflects this fact).
If you’re running opensm from a host, try to run it manually:
/usr/sbin/opensm -P /etc/opensm/partitions.conf
key0=0x7fff,ipoib,mtu=5 : ALL=full ;
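For anyone reading along: the mtu= value in that partitions.conf line is the InfiniBand MTU code, not a byte count (5 means 4096 bytes, 4 means 2048). Here is a minimal Python sketch that builds such a line. The helper name and its defaults are made up for illustration and are not part of opensm.

```python
# IB MTU codes as defined by the InfiniBand spec (code -> bytes).
IB_MTU_CODES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def partition_line(mtu_bytes, name="key0", pkey=0x7FFF):
    """Hypothetical helper: format an IPoIB partition entry for opensm."""
    bytes_to_code = {size: code for code, size in IB_MTU_CODES.items()}
    if mtu_bytes not in bytes_to_code:
        raise ValueError(f"unsupported IB MTU: {mtu_bytes}")
    return f"{name}=0x{pkey:04x},ipoib,mtu={bytes_to_code[mtu_bytes]} : ALL=full ;"


print(partition_line(4096))
```

Passing 2048 instead would produce mtu=4, which is what a subnet manager that ignores the file appears to fall back to in this thread.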
Regarding the MTU, I think that is part of the Mellanox OFED driver. What size are you using? It looks like it can be set up to be as high as 4k. In the older releases, the cap was 1500 and everything above it was “silently dropped” when in IPoIB.
Another option that I have been meaning to try is to see if the 10GbE mode lets you set and use a higher MTU.
I was just messaging someone else a second ago. One thing to check is the cables: some of the ones sold online say that they are rated for 10Gb, others say DDR. Not sure if that’s true, but I had no problems. I don’t think there’s anything special with my config, but here are some of the results for my ESX machine. I’ll keep thinking as well.
2 5 ==( 4X 5.0 Gbps Active/ LinkUp)==> 3 1 "esx2 mlx4_0" ( )
Ca 2 "H-0002c90300015230" # "esx2 mlx4_0"
1 "S-0008f104004128f4" # lid 3 lmc 0 "ISR9024D Voltaire" lid 2 4xDDR
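To put numbers on that output: "4X 5.0 Gbps" is four lanes at the DDR lane rate. Below is a rough Python sketch of the arithmetic, assuming the 8b/10b encoding used by SDR, DDR and QDR links. The function name is just for illustration.

```python
def link_rates(width, lane_gbps):
    """Return (signalling, data) rates in Gbps for an SDR/DDR/QDR link.

    Assumes 8b/10b encoding, which applies to the 2.5 (SDR), 5.0 (DDR)
    and 10.0 (QDR) Gbps lane speeds only.
    """
    signalling = width * lane_gbps
    return signalling, signalling * 8 / 10


print(link_rates(4, 5.0))  # the 4X DDR link in the output above
print(link_rates(1, 2.5))  # a link that trained at 1X SDR
```

So a port that comes up at 1X single data rate has roughly an eighth of the usable bandwidth of the 4X DDR link shown here.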
I tried some older ConnectX1 HP mezzanine cards. I upgraded the firmware to 1.2 (from memory)
and used the older version of the ESXi 5.0 driver:
Download Drivers v1.8.0 for ESXi 5.X http://www.mellanox.com/downloads/Drivers/MLNX-OFED-ESX-126.96.36.199.zip (Builds: 469512 & 623860)
That driver didn’t have SRP, only IPoIB, but I got IPoIB working on ESXi build 623860, so it may work on 799733?
I also noticed that we can’t create virtual IPoIB NICs either. FC#@#%^^&@#$@%!$#@$!&&*%^&%^
cheers n good luck!
I think I saw the 4k MTU in ESX when I was reading the docs for the MLNX OFED driver, so I think it should still work on the 2.7 firmware. I’ll try to do some MTU testing on my setup this weekend as well, but I haven’t done any Windows server testing.
Thanks for the info! I was wondering if there might be something strange going on since I tried out ESXi 4.1 with relevant drivers and still had the same issue of the card being stuck at initialization. Will see if I can find some older firmware to try out.
Do you mean “Older” ConnectX (e.g. ConnectX version 1)?
If so, it will not happen. Only ConnectX-2 and newer.
The ESXi 5.1 vib package from Mellanox works fine with ConnectX cards. I am using it now for my datastore, sharing via SRP over IB DDR from a Solaris server.
From testing, we have found that the HCA’s firmware needs to be 2.7. The 2.8 and 2.9 firmware on these cards seems to stop the card from being brought up on the IB network.
If you haven’t found it, the firmware is available on Mellanox’s site in the archives.
Mellanox - Support Download Assistant NVIDIA Networking Firmware Downloads
Yes, thanks, I was able to flash a 2.7 version of the firmware for my hardware and now ESXi is connecting to the IB network. However, it’s connecting at single data rate instead of quad for some reason. I am also trying to figure out how to get the adapter to allow jumbo packets on IPoIB; the MTU is getting capped at 1500 even with the setting listed in the user manual.
I’ll keep poking around. I don’t think it’s the cabling since it is on a blade server and its sister node is running fine under Windows.
Found the setting for jumbo frames on ESXi and that let me get to 2k MTU. Either it’s ignoring the opensm setting for 4k MTU or my opensm isn’t pulling the partition config properly when starting up. Guess I should look up if there is any documentation for the 2.7 firmware on 4k support.
Thanks for your input, moving forward we’ll probably be using more recommended hardware so hopefully that clears up some of the issues.
Thanks again. I’ll probably keep poking at it, I think the 4k issue probably has to do with opensm and how it is starting as a service under Windows. Might need to run opensm under a Linux system, but seems kind of a waste to have an IB system just to run the subnet manager.
No clue on the speeds though, from iblinkinfo on another box, it detects it can run at 1x or 4x, but chooses 1x for whatever reason.
Getting this to work fully with ESXi has become less of a priority now, since we’re going to get ConnectX-2 cards for the next order and we’re not committed at all to VMware products. It’s been working fine with Windows and Hyper-V, so at least we have that route.
I’ve run through all of the command and configuration recommendations in the user manuals; the best I can get is a 2044 MTU. It could possibly be opensm not listening to my partitions.conf file, since when I run saquery on MCMemberRecord, the relevant fields still show an mtu of 84 rather than 85 for 4k MTU.
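In case it helps anyone decode those saquery values: the MCMemberRecord MTU field packs a 2-bit selector in the top bits and the 6-bit IB MTU code in the bottom bits. A small Python sketch follows, assuming the 84 and 85 above are hex (0x84 and 0x85). The function name is made up for illustration.

```python
# IB MTU codes (code -> bytes) and MTU selector meanings from the IB spec.
IB_MTU_CODES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
SELECTORS = {0: "greater than", 1: "less than", 2: "exactly", 3: "largest available"}


def decode_mtu_field(value):
    """Split an MCMemberRecord MTU byte into (selector, MTU in bytes)."""
    selector = (value >> 6) & 0x3  # top two bits
    code = value & 0x3F            # low six bits
    return SELECTORS[selector], IB_MTU_CODES.get(code)


print(decode_mtu_field(0x84))  # what the group was created with
print(decode_mtu_field(0x85))  # what a 4k MTU group would show
```

Read that way, 0x84 means "exactly 2048", which fits with the 2044 ceiling on the IPoIB side.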
Thanks for your suggestions, appreciate the community support.
Yes, 4K MTU is supported in IPoIB for vSphere 5, as explained in the User Manual http://www.mellanox.com/related-docs/prod_software/Mellanox_IB_OFED_Driver_for_VMware_vSphere_User_Manual_Rev_1_8_1.pdf .
You basically need to configure the SM to enable 4K MTU for IPoIB. Check your SM user manual, since the configuration at the SM level may vary depending on the SM provider; if you’re using opensm running on a host, see page 20 in the driver User Manual http://www.mellanox.com/related-docs/prod_software/Mellanox_IB_OFED_Driver_for_VMware_vSphere_User_Manual_Rev_1_8_1.pdf . Then configure the vSwitch and the VM to use a 4052-byte MTU; check the vSphere documentation for more details. If you’d like to do that through the vSphere Client GUI, this guide http://blog.vmpros.nl/2011/07/14/vmware-edit-mtu-settings-by-gui-in-vsphere-5-0/ can take you through the required steps.
Note that while Ethernet jumbo frames are up to 9000 bytes, IPoIB jumbo frames are 4052 bytes.
Looks like running it from the command line and specifying the file did the trick; I hadn’t used the -P argument previously. Not sure why it wasn’t reading correctly when running as a service in Windows, guess there are some quirks. Operating at 4092 MTU now, thanks for your help!
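For reference, the 2044 and 4092 figures reported in this thread line up with the IB MTU minus the 4-byte IPoIB encapsulation header. A tiny Python sketch of that arithmetic for datagram mode; the names are only for illustration.

```python
IPOIB_HEADER_BYTES = 4  # IPoIB encapsulation header (RFC 4391)


def ipoib_datagram_mtu(ib_mtu):
    """IP MTU available over IPoIB datagram mode for a given IB MTU."""
    return ib_mtu - IPOIB_HEADER_BYTES


for ib_mtu in (2048, 4096):
    print(ib_mtu, "->", ipoib_datagram_mtu(ib_mtu))
```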