Which ESXi driver to use for SRP/iSER over IB (not Eth!)?

Thank you for your reply.

I have experience with InfiniBand iSER and SRP on OmniOS. Both protocols deliver almost the same performance and latency, but Mellanox doesn’t support them on vSphere 6 and above.

Today I downgraded to ESXi 5.5 U3 (latest patch) and the latest vCSA 6.0 U2.

I’ll wait for the vSphere 6.5 launch and then test ESXi 6.5 PVRDMA.

If the new ESXi 6.5 PVRDMA, co-developed by Mellanox and VMware, supports networking only, I’ll change my primary storage protocol.

Have a nice day!

Jaehoon Choi

OK, four months later, ESXi 6.5 is released, and still no reply from Mellanox. As expected, the previously working workarounds (uninstalling the inbox drivers and installing 1.8.x.x) for SRP/iSER over IPoIB no longer work. I even attempted using the 1.9.10.5 ETH drivers with my ConnectX-3s and employing iSER, but this didn’t work either. Which means I have to stick to ESXi 6.0, possibly until I decide to EoL my Mellanox hardware altogether.

Mellanox: what’s your response to all this? Will there ever be a solution for RDMA storage on VMware, or have you just decided it’s not worth the effort?

ZFS on Linux seems to be very stable for me.

Bear in mind that with LIO you will be able to use SRP only with the CentOS inbox drivers. If you install Mellanox OFED, you will have access only to iSER over LIO. The only way to get SRP with Mellanox OFED is to use SCST.
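As a quick sanity check on the target side, you can confirm which SRP stack you are actually running, since both LIO and SCST ship a module called ib_srpt (a hedged sketch; the module paths are typical, verify on your distro):

lsmod | grep srpt   # is an SRP target module loaded at all?

modinfo ib_srpt | grep filename   # a path under kernel/drivers/infiniband/ulp/srpt is the inbox LIO fabric; an SCST install path means SCST’s srpt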

I actually conducted some tests comparing VMware storage performance of SRP over LIO/inbox, iSER over LIO/inbox, iSER over LIO/OFED, SRP over SCST/OFED and iSER over SCST/OFED. In all tests SCST/OFED performed consistently better than LIO, with SRP over SCST/OFED being the best latency-wise and iSER over SCST/OFED the best bandwidth-wise.

Hi!

ESXi 6.5 includes RDMA support, but there are several bugs.

The ESXi hypervisor now embeds RDMA support co-developed by VMware and Mellanox, but many improvements are still required.

I think it will take another 2~3 years before they support iSER and vRDMA properly.

By then, ConnectX-3/4/5/6 may be deprecated and a new generation of HCAs will support it.

Mellanox has always said that their products support everything, but in practice only Linux is fully supported.

Jaehoon Choi

Have any of you tried the “esxcli rdma iser add” command? I tried it on one host; it didn’t generate any output, but this was logged in dmesg:

2017-01-13T08:05:35.347Z cpu17:67870 opID=3f4390b7)World: 12230: VC opID esxcli-f2-cc96 maps to vmkernel opID 3f4390b7

2017-01-13T08:05:35.347Z cpu17:67870 opID=3f4390b7)Device: 1320: Registered device: 0x43044f391040 logical#vmkernel#com.vmware.iser0 com.vmware.iser (parent=0x1f4943044f3912b0)

I’m using native drivers on a clean ESXi 6.5 install with ConnectX-3 10GbE NICs (MT27520). My hope is that there are plans to add native iSER support, but that might just be wishful thinking?
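For what it’s worth, this is how I’d probe the result (a hedged sketch; the adapter and device names that show up are system-specific, not guaranteed):

esxcli rdma iser add   # attempts to create a software iSER adapter

esxcli rdma device list   # lists vmrdma devices and their uplink bindings

esxcli storage core adapter list   # a new iSER vmhba should appear here if it worked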

(See: VMware vSphere 6.5 Documentation Library, vSphere Command-Line Interface Reference.)

And in my usage scenario there is another part to this problem: I also need Windows connectivity to the same storage as ESXi.

Problem is, with Server 2012 there is no SRP/iSER functionality at all, because Microsoft is focusing all its efforts on SMB Direct (SMB 3 over RDMA); the problem with that is it won’t talk to anything non-Microsoft, which I guess is part of their plan too. There is, however, legacy SRP functionality for Server 2008, which solves the issue in the short term and also confirms that implementing SRP was the right solution at the right time.

There is, however, no iSER functionality for Server 2008 at the moment (and there likely never will be, because it’s too old), and none for Server 2012 (because Microsoft hasn’t done it and isn’t interested in doing it). BUT (and it’s a big but) I believe a third party is working on an iSER client for Windows Server 2012, though I have no idea how it’s going.

Hi,

I’m sorry you see it this way. Mellanox is actually very much dedicated to ESXi and works closely with VMware on a day-to-day basis. The new native driver model has limitations compared to the vmklinux driver; it simply does not allow us to support infrastructure that we could before, such as IB and iSER. VMware is aware of the drawbacks of the move away from vmklinux, and we are working within the limitations at hand. As stated above, we are trying to bypass the current limitations by offering SR-IOV solutions, while discussing future support for IB and iSER with VMware. I hope to bring news in the near future.

Erez.

Hi!

I also have many questions about iSER support in the ESXi environment.

Current RDMA support on ESXi is only half-implemented on the guest OS side - IT SUPPORTS LINUX GUESTS ONLY - and some vRDMA functions conflict with physical RDMA hardware functions; there are driver namespace clashes, and some of the bugs are critical.

I also think Mellanox and VMware are focusing on RDMA networking for VM guests (vRDMA), not on an RDMA storage protocol, for now…

There is a difference between ESXi and KVM, and this difference causes problems when porting drivers and storage protocols to ESXi.

The ESXi hypervisor embeds drivers for VM guests, but right now native RDMA and vRDMA exist for VM guests only.

I have also been waiting a very long time for Mellanox to support a STABLE RDMA storage protocol and SR-IOV in the ESXi environment.

But not now.

If you use vSphere OFED 1.8.2.4, 1.8.2.5 or 1.8.3.0 on ESXi 6.0 with a Solaris COMSTAR SRP or iSER target, you will hit an ESXi PSOD.

ESXi 6.0 supports Linux targets only.

Jaehoon Choi


Correct!

I think Mellanox is waiting for the EOL of ConnectX-3…

  • Not CX-3 Pro

ConnectX-3 doesn’t have VXLAN offload capability.

In the near future RoCE v2 will be dominant for Ethernet iSER, so Mellanox will drop support for ConnectX-3.

Another issue is the lack of Solaris COMSTAR support for ZFS.

Historically, Mellanox vSphere OFED for ESXi 5.x supported Solaris COMSTAR properly over InfiniBand.

But now they don’t support Solaris COMSTAR on ESXi 6.0 or above.

I don’t know what the problem is caused by…

I think we have every right to ask Mellanox for clarification at this stage. ophirmaor?

Hi Erez,

If I understand your replies correctly, you’re basically saying this: we (Mellanox) used to provide ESXi drivers using a certain driver model (vmklinux) that VMware no longer supports, hence we’re not going to support any of our hardware functionality other than plain Ethernet.

The problem here is that VMware has been promoting its new driver architecture for quite some time and allowed the new and old driver models to coexist for a while, precisely so that vendors like Mellanox could develop new drivers against the new architecture. However, it looks like Mellanox chose not to bother implementing the full driver capabilities in the new model, which rather shows a lack of commitment to the ESXi platform. For some reason, this hasn’t prompted a revision of Mellanox marketing and promotional materials, which still clearly state full ESXi support while failing to mention that some very significant features of its product lines (e.g. IB support altogether) are no longer supported on ESXi. I call this misleading marketing/borderline false advertising!

Please don’t try to hide behind SR-IOV! We bought Mellanox products to provide fast storage for ESXi datastores, and SR-IOV doesn’t help us with that at all!

Any further clarifications?

Hi!

I think all these problems originate from VMware’s native driver model.

The new native AHCI driver is buggy, so most users disable the native driver, reboot the ESXi host and fall back to the vmklinux driver.
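The usual workaround for that looks like the following (a hedged sketch; vmw_ahci is the native AHCI module name on the ESXi 6.x builds I’ve seen, so confirm it on yours before disabling, and reboot afterwards so the legacy vmklinux driver claims the controller):

esxcli system module list | grep ahci   # confirm the module name first

esxcli system module set --enabled=false --module=vmw_ahci   # fall back to the legacy driver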

So my plan: disable all native drivers for the Mellanox HCA, uninstall the inbox drivers, then install Mellanox OFED 1.8.2.5 on my ESXi 6.0 host.

I’m testing the Mellanox SRP driver 1.8.2.5 on ESXi 6.0 as below.

A. Configurations

All HCAs are MHQH29B-XTR with firmware 2.9.1200

  • If the test succeeds, I’ll switch all HCAs to ConnectX-3 MCX354A-FCBT

2x SX6036G FDR14 gateway switches

  • Configure a different SM prefix on each switch for SRP path optimization

OmniOS (latest update) as the ZFS COMSTAR SRP target

B. Installation

  • Disable the native vRDMA drivers - they are very buggy:

esxcli system module set --enabled=false -m=nrdma

esxcli system module set --enabled=false -m=nrdma_vmkapi_shim

esxcli system module set --enabled=false -m=nmlx4_rdma

esxcli system module set --enabled=false -m=vmkapi_v2_3_0_0_rdma_shim

esxcli system module set --enabled=false -m=vrdma

  • Uninstall the inbox drivers - also useless, as they can’t support Ethernet iSER properly:

esxcli software vib remove -n net-mlx4-en

esxcli software vib remove -n net-mlx4-core

esxcli software vib remove -n nmlx4-rdma

esxcli software vib remove -n nmlx4-en

esxcli software vib remove -n nmlx4-core

esxcli software vib remove -n nmlx5-core

  • Install Mellanox OFED 1.8.2.5 for ESXi 6.x:

esxcli software vib install -d /var/log/vmware/MLNX-OFED-ESX-1.8.2.5-10EM-600.0.0.2494585.zip

  • Enable the SRP RDM filter

  • Register a VMware NMP path optimization rule

- This is optional. I now use ESOS, a Linux-based SRP target that supports all of the VAAI features…:)

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "SUN" -M "COMSTAR" -P "VMW_PSP_RR" -O "iops=1" -e "Sun ZFS COMSTAR" -c "tpgs_on"

esxcli storage core claimrule load

Then reboot the ESXi 6.0 host.
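After the reboot, a few sanity checks are worth running (a hedged sketch; VIB names, module names and vmhba numbering vary by system, so treat these as illustrative):

esxcli software vib list | grep -i mlx   # are the OFED VIBs installed?

esxcli system module list | grep -i rdma   # are the native RDMA modules disabled?

esxcli storage core adapter list   # does an SRP vmhba show up?

esxcli storage nmp satp rule list | grep COMSTAR   # is the SATP rule registered?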

All ESXi 6.0 hosts have been working properly for 3 days.

I’ll report back on this thread in a few weeks…:)

P.S

If all the ESXi 6.0 hosts work properly for 1 week, I’ll switch all hosts to ESXi 6.5 for another test…:)

P.S. 2 (updated 20 June 2017)

All of the ESXi 6.0 and 6.5 tests work perfectly…:)

But ESXi 6.5 P01 has a problem with in-guest SCSI UNMAP that causes VMDK performance problems; the SRP target is not at fault.
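If you hit this, one possible mitigation (a hedged sketch, not an official fix; the /VMFS3/EnableBlockDelete advanced option controls whether in-guest UNMAPs are processed on VMFS, and its semantics changed between releases, so verify on your build) is to disable it at the host level:

esxcli system settings advanced list -o /VMFS3/EnableBlockDelete   # inspect the current value

esxcli system settings advanced set -o /VMFS3/EnableBlockDelete -i 0   # disable in-guest UNMAP processing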

I’m running the latest ESXi 6.0 build with an ESOS SCST SRP target; performance is perfect and it’s rock solid!

The whole problem was VMware’s inbox native vRDMA driver, which conflicts with SRP driver 1.8.2.5 and causes an ESXi 6.x PSOD.

Stay strong!

I think VMware will launch ESXi 6.5 Update 1 in July 2017.

I’m waiting for a native IPoIB InfiniBand iSER driver…:)

But I’m afraid there’s a possibility that VMware and Mellanox will discontinue support for the ConnectX-3 HCA…:(

The ESXi 6.5 inbox driver currently supports Ethernet iSER on ConnectX-4 only.

If iSER support for ConnectX-3, whether IPoIB or Ethernet, is discontinued, I’ll change the CX-3 operation mode from FDR InfiniBand to 40Gb Ethernet and switch my lab storage from ESOS to ScaleIO.next…:)
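For anyone wanting to do the same, the port type change itself can be done with mlxconfig from the Mellanox Firmware Tools on a Linux box (a hedged sketch; the /dev/mst device name below is an example, check yours with mst status):

mst start   # load the MFT kernel modules

mst status   # list devices, e.g. /dev/mst/mt4099_pci_cr0 for a ConnectX-3

mlxconfig -d /dev/mst/mt4099_pci_cr0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2   # 1=IB, 2=ETH, 3=VPI auto-sense

A reboot is needed for the new port type to take effect.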

I would like to clarify the topic of ESXi-over-InfiniBand (IB) support, and sorry for being late to the thread.

IB para-virtualization (PV) and SR-IOV are supported only in ESXi versions that support the vmklinux driver, which means ESXi 6.0 or older. In those versions, the standard IPoIB protocol has been implemented. In ESXi 6.5 (which supports only native drivers), Mellanox plans to add IB-over-SR-IOV support in June ’17.

Regarding ESXi’s storage protocols:

  • SRP runs only over IB, and the latest driver that included new features was 1.8.2.3 on ESXi 5.5, supporting ConnectX-2, ConnectX-3 and ConnectX-3 Pro. Since then, the SRP driver has been in maintenance mode (meaning Mellanox will only fix issues).

  • iSER support comes only over RoCE. ESXi 6.5 includes inbox Mellanox TCP/IP and RoCE drivers for all speeds (10, 25, 40, 50 and 100 Gb/s), running on ConnectX-4 Lx and ConnectX-4. ConnectX-5 support will be added later this year.

It’s now CLEAR that Mellanox is doing nothing about SRP.

WE JUST DUMPED our 56Gb/s VPI switches and replaced them with 2x Arista 7050QX and 2x Dell S4810 switches.

We dumped SCST/SRP in favor of vSAN on ESXi 6.5, with all the new features we couldn’t use before because of the missing driver support in ESXi 6.5.

Chucked in 860 Pro NVMe 500GB cache drives, 8x 1TB Samsung enterprise 3D SSDs per node, and Intel 40Gb NICs across 3 vSAN nodes with 8TB of SSD each, and it’s better than SRP over a 56Gb/s SCST target… YES BETTER: faster, easier to manage, and sooo much easier to deploy. SRP InfiniBand IS DEAD!

SRP IS DEAD. THROW OUT YOUR VPI SWITCHES AND GO 40Gb+ ETHERNET…

Absolutely!

Mellanox OFED 1.8.2.x only supports ConnectX-3 or below.

Unfortunately, Mellanox has never released an SRP driver for ConnectX-4 VPI HCAs or above.

I pray this recipe will be useful to community users and admins until Mellanox releases a new IPoIB or Ethernet RDMA storage driver for ESXi.

It’s not an official solution from Mellanox.

Best Regards

Jaehoon Choi

I think so, too.

I can’t understand Mellanox’s ESXi driver support over the last 6 years.

Every time a new driver is released, some feature disappears.

I’ll raise this situation with the VMware community, too!

Has anyone managed to get SRP to work on ESXi 7?