Which ESXi driver to use for SRP/iSER over IB (not Eth!)?

Hi all,

I’m a bit puzzled by the current situation with Mellanox drivers for VMware ESXi. It looks like SRP support is done and dusted and won’t be developed anymore. A pity, but fine, as long as iSER is there, right?

However, it looks like the 2.x packages lack iSER support as well (regardless of whether you look under the VPI or the ETH drivers section). IPoIB is there, though.

The 1.9.x packages do include iSER support, but, on the other hand, they lack IPoIB, which means you can use iSER only in ETH mode.

So, what’s the direction? Which drivers is one supposed to use to get iSER (read: fast RDMA storage) working over IB? Are we all supposed to upgrade to the newest ConnectX-4 and switches just because they have the same speed for both IB and ETH? What about those people who have FDR IB but only 40GbE, or who have only IB-capable switches?

What I do personally is use 1.8.3 beta for iSER and 1.8.2 for SRP, even with the latest builds of ESXi 6.0. Luckily, they still work, although unsupported; but the moment they break, what am I supposed to do?

Thanks!

So just summarizing the options:

| Driver Version | Storage Protocol | Adapter Mode | Adapter Family | VMware Versions Supported |
| --- | --- | --- | --- | --- |
| 1.8.2.5 | IPoIB + SRP | VPI only | CX-2 | ESXi 5.5, ESXi 6.0 |
| 1.8.3 beta | IPoIB + iSER | VPI only | CX-2, CX-3 Pro | ESXi 5.5, ESXi 6.0 |
| 1.9 | iSER | EN only | CX-3 Pro | ESXi 5.5, ESXi 6.0 |
| 2 | IPoIB iSCSI | VPI only | CX-2, CX-3 Pro | ESXi 5.5, ESXi 6.0 |
| 3 | native iSCSI | EN only | CX-3 Pro | ESXi 5.5, ESXi 6.0, ESXi 6.5 |
| 4 | native iSCSI | EN only | CX-4 | ESXi 5.5, ESXi 6.0, ESXi 6.5 |

(see later post with corrected chart)

IPoIB is TCP/IP in software only (no RDMA) on VPI only

No support for any CX-2 solutions

So, since I don’t have a full complement of CX-3 Pro everywhere and only an unmanaged IB switch, I would be best to stick with 1.8.2.5 on SRP under ESXi 6.0.

Also, since I need Windows storage support, I would be best to stick with SRP on 2008 R2, since there is no iSER support on Windows Server (and no SRP support after 2008 R2).

I will stick with ESXi 6.0 for now (I probably wouldn’t be moving to 6.5 yet anyway), but when I do, it looks like I will need to replace the 40G IB switch with an EN switch and fill out my storage network with CX-3 Pro to get iSER support. I would also need to hopefully find an iSER driver for Server 2012 R2 (and/or 2016).

Is there any way with ESXi to use the CX-2 as a 10G Ethernet adapter (with an appropriate QSFP-to-SFP adapter), or is there no support at all on ESXi 6.0 (or 6.5)?

Okay!

I want to hear it.

It looks like the ConnectX-3 family has been dropped from RoCE iSER support on ESXi 6.5 and above… :(

A friend of mine also tested an MCX354A-FCBT on ESXi 6.5 via the esxcli rdma iser add command.

But it failed to create an iSER initiator.
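For reference, the attempt boiled down to roughly the following. This is only a minimal sketch, assuming you run it from the ESXi 6.5 shell where both Python and esxcli are available; it is not the exact session my friend ran.

```python
# Minimal sketch: ask ESXi 6.5 to create an iSER initiator and check whether
# a new iSER vmhba appears. Run from the ESXi shell (Python + esxcli present).
import subprocess

def esxcli(*args):
    # Run an esxcli sub-command and return its text output.
    return subprocess.check_output(("esxcli",) + args,
                                   stderr=subprocess.STDOUT,
                                   universal_newlines=True)

print(esxcli("rdma", "device", "list"))              # RDMA-capable devices, if any

try:
    esxcli("rdma", "iser", "add")                    # request an iSER adapter instance
except subprocess.CalledProcessError as err:
    print("iser add failed:\n" + err.output)         # where the ConnectX-3 reportedly fails

print(esxcli("storage", "core", "adapter", "list"))  # look for a new iSER vmhba
```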

This is definitely encouraging, but clearly doesn’t look like a finished piece of work… Any chance of getting an answer from a Mellanox admin (arlonm)?

I just wanted to bump my last question to Erez again. You mentioned RoCE support, but it seems there are no RDMA storage drivers supported. It’s fine if that is the case; I am just looking for clarity on the current situation. If we do not have iSER support, was there some other RDMA-over-Ethernet storage driver you were referring to when you said RoCE is supported?

Thanks for the reply.

So just clarifying a few things:

  • So “IB Para-virtualization (PV)” is actually “Paravirtualization of IP over InfiniBand”, otherwise known as IPoIB? This is the layer which allows us to run IP-based applications on an IB fabric.
  • The SR-IOV feature allows a physical NIC to be shared in a virtual environment.

So, if you look at my previous table/chart, there was support up to ESXi 6.0 for most protocols and features, because those drivers used the VMKLinux device driver model.

For those who don’t know, VMKLinux was a carry-over from the old ESX days (which required Linux) to allow Linux device drivers to still be used, with a few modifications, on ESXi, even though Linux doesn’t technically exist in ESXi. In ESXi 5.5 VMware released a new “Native Device Driver Model”, where drivers interface directly with the VMkernel rather than going through the VMKLinux compatibility shim. (VMware released the new device driver model so drivers can be more efficient, more flexible and perform better. In addition there are more debugging and troubleshooting features, along with support for new capabilities such as hot-plugging; see more info here: https://blogs.vmware.com/tap/2014/02/vmware-native-driver-architecture-enables-partners-deliver-simplicity-robustness-performance.html )
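As a side note, a quick way to tell which driver model has claimed your Mellanox ports is the Driver column of esxcli network nic list: a name like mlx4_en indicates the VMKLinux driver, nmlx4_en the native one. Here is a minimal sketch, assuming it is run from the ESXi shell where Python is available:

```python
# Minimal sketch: show each uplink and the driver that claimed it, then pick
# out the Mellanox ones (driver names containing "mlx").
import subprocess

nic_list = subprocess.check_output(["esxcli", "network", "nic", "list"],
                                   universal_newlines=True)
print(nic_list)

for line in nic_list.splitlines():
    if "mlx" in line:
        fields = line.split()
        # fields[0] = vmnic name, fields[2] = driver (e.g. mlx4_en vs. nmlx4_en)
        print("uplink %s is claimed by driver %s" % (fields[0], fields[2]))
```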

So you mentioned that SRP protocol support has been retired and won’t be carried forward with the new driver model, so I am assuming we won’t be able to use it on ESXi v6.5 and later.

It was also stated that ESXi v6.5 includes inbox drivers for CX-4 (with CX-5 drivers coming later this year) and that these drivers support RoCE and iSER only when the adapters run in Ethernet mode on an Ethernet switch.

Question still remaining:

  • Because of the new native drivers included inbox with ESXi v6.5, the older VMKLinux drivers will no longer work. Can we disable or remove the native drivers and continue to use the older VMKLinux drivers, similar to what is described in this VMware Knowledge Base article? (See the sketch after this list.)
  • Will there be any support for CX-3 and CX-3 Pro adapters under the new native device driver model for ESXi v6.5 and later (and therefore support for iSER with CX-3 and CX-3 Pro, like you provide for CX-4 now and for CX-5 in the future)?
  • Will there be any support for IPoIB under the new native device driver model, or is this not being carried forward either?
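Regarding the first bullet, this is roughly the approach I understand that KB to describe: disable the inbox native modules so the older VMKLinux driver can claim the hardware after a reboot. A minimal, entirely unsupported sketch; the nmlx module names below are the usual ones on ESXi 6.5, but treat them as assumptions and confirm them first with esxcli system module list.

```python
# Minimal sketch (unsupported): disable the inbox native Mellanox modules so
# a VMKLinux driver can claim the hardware after the next reboot.
# Module names are assumptions - verify with `esxcli system module list`.
import subprocess

def esxcli(*args):
    return subprocess.check_output(("esxcli",) + args, universal_newlines=True)

for module in ("nmlx4_core", "nmlx4_en", "nmlx4_rdma", "nmlx5_core"):
    try:
        esxcli("system", "module", "set", "--enabled=false", "--module=" + module)
        print("disabled:", module)
    except subprocess.CalledProcessError:
        print("not present, skipped:", module)

# Reboot the host, then re-check which driver claimed the NICs:
#   esxcli network nic list
```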

Because the way it looks at the moment, if we want to run RDMA-accelerated storage on ESXi v6.5 we will need to purchase CX-4 or later adapters (currently only CX-4) and run them on an Ethernet switch; otherwise we will be stuck at ESXi 6.0 due to the lack of CX-3 drivers under the new device driver model and the inability to use the old ones.

Also, are the VPI cards supported? I would like to use them in ETH mode, obviously.

At the moment I use the SRP target provided by SCST over the latest Mellanox OFED distribution (3.4.1) on CentOS 7.2. The storage I expose is ZFS on Linux (some SSD, some magnetic).
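For anyone curious what the target side looks like, here is a minimal sketch of an scst.conf exporting a single zvol over SRP. All names are placeholders, not my actual config; with recent ib_srpt the targets are normally named after the HCA port GID, which you can list under /sys/kernel/scst_tgt/targets/ib_srpt/ and substitute below.

```python
# Minimal sketch (placeholder names): write an scst.conf that exports one ZFS
# zvol over SRP via ib_srpt, then apply it with SCST's own tools.
SCST_CONF = """\
HANDLER vdisk_blockio {
    DEVICE vm_store {
        filename /dev/zvol/tank/vmstore   # placeholder ZFS zvol
    }
}

TARGET_DRIVER ib_srpt {
    TARGET ib_srpt_target_0 {             # placeholder name (often the port GID)
        enabled 1
        LUN 0 vm_store
    }
}
"""

with open("/etc/scst.conf", "w") as conf:
    conf.write(SCST_CONF)

# Then load the modules and apply the config with the tools SCST ships:
#   modprobe -a scst scst_vdisk ib_srpt
#   scstadmin -config /etc/scst.conf
```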

I found the session interesting and informative, but it still doesn’t answer the question “What about IB?”; it does look like iSER is still being developed and will return, though.

My takeaway was that you still require RoCE-compatible adapters to get this to work. The minimum requirements discussed were CX-3 Pro adapters and, I guess, a 40G Ethernet switch.

Realistically this probably means a Mellanox managed IB switch (because there aren’t a huge number of 40G switch choices), which, when all added up, puts it way out of my budget.

(although apparently CX-2 adapters can do RoCE at 10G)

If iSER were possible with this, it might be a little closer to my budget, as I already have these cards and a 10G switch is possible.

Does the driver included in ESXi 6.5 have RDMA support from the hypervisor back to storage, or is it just for guests? I really am only using SRP at a host-to-storage level; nothing inside the VMs is InfiniBand-aware.

Mellanox, if you aren’t going to release SRP drivers again, would you consider releasing the source for the old ones? We will take it from here…

Some corrections to your list:

  • 1.8.2.4 is for 5.x, 1.8.2.5 is for 6.0

  • 1.8.3 beta supports both SRP and iSER and can be forcibly installed on 5.x and 6.0

  • All 1.8.x.x support X2, X3 and X3 Pro. Also, they’re the last to support X2. Not sure about the EN mode, haven’t tried it.

  • 1.9.x, 2.x and 3.x support only X3 and X3 Pro, none support X2 or older.

  • 1.9.x and 3.x support only EN. 1.9.x is the only one supporting iSER.

  • 2.x is the latest supporting the IB mode (only for X3 and X3 Pro). It may also support the EN mode, but I haven’t tested it.

  • Connect-IB, X4 in the IB mode and X5 aren’t supported at all - I think this one is particularly insulting, because it means even relatively new cards are left without ESXi support.

I think you’re absolutely right with your conclusion to stick with 1.8.2.5 on SRP under ESXi 6.0; this is exactly what I intend to do, even though I have X3 (not Pro) across the board and a managed IB/EN switch. Theoretically, I could use the 1.9.x in the EN mode (still on ESXi 6.0) over iSER, but performance wouldn’t be on the same level as SRP and it wouldn’t allow me to move to ESXi 6.5 anyway. I don’t need any Windows support; my only storage clients are ESXi hosts.

As for using X2 as 10Gb NICs, I think this is how they’re recognised by the inbox ESXi 6.0 drivers (although not 100% sure). You can give it a shot.

Hi Erez,

I understand you’re new to Mellanox (joined in Nov 2016), which is after I wrote my original post. I’d like to invite you to re-read it again and then try to honestly answer a very simple question: has Mellanox actually been dedicated to providing adequate solutions for the VMware platform, as you stated in your latest post? You don’t have to share the answer with the rest of us here, just, please, be honest with yourself.

The way I see things, with 6.5 the situation is changing from bad to worse (מבכי אל דכי, roughly “from weeping to woe”, as they say in broken biblical Hebrew). At least with ESXi 6.0 and below we could use IB switches; with ESXi 6.5 we can’t. This means tons of hardware (some quite recent and with decent specs) becomes throwaway money.

If you do have any interesting developments in the pipeline, we’d like to hear about those.

Thanks!

I hope this slide is real and they are actively working on it. While I would rather see SRP support, I would definitely settle for an iSER storage target solution at this point. My guess is they are getting too caught up with the guest aspects of this; I would assume that is the more complicated problem versus host-level storage connectivity. At the moment I just use SRP for storage connectivity; the guests are unaware. Curious if anyone here is doing guest-level InfiniBand, or if it is just for the host.

Looks like the difference between CX-2/CX-3 and CX-3 Pro/CX-4 is RoCE v1 vs. RoCE v2…

I found the following page and its related pages informative as to why RoCE v1 vs. RoCE v2: “RoCE v2 Considerations” https://community.mellanox.com/s/article/roce-v2-considerations

Although the “RDMA over Converged Ethernet” Wikipedia article explains it most simply:

RoCE v1 is an Ethernet link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is an internet layer protocol which means that RoCE v2 packets can be routed

Further to this, even more interesting is the “RoCE versus InfiniBand” topic on the same page…

So the takeaway from both of these articles, when put together, seems to be that while InfiniBand using RDMA (and, as far as ESXi storage is concerned, SRP) may be better than RoCE vX (and, as far as ESXi storage is concerned, iSER), the fact that RoCE runs on Ethernet gives those trying to leverage an RDMA-based storage protocol a cheaper entry point, because they already have Ethernet technology. (The problem is that this is not always true, as most iSER is done on 10G and 40G Ethernet, which the user may or may not have an existing investment in.)

Also, the fact that RoCE v1 suffers from the latency issues associated with Ethernet (at L1/L2) in larger networks means that RoCE v2 (at L3) is the preferred platform, as it mitigates the latency with a congestion protocol and gets closer to InfiniBand-style performance. (Hence SRP still seems to outperform RoCE v2 at the same link speed, but all the RoCE development is on RoCE v2 going forward.)

So I think that until I can afford CX-3 Pro adapters and the associated switching, the best option is to stay with SRP (which will work with ESXi v5.5/6.0 and my existing InfiniBand investment), but the eventual move to ESXi 6.5 will necessitate far more planning and going to CX-3 Pro and RoCE v2 + iSER.

I guess the question still boils down to why there is no SRP on ESXi 6.5, and looking at the whole picture I think it’s simply a case of economics and demand (and when you have to foot the development costs, what you can make revenue from is an important consideration). Also, wider development (read: development for current platforms like ESXi 6.5) is happening on iSER because it’s based on Ethernet and not InfiniBand at the physical level, and maybe more tech companies see a wider benefit for themselves because they have Ethernet but not InfiniBand technology. These developers may also have more iSCSI experience than IB experience, which means they can leverage existing knowledge to move forward. Don’t get me wrong, I still think InfiniBand is great, but despite SRP getting to where it has, long term InfiniBand will remain an HPC tool and RDMA-based storage will go iSER/RDMA Direct, depending on whose camp you settle in.

Hi Jaehoon,

Very useful post! Can you please let us know if you got it working with ESXi 6.5 and, if so, whether it was exactly the same procedure?

Thanks!

Understood.

Thank you, Erez, for your precise answer.

Conclusion:

1st) vNICs must use conventional Ethernet, or RoCE with a supported Ethernet switch.

2nd) SRP, iSER and IPoIB-based protocols are supported only via SR-IOV in the guest OS.

Therefore a VMware admin must use a minimum of two separate HCAs per ESXi host: one for Ethernet and one for SR-IOV-based IPoIB protocols.

Is that right?

Jaehoon Choi

7 days later and no reply from Mellanox… Seriously?

Looking at the current driver selection across different OSes, cards and media (IB vs. ETH), it looks like the only space consistently supported by Mellanox is Linux. Indeed, for Linux you have everything:

  • every card is supported (all the way from Connect-X2 to Connect-X5)
  • IB and ETH and the possibility to switch from one to another for the cards that support it (VPI), and you can even use IB on one port and ETH on another
  • iSER initiator is supported across the board and the iSER target is supported with both LIO and SCST (see the sketch after this list)
  • SRP initiator is supported across the board and SRP target is supported with SCST
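On the LIO point above, here is a rough sketch of standing up an iSER-enabled target with targetcli on Linux. The IQN, backing device and portal IP are placeholders, and the enable_iser step may differ slightly between targetcli versions, so treat this as an illustration rather than a recipe.

```python
# Rough sketch: create an iSER-enabled LIO target by driving targetcli's
# non-interactive mode. IQN, backing device and portal IP are placeholders.
import subprocess

IQN = "iqn.2017-01.example.zfs:vmstore"   # placeholder IQN
DEV = "/dev/zvol/tank/vmstore"            # placeholder backing block device
IP  = "192.168.10.1"                      # IP bound to the RDMA-capable port

def tcli(command):
    # Hand one batch command to targetcli.
    subprocess.check_call(["targetcli", command])

tcli("/backstores/block create name=vmstore dev=" + DEV)
tcli("/iscsi create " + IQN)
tcli("/iscsi/%s/tpg1/luns create /backstores/block/vmstore" % IQN)
# Note: some targetcli versions auto-create a 0.0.0.0:3260 portal on target
# creation; delete it from the portals node first if the create below collides.
tcli("/iscsi/%s/tpg1/portals create %s 3260" % (IQN, IP))
tcli("/iscsi/%s/tpg1/portals/%s:3260 enable_iser true" % (IQN, IP))
tcli("saveconfig")
```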

So, if you use KVM as your hypervisor, there is no problem.

However, if you want to use Mellanox IB technology in conjunction with the currently most popular hypervisor (VMware ESXi), you’re in trouble:

  • there is no official support for ESXi 5.5 and up for any card older than Connect-X3
  • the only VPI cards supported in IB mode are Connect-X3/Pro
  • Connect-IB cards are not supported at all
  • Connect-X4 cards are supported only in ETH mode
  • dual-port VPI cards support only the same protocol (IB or ETH) on both ports, not a mix
  • SRP initiator is no longer available
  • the iSER initiator is available only with the 1.9.x.x drivers, only over ETH, and only for Connect-X3/Pro cards
  • the current IB driver 2.3.x.x is compatible only with ESXi 5.5 (not 6.0!), works only with Connect-X3/Pro cards and includes neither SRP nor iSER initiator

My question is very simple: what’s the long-term strategy of Mellanox with regard to hypervisor support? Are they suggesting that everyone considering Mellanox products should switch to KVM as their hypervisor of choice? Or that they should abandon RDMA and use Mellanox adapters/switches only as 56/100GbE network infrastructure?

I would REALLY appreciate some reaction from Mellanox staff, who no doubt have already seen this thread, but, for some reason, chose not to react to it…

What’s your RDMA Storage target and OS?

I prefer the Solaris-family ZFS COMSTAR, but it won’t work properly with ESXi 6.0 and above.

Best Regards.

The Mellanox SRP initiator in vSphere OFED 1.8.2.4 shows me good performance.

But it has an issue with the ZFS SRP target that causes problems with the VM auto-start function on the ESXi host.

SRP is a native RDMA SCSI protocol: a lightweight storage protocol, and a very old one.

The Mellanox iSER initiator in vSphere OFED 1.8.3 also shows me good performance.

The IPoIB iSER initiator works properly with my ZFS iSER target, and the VM auto-start function on the ESXi host works perfectly.

But both of the above protocols on ESXi 6.x will take you on a journey to PSOD world.

IPoIB iSCSI has historically been a bad performer.

It showed me only 450~650 MB/s throughput with a 2-port ConnectX-2 HCA against a ZFS IPoIB iSCSI target.

And the very high processor usage will send you on a tour of the Andromeda Galaxy above your roof!

I decided to move to IPoIB iSCSI with vSphere OFED 2.4.0 and try the SR-IOV function in an ESXi 6.x environment.

That’s because vSphere OFED 2.4.0 supports SR-IOV as well as both IB and ETH modes (VPI).

vSphere OFED 2.4.0 showed me a big improvement in IPoIB iSCSI performance with two MHQH19B-XTR QDR HCAs.

Performance was over 2 GB/s at peak…

But still not like an RDMA protocol…

Screenshot 1: Mellanox vSphere OFED 2.4.0 IOMeter test of IPoIB iSCSI with two MHQH19B-XTR HCAs and an OmniOS ZFS iSER target

Also SR-IOV support was impressive.

Of course, the ConnectX-2 is an EOL product, so I built custom firmware (based on binary version 2.9.1314 from the IBM site) with a configuration that supports SR-IOV.

Here are the results.

  1. SR-IOV VF list on the MHQH19B-XTR ConnectX-2 HCA

  2. vSphere 6.0 Update 2 Build 4192238 PCI device list

  3. Windows guest RDMA communication test

Mellanox shows a good product concept and many functions in their brochures.

But it’s almost impossible to get good manuals and best-practice guides like you can from Dell and others.

So many failures went into producing the above test results.

Unstable driver!

Unstable firmware!

Undocumented driver options!

For what?

RDMA is a good concept and RDMA NICs show excellent performance!

Mellanox says their products support almost every major OS environment.

But there have always been many bugs and limitations in almost every OS environment.

The latest products, ConnectX-4 and ConnectX-5, are also useless in vSphere and Hyper-V environments.

vSphere 5.x is almost at end of life, and historically Mellanox has shown only beta-level drivers and the laziest support for the vSphere environment.

The vSphere OFED 1.8.3 beta IPoIB iSER initiator has a critical bug that was shown in public some years ago.

I think the vSphere OFED 1.8.3 beta IPoIB iSER initiator is of poor quality: a tricky, experimental modification of the SRP initiator!

Everybody can see how critical that issue was by looking at the resolved issues in the vSphere Ethernet iSER driver 1.9.10.5 release notes.

I’m waiting for VMworld in Aug. 2016, and expect a vRDMA NIC driver in the new VMware Tools and stable IPoIB iSER initiator support.

I’ll make a decision in the near future whether to move or not…