Unstable ConnectX-3 Ethernet Performance on ESXi 6.5 update 1

Step 01. Build an initial network connection over a TCP/IP (lossy) network

Hi!

I started building the basic Ethernet network mentioned previously.

But performance does not reach the 56Gb level.

I tested with iPerf using the 8-parallel-streams option between separate ESXi 6.5 update 1 hosts (an example invocation follows the configuration list below).

The results are below.

A. ESXi 6.5 inbox Ethernet driver 3.16.0

* 56Gb Ethernet iPerf client - physical ESXi 6.5 update 1 host 01 with MTU 4092

* 56Gb Ethernet iPerf server - physical ESXi 6.5 update 1 host 01 with MTU 4092

B. ESXi Ethernet driver 1.9.10.6

* 56Gb Ethernet iPerf client - physical ESXi 6.5 update 1 host 01 with MTU 4092

* 56Gb Ethernet iPerf server - physical ESXi 6.5 update 1 host 02 with MTU 4092

C. ESXi IPoIB driver 1.8.2.5

* 56Gb IPoIB iPerf client - physical ESXi 6.5 update 1 host 01 with MTU 4092

* 56Gb IPoIB iPerf server - physical ESXi 6.5 update 1 host 01 with MTU 4092
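For reference, the iPerf invocation on each pair was along these lines (a sketch only; the server IP is a placeholder, and the iperf binary has to be available on the hosts):

# on the iPerf server host
iperf -s

# on the iPerf client host: -P 8 runs 8 parallel streams, -t 30 runs the test for 30 seconds
iperf -c 192.168.10.2 -P 8 -t 30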

You said the Mellanox ConnectX-3 supports 56Gb Ethernet link-up and performance, but it does not even reach the 40 or 50Gb performance level.

The same test was completed on an Intel X520-DA2 connected to a 10Gb port of the SX6036G, and it shows stable 10GbE performance.

But your ConnectX-3 shows an unstable performance pattern in 10, 40, and 56GbE port modes.

How can I resolve this issue?

BR,

Jae-Hoon Choi

Hi!

Performance is even lower than before after setting the num_rings_per_rss_queue=4 option.

I can't find a netq option in the inbox driver (see the parameter listing below), and I can't access your linked netqueue.jpg.
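This is how I looked for it (a sketch; parameter names depend on the driver version):

# list the parameters the inbox nmlx4_en module exposes and filter for anything NetQueue-related
esxcli system module parameters list -m nmlx4_en | grep -i netq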

Is there another solution?

Ethernet link-up speed is an industry standard across vendors.

This is a serious problem.

If your HCA achieves its maximum speed only with RDMA, just say that your product focuses on RDMA communication only.

Every one of your product briefs says it supports 40, 50, 56, 100, and 200Gb Ethernet.

Why should I open a support ticket for your problem?

Why does your staff always click the "Correct Answer" button?

I think the person who asked the question should be the one to click the "Correct Answer" button.

BR,

Jae-Hoon Choi

I would suggest that you start with the following:

  1. “Performance Tuning” for Mellanox adapters and drivers, to ensure you have a proper BIOS configuration, to check the CPU core frequency, and to confirm that the PCIe slot/generation suits the adapter, etc. (a couple of quick checks follow).
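For example, these host-side checks give a first indication (a sketch only; output fields vary by ESXi build, and the slot's PCIe generation/width should still be verified against the server documentation):

# reported CPU core speed (power-management/BIOS settings show up here)
esxcli hardware cpu list | grep -i "core speed"

# locate the ConnectX-3 on the PCI bus and note which slot it occupies
esxcli hardware pci list | grep -i mellanox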

  2. Check the Mellanox inbox driver by using the “esxcli software vib list | grep nmlx4” command.

for example:

esxcli software vib list | grep nmlx4

Name         Version                        Vendor   Acceptance Level   Install Date
nmlx4-core   3.0.0.0-1vmw.600.0.0.2494585   VMware   VMwareCertified    2015-12-16
nmlx4-en     3.0.0.0-1vmw.600.0.0.2494585   VMware   VMwareCertified    2015-12-16
nmlx4-rdma   3.0.0.0-1vmw.600.0.0.2494585   VMware   VMwareCertified    2015-12-16

  3. Assuming the 56Gb NIC is up, check that NetQueue is enabled (it is on by default).
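for example (a sketch; the VMkernel setting name below is an assumption and should be double-checked on your build):

esxcli system settings kernel list -o netNetqueueEnabled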


  4. Set the nmlx4_en module parameter num_rings_per_rss_queue to 4.

esxcli system module parameters set -m nmlx4_en -p "num_rings_per_rss_queue=4"

reboot

esxcli system module parameters list -m nmlx4_en

If the suggestions above are implemented and you still see low performance, contact Mellanox support (support@mellanox.com) to get further assistance on this.

Hi!

I found this link about a vmxnet3 vNIC performance limitation:

Network Improvements in vSphere 6 Boost Performance for 40G NICs - VMware VROOM! Blog - VMware Blogs https://blogs.vmware.com/performance/2015/04/network-improvements-vsphere-6-boost-performance-40g-nics.html

I think this limitation leads to the performance problems I am seeing.

I also switched the SX6036G to Global Pause mode, which gives me a stable performance level slightly under 10Gbps (the host-side check I used is below).
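For reference, this is roughly how I checked and matched the flow-control settings on the ESXi uplink (a sketch only; vmnic4 is a placeholder, and the exact set options should be verified with --help on your build):

# show the current pause/flow-control settings of the uplinks
esxcli network nic pauseParams list

# enable RX/TX pause on the ConnectX-3 uplink to match Global Pause on the SX6036G
esxcli network nic pauseParams set -n vmnic4 -r true -t true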

Is there another solution?

BR,

Jae-Hoon Choi

P.S

This screenshot shows my POC infrastructure.

When I query the RDMA device list on my system, it shows a speed of 10 Gbps for the ConnectX-3 (the query is shown below).

Is it correct?
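The query was along these lines (assuming the esxcli rdma namespace available in ESXi 6.5):

# list RDMA-capable devices; the Speed column is what reports 10 Gbps here
esxcli rdma device list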