SR-IOV with CentOS 7 host and CentOS 6.4 guest: performance and correct setup

Hi,

I’ve been investigating SR-IOV support on ConnectX-3, specifically an MT27500, with CentOS 7 as the host and CentOS 6.4 as the guest. I’ve created one VF on the card and exposed it to the VM. I wasn’t clear on what to do within the guest, so I installed the entire Mellanox stack (MLNX_OFED_LINUX-2.2-1.0.2) there as well. ibv_devinfo runs fine inside the guest, and I’ve been able to run some Fluid Dynamics benchmarks over verbs within the VMs. However, I’m not sure whether I’ve set everything up correctly or whether the configuration is optimal. My queries:

  1. I haven’t set any specific GUIDs in sysfs on the host, as described in section “4.15.6.2.1 SR-IOV sysfs Administration Interfaces on the Hypervisor” of the Mellanox documentation. Is that step required? (There is a sketch of what I mean after this list.)

  2. I have had intermittent issues with RDMA connectivity, with applications reporting that no RDMA connection could be made. Restarting the VMs and the hosts resolved the issue, though I don’t know why.

  3. I see around a 20% drop in performance on a Fluid Dynamics code with the current setup. Is this expected? Is there any fine tuning which can be done on the host and/or the guest? My current mlx4_core.conf file looks like:

    options mlx4_core num_vfs=1 port_type_array=1,4 probe_vf=0

  4. Should I really install the whole Mellanox OFED driver stack within the VM, or is there some other driver I should be installing?
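
To be concrete about item 1, this is the kind of sysfs administration I mean. The paths follow the layout described in that section of the documentation, but I haven’t actually run these commands, and the device name, port, GUID index and GUID value are all placeholders:

    # On the host, before starting the VM: assign an administrative GUID to
    # alias-GUID index 1 on port 1 of device mlx4_0 (values are examples only).
    echo 0x0002c90300a1b2c3 > /sys/class/infiniband/mlx4_0/iov/ports/1/admin_guids/1

    # Read back what is currently assigned on that port.
    cat /sys/class/infiniband/mlx4_0/iov/ports/1/admin_guids/*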

Thanks!

/D

Hi Aleksey,

Thanks for your response.

I believe I am using IB inside the VMs, and I think newer versions of the SM assign GUIDs automatically. I’ve verified that all the tests I run within the VM use the verbs device; furthermore, MPI reports that it is using IBV for transport.

For some reason, I get very bad numbers for OSU alltoall latency compared to the papers out there on the net about IB in VMs with SR-IOV. I get around 9 us, and sometimes this jumps up to 40 us. On the same machines, bare-metal latency tests give around 2 us.
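
For what it’s worth, a run along these lines forces the verbs BTL and prints which transport is actually selected, which is how I’d rule out a silent fallback to TCP. This assumes Open MPI and the OSU micro-benchmarks; the hostfile and binary names are placeholders:

    # Two ranks across the two VMs; fail if the openib (verbs) BTL cannot be
    # used, and print which BTL each connection actually selects.
    mpirun -np 2 -hostfile hosts \
           --mca btl openib,self --mca btl_base_verbose 30 \
           ./osu_latency

    # Same options for the alltoall test mentioned above.
    mpirun -np 2 -hostfile hosts --mca btl openib,self ./osu_alltoall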

I wasn’t able to use the CentOS 6.5 inbox drivers; they did not recognize the VFs. I had to install the Mellanox OFED package, passing the --guest option to the installer.
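
For completeness, the sequence inside the guest was roughly the following; the --guest flag is the one I mentioned, while the directory name and device details are only illustrative:

    # Inside the guest: confirm the VF shows up on the PCI bus.
    lspci | grep -i mellanox

    # Install MLNX_OFED in guest mode (the unpacked directory name depends on
    # the exact package downloaded for the guest OS).
    cd MLNX_OFED_LINUX-2.2-1.0.2-rhel6.5-x86_64
    ./mlnxofedinstall --guest

    # After reloading the driver (or rebooting), the VF should appear as a
    # verbs device.
    ibv_devinfo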

I’ve been trying to figure out how the videos and research papers on the net achieved 3-5 us within VMs. They were certainly using different OS versions, CPUs, etc., but my results vary so much that it points to something fundamentally wrong in what I’m doing.

/D

Hi devs,

  1. If you are not using IB inside the guest, then I doubt that you need GUIDs.

  2. What kind of RDMA problems did you see? Seeing the actual error message would help in understanding the problem; a minimal test like the one after this list is a quick way to capture it.

  3. HPC benchmarks usually aren’t run under virtualization because of the performance overhead; a virtual machine will never run faster than the bare-metal host.

  4. It is possible to use the inbox driver that comes with the OS.
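
For point 2, a minimal verbs-level test between the two guests usually surfaces the real error without involving the application, for example the ibv_rc_pingpong utility that ships with libibverbs. The device name and hostname below are placeholders:

    # On the first VM (server side).
    ibv_rc_pingpong -d mlx4_0

    # On the second VM (client side); vm1 is a placeholder hostname.
    ibv_rc_pingpong -d mlx4_0 vm1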

Did you go through the Mellanox tuning guide? If the system is not tuned, that can explain these variations in performance.
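
As a starting point, the guide boils down to things like the following. Treat this as a sketch only, since the exact module options and tool availability depend on your OFED release:

    # Host: MLNX_OFED ships a tuning helper that reports what is mis-tuned.
    mlnx_tune

    # Host and guest: keep the CPUs at their nominal frequency.
    cpupower frequency-set -g performance

    # Host, in /etc/modprobe.d/mlx4_core.conf (example only; log_num_mgm_entry_size=-1
    # enables device-managed flow steering, which some Mellanox guides suggest):
    options mlx4_core num_vfs=1 port_type_array=1,4 probe_vf=0 log_num_mgm_entry_size=-1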