DPDK-OVS using connectx3-Pro binding issues

Hi,

We have been trying to install DPDK-OVS on a DL360 G7 (HP server) host running Fedora 21 with a Mellanox ConnectX-3 Pro NIC.

We followed the tutorials Gilad and Olga have posted here, and the installation seemed to work (including testpmd running - see output below).

We ran dpdk_nic_bind and did not see any user-space driver we could bind the Mellanox device to:

0000:06:00.0 'MT27520 Family [ConnectX-3 Pro]' if=ens1d1,ens1 drv=mlx4_core unused=ib_ipoib Active

  1. We need to somehow bind this device to a DPDK-compatible driver; can you think of a way to do so?

  2. Can you please take a look at the versions we use (Fedora, OFED, DPDK, OVS, QEMU) and let us know (from your experience) if we should upgrade/downgrade any of them?

  3. Do you have a more up-to-date tutorial for our specific HW?

  4. Let us know if you need additional details.

Thanks a lot!!!

===============

SYSTEM DETAILS:

===============

[root@localhost ~]# uname -a

Linux localhost.localdomain 3.17.4-301.fc21.x86_64 #1 SMP Thu Nov 27 19:09:10 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Using Mellanox Technologies MT27520 Family [ConnectX-3 Pro] NIC

Mellanox OFED version

MLNX_OFED_LINUX-3.1-1.0.3 (OFED-3.1-1.0.3):

OVS version: openvswitch-2.4.0

DPDK version: dpdk-2.1.0

QEMU version: qemu-2.2.1

##ethtool output

[root@localhost ~]# ethtool -i ens1

driver: mlx4_en

version: 3.1-1.0.3 (29 Sep 2015)

firmware-version: 2.35.5100

bus-info: 0000:06:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: no

supports-register-dump: no

supports-priv-flags: yes


####################Test_PMD Output:

[root@localhost dpdk-2.1.0]# ./x86_64-ivshmem-linuxapp-gcc/build/app/test-pmd/testpmd -c 0xff00 -n 4 -w 0000:06:00.0 -- --rxq=2 --txq=2 -i

EAL: Detected lcore 0 as core 0 on socket 0

EAL: Detected lcore 1 as core 0 on socket 1

EAL: Detected lcore 2 as core 8 on socket 0

EAL: Detected lcore 3 as core 8 on socket 1

EAL: Detected lcore 4 as core 2 on socket 0

EAL: Detected lcore 5 as core 2 on socket 1

EAL: Detected lcore 6 as core 10 on socket 0

EAL: Detected lcore 7 as core 10 on socket 1

EAL: Detected lcore 8 as core 1 on socket 0

EAL: Detected lcore 9 as core 1 on socket 1

EAL: Detected lcore 10 as core 9 on socket 0

EAL: Detected lcore 11 as core 9 on socket 1

EAL: Detected lcore 12 as core 0 on socket 0

EAL: Detected lcore 13 as core 0 on socket 1

EAL: Detected lcore 14 as core 8 on socket 0

EAL: Detected lcore 15 as core 8 on socket 1

EAL: Detected lcore 16 as core 2 on socket 0

EAL: Detected lcore 17 as core 2 on socket 1

EAL: Detected lcore 18 as core 10 on socket 0

EAL: Detected lcore 19 as core 10 on socket 1

EAL: Detected lcore 20 as core 1 on socket 0

EAL: Detected lcore 21 as core 1 on socket 1

EAL: Detected lcore 22 as core 9 on socket 0

EAL: Detected lcore 23 as core 9 on socket 1

EAL: Support maximum 128 logical core(s) by configuration.

EAL: Detected 24 lcore(s)

EAL: VFIO modules not all loaded, skip VFIO support...

EAL: Searching for IVSHMEM devices...

EAL: No IVSHMEM configuration found!

EAL: Setting up physically contiguous memory...

EAL: Ask a virtual area of 0x200000000 bytes

EAL: Virtual area found at 0x7fa740000000 (size = 0x200000000)

EAL: Ask a virtual area of 0x200000000 bytes

EAL: Virtual area found at 0x7fa500000000 (size = 0x200000000)

EAL: Requesting 8 pages of size 1024MB from socket 0

EAL: Requesting 8 pages of size 1024MB from socket 1

EAL: TSC frequency is ~2666753 KHz

EAL: Master lcore 8 is ready (tid=c9c2e8c0;cpuset=[8])

EAL: lcore 14 is ready (tid=c58cd700;cpuset=[14])

EAL: lcore 12 is ready (tid=c68cf700;cpuset=[12])

EAL: lcore 13 is ready (tid=c60ce700;cpuset=[13])

EAL: lcore 10 is ready (tid=c78d1700;cpuset=[10])

EAL: lcore 15 is ready (tid=c50cc700;cpuset=[15])

EAL: lcore 11 is ready (tid=c70d0700;cpuset=[11])

EAL: lcore 9 is ready (tid=c80d2700;cpuset=[9])

EAL: PCI device 0000:06:00.0 on NUMA socket 0

EAL: probe driver: 15b3:1007 librte_pmd_mlx4

PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)

PMD: librte_pmd_mlx4: 2 port(s) detected

PMD: librte_pmd_mlx4: port 1 MAC address is e4:1d:2d:bb:6d:c0

PMD: librte_pmd_mlx4: port 2 MAC address is e4:1d:2d:bb:6d:c1

Interactive-mode selected

Configuring Port 0 (socket 0)

PMD: librte_pmd_mlx4: 0x20ad4740: TX queues number update: 0 -> 2

PMD: librte_pmd_mlx4: 0x20ad4740: RX queues number update: 0 -> 2

Port 0: E4:1D:2D:BB:6D:C0

Configuring Port 1 (socket 0)

PMD: librte_pmd_mlx4: 0x20ad5788: TX queues number update: 0 -> 2

PMD: librte_pmd_mlx4: 0x20ad5788: RX queues number update: 0 -> 2

Port 1: E4:1D:2D:BB:6D:C1

Checking link statuses...

Port 0 Link Up - speed 40000 Mbps - full-duplex

Port 1 Link Up - speed 40000 Mbps - full-duplex

Done

testpmd>

[root@localhost ~]# python /home/cloud/dpdk-2.1.0/tools/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver

============================================

Network devices using kernel driver

===================================

0000:03:00.0 'NetXtreme II BCM5709 Gigabit Ethernet' if=enp3s0f0 drv=bnx2 unused=ib_ipoib Active

0000:03:00.1 'NetXtreme II BCM5709 Gigabit Ethernet' if=enp3s0f1 drv=bnx2 unused=ib_ipoib

0000:04:00.0 'NetXtreme II BCM5709 Gigabit Ethernet' if=enp4s0f0 drv=bnx2 unused=ib_ipoib

0000:04:00.1 'NetXtreme II BCM5709 Gigabit Ethernet' if=enp4s0f1 drv=bnx2 unused=ib_ipoib

0000:06:00.0 'MT27520 Family [ConnectX-3 Pro]' if=ens1d1,ens1 drv=mlx4_core unused=ib_ipoib Active

Other network devices

=====================

========

TESTING:

========

We tried binding each of the available drivers to the device. However, none of them caused the device to use a DPDK-compatible driver.

python /home/cloud/dpdk-2.1.0/tools/dpdk_nic_bind.py --bind=ib_ipoib 0000:06:00.0

Routing table indicates that interface 0000:06:00.0 is active. Not modifying

[root@localhost cloud]# ifconfig ens1 down

[root@localhost cloud]# ifconfig ens1d1 down

[root@localhost cloud]# python /home/cloud/dpdk-2.1.0/tools/dpdk_nic_bind.py --bind=ib_ipoib 0000:06:00.0

Error: bind failed for 0000:06:00.0 - Cannot open /sys/bus/pci/drivers/ib_ipoib/new_id

//////////////////////////////////////////////////////////////////////////////////

[root@localhost cloud]# python /home/cloud/dpdk-2.1.0/tools/dpdk_nic_bind.py --bind=mlx4_core 0000:06:00.0

[root@localhost cloud]# python /home/cloud/dpdk-2.1.0/tools/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver

============================================

Network devices using kernel driver

===================================

0000:03:00.0 'NetXtreme II BCM5709 Gigabit Ethernet' if=enp3s0f0 drv=bnx2 unused=ib_ipoib Active

0000:03:00.1 'NetXtreme II BCM5709 Gigabit Ethernet' if=enp3s0f1 drv=bnx2 unused=ib_ipoib

0000:04:00.0 'NetXtreme II BCM5709 Gigabit Ethernet' if=enp4s0f0 drv=bnx2 unused=ib_ipoib

0000:04:00.1 'NetXtreme II BCM5709 Gigabit Ethernet' if=enp4s0f1 drv=bnx2 unused=ib_ipoib

0000:06:00.0 'MT27520 Family [ConnectX-3 Pro]' if=ens1d1,ens1 drv=mlx4_core unused=ib_ipoib Active

Other network devices

=====================

I think you need to use the IGB UIO module. Going through the steps in tools/setup.sh helped me. mlx4_core (Ethernet) and ib_ipoib (IPoIB) are kernel-mode drivers.

Network devices using DPDK-compatible driver
============================================
0000:43:00.0 'MT27500 Family [ConnectX-3]' drv=igb_uio unused=
0000:44:00.0 'Ethernet 10G 2P X520 Adapter' drv=igb_uio unused=
0000:44:00.1 'Ethernet 10G 2P X520 Adapter' drv=igb_uio unused=

Older versions of DPDK indeed used intel_dpdk as the library name; newer versions use dpdk (it also seems the configuration option for this is gone, but I guess I am missing something). You have two options: use the latest OVS (as in my reply), which is preferred, or change the OVS configure file to look for dpdk instead of intel_dpdk.

you can find the library here -

x86_64-native-linuxapp-gcc/lib/libdpdk.a
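
For the second option, a minimal sketch of patching the configure script could look like this (hypothetical; run from the OVS source tree, and note that the exact occurrences may differ per OVS release, so back the file up first):

```shell
# Sketch: rename the hard-coded intel_dpdk library reference to dpdk
# in an older OVS configure script. Back up before editing.
cp configure configure.bak
sed -i 's/intel_dpdk/dpdk/g' configure
```

After this, re-run ./configure with --with-dpdk pointing at your DPDK build directory.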

I really hope I'm right here... this is based on my memory.

  1. Any comments regarding the DPDK version?

[A] I recommend using the latest - DPDK 2.2. I've used MLNX_DPDK 2.1 and DPDK 2.2 from upstream.

  2. As for ivshmem - we are currently using native. However, you didn't include it in your tutorial. Should I configure it manually, or is it already enabled by default (as part of the DPDK package on the Mellanox site)?

[A] To be honest, I have never used ivshmem. Mellanox does not change the default, so I don't think it is enabled. I need to try it, and then I will have a better answer (and tutorial).

  3. Note that in one of your tutorials you mentioned that CONFIG_RTE_BUILD_COMBINE_LIBS should be set to N, while in your tutorial above you say it should be set to Y.

[A] It is application specific. OVS needs DPDK to be built as a combined lib; other applications don't. By default the combined lib is disabled. For example applications like testpmd you do not need it.

  4. A more general question: we are trying to transmit data between two guests located on two different hosts (connected via a Mellanox 40GbE switch) using DPDK-OVS. We want to make sure we are not missing anything: while DPDK is installed in both the host and guest OS, OVS is needed only on the hosts, correct?

[A] Yes, OVS is installed only on the host. DPDK on the guests is not a must (but DPDK does support virtio). The short tutorial above does not cover DPDK in the guests.

  5. Have you tried configuring DPDK using virt-manager as well?

[A] No, only QEMU. I guess you could do it by manually editing the XMLs, but I never did it myself. Might be a good suggestion for the post; I will try to add it.

Looks like giving DESTDIR explicitly was the answer to my question.

make install -j T=x86_64-native-linuxapp-gcc DESTDIR=/root/dpdk/dpdk-2.2.0

Hello,

I do not use the dpdk_nic_bind script because we do not use UIO (igb_uio is Intel's UIO driver). For Mellanox NICs, DPDK is just another user-space application written over the raw Ethernet verbs interface: the control path goes through the mlx4/mlx5 kernel modules, and the data path goes directly to the HW from user space (somewhat similar to the RDMA mechanism). This has some nice advantages, like security, and the NIC can still be managed by the kernel as in a non-DPDK environment (i.e. all the standard tools like ethtool etc. keep working).
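
So instead of binding, a quick sanity check is enough; a sketch (module names are the ones from this thread):

```shell
# Nothing to bind for mlx4: just confirm the kernel modules that the
# PMD's control path relies on are loaded.
lsmod | grep -E '^mlx4_(core|en|ib)'
# Then pass the PCI address straight to the DPDK app, e.g.:
#   testpmd -c 0xff00 -n 4 -w 0000:06:00.0 -- -i
```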

Lee, if you managed to make this work with the script using a Mellanox NIC, it would be great if you could share the details.

I'm currently writing a post on OVS-DPDK with Mellanox NICs, so until then here is a very rough guide that might help (sorry for the format; it will be much nicer in the post). This was done on a ConnectX-4 but should work the same for a ConnectX-3.

Some typos, bad phrasing and minor errors can be expected; I will correct them in the post.

Find the NIC's NUMA node:

# mst start

# mst status -v

MST modules:

------------

MST PCI module loaded

MST PCI configuration module loaded

PCI devices:

------------

DEVICE_TYPE MST PCI RDMA NET NUMA

ConnectX4(rev:0) /dev/mst/mt4115_pciconf0.1 11:00.1 mlx5_1 net-enp17s0f1 0

ConnectX4(rev:0) /dev/mst/mt4115_pciconf0 11:00.0 mlx5_0 net-enp17s0f0 0

# mst stop

Configure Hugepages

OVS needs a system with 1GB hugepage support; 1GB pages can only be allocated at boot. Note that on a NUMA machine the pages will be divided between the NUMA nodes.

For best performance you might want two separate hugepage mount points: one for QEMU (1G pages) and one for DPDK (2M pages). See the Vhost Sample Application chapter in the DPDK documentation.

2M pages can be allocated after the machine has booted. Here I used only 1G pages (and no performance tuning was done).

Add boot parameters to enable 8 x 1GB hugepages (using grubby here; this can be done in many ways).

Need to add "default_hugepagesz=1GB hugepagesz=1GB hugepages=8" to the kernel boot parameters.

# yum install grub2-tools

# grubby -c /boot/grub2/grub.cfg --default-kernel

/boot/vmlinuz-3.10.0-229.el7.x86_64

# grubby -c /boot/grub2/grub.cfg --args="default_hugepagesz=1GB hugepagesz=1GB hugepages=8" --update-kernel /boot/vmlinuz-3.10.0-229.el7.x86_64

Verify

# grubby -c /boot/grub2/grub.cfg --info /boot/vmlinuz-3.10.0-229.el7.x86_64

index=0

kernel=/boot/vmlinuz-3.10.0-229.el7.x86_64

args="ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8 default_hugepagesz=1GB hugepagesz=1GB hugepages=8"

root=UUID=c4d1bf80-880c-459e-a996-57cb41de2544

initrd=/boot/initramfs-3.10.0-229.el7.x86_64.img

title=Red Hat Enterprise Linux Server 7.1 (Maipo), with Linux 3.10.0-229.el7.x86_64

Reboot the machine

Configure 4 pages on the correct NUMA node (note that this should happen by default; I just like to make sure):

echo 4 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
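
A quick way to verify how the 1G pages ended up split between the NUMA nodes (a sketch; NODE_BASE is parameterized only so the snippet is easy to test, on a real host the default sysfs path below is the right one):

```shell
# Show the per-NUMA-node 1GB hugepage counts.
NODE_BASE=${NODE_BASE:-/sys/devices/system/node}
for node in "$NODE_BASE"/node*; do
  count=$(cat "$node/hugepages/hugepages-1048576kB/nr_hugepages")
  echo "$(basename "$node"): $count x 1GB pages"
done
```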

By default the hugepages should be mounted on /dev/hugepages
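
You can confirm the hugetlbfs mount that QEMU and OVS will use, for example:

```shell
# List hugetlbfs mounts; /dev/hugepages should appear here.
grep hugetlbfs /proc/mounts
```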


Build DPDK

Download DPDK

Edit config/common_linuxapp:

CONFIG_RTE_BUILD_COMBINE_LIBS=y

CONFIG_RTE_LIBRTE_MLX5_PMD=y

Make sure CONFIG_RTE_LIBRTE_VHOST_USER=y
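
The same edits can be done non-interactively; a sketch (option names as they appear in DPDK 2.x's config/common_linuxapp, where both default to n):

```shell
# Flip the two options from n to y, then show the relevant settings.
sed -i -e 's/^CONFIG_RTE_BUILD_COMBINE_LIBS=n/CONFIG_RTE_BUILD_COMBINE_LIBS=y/' \
       -e 's/^CONFIG_RTE_LIBRTE_MLX5_PMD=n/CONFIG_RTE_LIBRTE_MLX5_PMD=y/' \
       config/common_linuxapp
grep -E 'COMBINE_LIBS|MLX5_PMD|VHOST_USER' config/common_linuxapp
```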

# make install T=x86_64-native-linuxapp-gcc

Install OVS

# wget https://github.com/openvswitch/ovs/tarball/master

# tar -zxvf master

# cd openvswitch-ovs-39cc5c4/

# ./boot.sh

# export LIBS="-libverbs"

# ./configure --with-dpdk=/var/soft/dpdk/dpdk-2.2.0/x86_64-native-linuxapp-gcc --disable-ssl

# make CFLAGS='-O3 -march=native'

# make install

Start OVS

# mkdir -p /usr/local/etc/openvswitch

# mkdir -p /usr/local/var/run/openvswitch

# rm /usr/local/etc/openvswitch/conf.db ## only if this is not the first run

# ovsdb-tool create /usr/local/etc/openvswitch/conf.db /usr/local/share/openvswitch/vswitch.ovsschema

Start ovsdb-server

# export DB_SOCK=/usr/local/var/run/openvswitch/db.sock

# ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach

Start ovs-vswitchd

# ovs-vsctl --no-wait init

# ovs-vswitchd --dpdk -c 0xf -n 4 --socket-mem 1024 -- unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

Create OVS bridge, add DPDK port and vhost-user port

# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev

# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk

# ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser

The vhost-user device is created here -

/usr/local/var/run/openvswitch/vhost-user1

Run the VM with a vhost-user back-end device -

qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -chardev socket,id=char0,path=/usr/local/var/run/openvswitch/vhost-user1 -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1,mac=12:34:00:00:50:2c -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc /data1/vms/rhel6.7-master.qcow2

Hope this helps and I did not miss anything.

Anyway, a full community post will be available soon.

Gilad,

Thanks for your quick response!

We would like to try your suggested tutorial; however, in order to do so, can you please address our 2nd question? We are not sure which versions we should use.

Moreover, we are currently working with Fedora based on recommendations from engineers who tested DPDK with Intel NICs.

Any recommendations on that as well?

Looking forward to your reply

Any version OFED works with should be fine from our perspective. I can't really comment on OVS though; I have tested it on RHEL 6.5 (if I remember correctly) and 7.1.

See here - InfiniBand OS Distributors

Hope this helps and let me know if anything else is needed.

  1. Any comments regarding the DPDK version?

  2. As for ivshmem - we are currently using native. However, you didn't include it in your tutorial. Should I configure it manually, or is it already enabled by default (as part of the DPDK package on the Mellanox site)?

  3. Note that in one of your tutorials you mentioned that CONFIG_RTE_BUILD_COMBINE_LIBS should be set to N, while in your tutorial above you say it should be set to Y.

  4. A more general question: we are trying to transmit data between two guests located on two different hosts (connected via a Mellanox 40GbE switch) using DPDK-OVS. We want to make sure we are not missing anything: while DPDK is installed in both the host and guest OS, OVS is needed only on the hosts, correct?

  5. Have you tried configuring DPDK using virt-manager as well?

Thanks.

[using dpdk 2.1_1.1 and ovs 2.4.0]

I have noticed that during the OVS configuration stage I get a link error with DPDK.

Exploring config.log, I noticed that it tries to load a hard-coded -lintel_dpdk.

Which library is the equivalent of Intel's lib? Is it librte_eal?

Hi, I'm using DPDK 2.2 and OFED 3.1.1.x on Linux 3.10.0-327.4.4.el7.x86_64 (CentOS 7.2)

with a ConnectX-3 Pro. I was going through your guide and got to the stage make install T=x86_64-native-linuxapp-gcc.

Output (very end of it) was:

INSTALL-APP testpmd

INSTALL-MAP testpmd.map

LD test

INSTALL-APP test

INSTALL-MAP test.map

Build complete [x86_64-native-linuxapp-gcc]

Installation cannot run with T defined and DESTDIR undefined

Can you please suggest a workaround to make the installation possible?

Thank you very much.