Hi,
I asked this on the DPDK users mailing list too but this may be a better forum for it.
I have a pair of Mellanox MCX354A-FCBT NICs and I’m having trouble scaling up RX performance. It appears that RSS is not working and the RX rate is limited by a single queue.
According to the documentation, RSS is supported by the mlx4 driver, and when I debug the ethdev initialization code I can see the driver setting up RSS, apparently successfully. I can generate 34 Mpps from one NIC using 8 queues, but I can only ever receive 20 Mpps on the other NIC, no matter how many queues I use.
The generated packets have randomized source/destination IP addresses and source/destination UDP ports, so they should hash to different RX queues.
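To illustrate why randomized 4-tuples are expected to land on different RX queues, here is a minimal Python sketch of a Toeplitz-style RSS hash. The 40-byte key is the widely published sample RSS key, and the queue is chosen as the hash modulo the queue count. Both are assumptions for illustration only: the key and indirection table that the mlx4 PMD actually programs are not shown in this thread.

```python
import random
from collections import Counter

# Widely published sample RSS key (assumption: the NIC may be programmed with another key).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key windows that start at each set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                pos = i * 8 + b  # bit position counted from the MSB of the input
                result ^= (key_int >> (key_bits - 32 - pos)) & 0xFFFFFFFF
    return result


def rx_queue(src_ip: int, dst_ip: int, src_port: int, dst_port: int, n_queues: int) -> int:
    """Hash an IPv4/UDP 4-tuple and map it to a queue (hypothetical modulo mapping)."""
    data = (
        src_ip.to_bytes(4, "big")
        + dst_ip.to_bytes(4, "big")
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
    )
    return toeplitz_hash(RSS_KEY, data) % n_queues


def spread(n_flows: int, n_queues: int, seed: int = 1) -> Counter:
    """Count how many random flows land on each queue."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_flows):
        q = rx_queue(rng.getrandbits(32), rng.getrandbits(32),
                     rng.getrandbits(16), rng.getrandbits(16), n_queues)
        counts[q] += 1
    return counts


if __name__ == "__main__":
    for queue, hits in sorted(spread(8000, 8).items()):
        print(f"queue {queue}: {hits} flows")
```

With fully random addresses and ports, a hash like this should put roughly the same number of flows on every queue, so an uneven per-queue RX counter would point to RSS not being applied.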
The NICs are connected directly to each other with a DAC cable. They are on different NUMA nodes and I’m placing the TX/RX lcores on the appropriate socket for each NIC. It doesn’t matter which NIC I use as the sender; the results are exactly the same. I have tried both pktgen and my own code but didn’t see any difference.
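One way to double-check the NUMA placement described above is to read the node that the Linux kernel reports for each PCI device in sysfs. The sketch below is an illustration rather than part of the original setup; the helper name `pci_numa_node` and the `sysfs_root` parameter are hypothetical, while the `numa_node` attribute under `/sys/bus/pci/devices/<address>/` is the standard sysfs file.

```python
from pathlib import Path


def pci_numa_node(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> int:
    """Return the NUMA node the kernel reports for a PCI device.

    The kernel writes -1 when it has no NUMA information for the device.
    `sysfs_root` is a parameter only so the function can be tested
    against a fake directory tree.
    """
    node_file = Path(sysfs_root) / pci_addr / "numa_node"
    return int(node_file.read_text().strip())


def placement_ok(pci_addr: str, lcore_socket: int,
                 sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """True when the lcore's socket matches the NIC's NUMA node."""
    return pci_numa_node(pci_addr, sysfs_root) == lcore_socket
```

Calling `placement_ok` with a NIC's PCI address and the socket of the lcores polling it gives a quick sanity check before measuring throughput.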
The server has two 12-core Intel E5-2680 v3 CPUs at 2.5 GHz. The Mellanox NICs are flashed with the latest firmware and I’m using MLNX_OFED 3.3. I’m using the MLNX_DPDK 2.2 distribution, but I also tried the standard DPDK v16.04 and the result was the same.
Here’s the output of ibstat:
- CA 'mlx4_0'
- CA type: MT4099
- Number of ports: 2
- Firmware version: 2.36.5000
- Hardware version: 1
- Node GUID: 0x0002c90300310c30
- System image GUID: 0x0002c90300310c33
- Port 1:
- State: Active
- Physical state: LinkUp
- Rate: 56
- Base lid: 0
- LMC: 0
- SM lid: 0
- Capability mask: 0x0c010000
- Port GUID: 0x0202c9fffe310c30
- Link layer: Ethernet
- Port 2:
- State: Active
- Physical state: LinkUp
- Rate: 56
- Base lid: 0
- LMC: 0
- SM lid: 0
- Capability mask: 0x0c010000
- Port GUID: 0x0202c9fffe310c31
- Link layer: Ethernet
- CA 'mlx4_1'
- CA type: MT4099
- Number of ports: 2
- Firmware version: 2.36.5000
- Hardware version: 1
- Node GUID: 0x0002c90300318200
- System image GUID: 0x0002c90300318203
- Port 1:
- State: Active
- Physical state: LinkUp
- Rate: 56
- Base lid: 0
- LMC: 0
- SM lid: 0
- Capability mask: 0x0c010000
- Port GUID: 0x0202c9fffe318200
- Link layer: Ethernet
- Port 2:
- State: Active
- Physical state: LinkUp
- Rate: 56
- Base lid: 0
- LMC: 0
- SM lid: 0
- Capability mask: 0x0c010000
- Port GUID: 0x0202c9fffe318201
- Link layer: Ethernet
Below are the pktgen results. Note that the first NIC is 0000:03:00.0 and is assigned ports 0-1, and the second NIC is 0000:a1:00.0 and is assigned ports 2-3. I’m testing TX on port 0 and RX on port 2, which are connected directly. Random packets are generated by using the pktgen script found here.
- $ app/pktgen -c ffffff -n 4 -w 0000:03:00.0 -w 0000:a1:00.0 --socket-mem=1024,1024 -- -N -T -P -m "[0-7].0,[12-19].2"
-
- Copyright (c) <2010-2016>, Intel Corporation. All rights reserved. Powered by Intel® DPDK
- EAL: Detected lcore 0 as core 0 on socket 0
- EAL: Detected lcore 1 as core 1 on socket 0
- EAL: Detected lcore 2 as core 2 on socket 0
- EAL: Detected lcore 3 as core 3 on socket 0
- EAL: Detected lcore 4 as core 4 on socket 0
- EAL: Detected lcore 5 as core 5 on socket 0
- EAL: Detected lcore 6 as core 8 on socket 0
- EAL: Detected lcore 7 as core 9 on socket 0
- EAL: Detected lcore 8 as core 10 on socket 0
- EAL: Detected lcore 9 as core 11 on socket 0
- EAL: Detected lcore 10 as core 12 on socket 0
- EAL: Detected lcore 11 as core 13 on socket 0
- EAL: Detected lcore 12 as core 0 on socket 1
- EAL: Detected lcore 13 as core 1 on socket 1
- EAL: Detected lcore 14 as core 2 on socket 1
- EAL: Detected lcore 15 as core 3 on socket 1
- EAL: Detected lcore 16 as core 4 on socket 1
- EAL: Detected lcore 17 as core 5 on socket 1
- EAL: Detected lcore 18 as core 8 on socket 1
- EAL: Detected lcore 19 as core 9 on socket 1
- EAL: Detected lcore 20 as core 10 on socket 1
- EAL: Detected lcore 21 as core 11 on socket 1
- EAL: Detected lcore 22 as core 12 on socket 1
- EAL: Detected lcore 23 as core 13 on socket 1
- EAL: Detected lcore 24 as core 0 on socket 0
- EAL: Detected lcore 25 as core 1 on socket 0
- EAL: Detected lcore 26 as core 2 on socket 0
- EAL: Detected lcore 27 as core 3 on socket 0
- EAL: Detected lcore 28 as core 4 on socket 0
- EAL: Detected lcore 29 as core 5 on socket 0
- EAL: Detected lcore 30 as core 8 on socket 0
- EAL: Detected lcore 31 as core 9 on socket 0
- EAL: Detected lcore 32 as core 10 on socket 0
- EAL: Detected lcore 33 as core 11 on socket 0
- EAL: Detected lcore 34 as core 12 on socket 0
- EAL: Detected lcore 35 as core 13 on socket 0
- EAL: Detected lcore 36 as core 0 on socket 1
- EAL: Detected lcore 37 as core 1 on socket 1
- EAL: Detected lcore 38 as core 2 on socket 1
- EAL: Detected lcore 39 as core 3 on socket 1
- EAL: Detected lcore 40 as core 4 on socket 1
- EAL: Detected lcore 41 as core 5 on socket 1
- EAL: Detected lcore 42 as core 8 on socket 1
- EAL: Detected lcore 43 as core 9 on socket 1
- EAL: Detected lcore 44 as core 10 on socket 1
- EAL: Detected lcore 45 as core 11 on socket 1
- EAL: Detected lcore 46 as core 12 on socket 1
- EAL: Detected lcore 47 as core 13 on socket 1
- EAL: Support maximum 128 logical core(s) by configuration.
- EAL: Detected 48 lcore(s)
- EAL: Setting up physically contiguous memory…
- EAL: Ask a virtual area of 0x80000000 bytes
- EAL: Virtual area found at 0x7f38c0000000 (size = 0x80000000)
- EAL: Ask a virtual area of 0x80000000 bytes
- EAL: Virtual area found at 0x7f3800000000 (size = 0x80000000)
- EAL: Requesting 1 pages of size 1024MB from socket 0
- EAL: Requesting 1 pages of size 1024MB from socket 1
- EAL: TSC frequency is ~2494222 KHz
- EAL: Master lcore 0 is ready (tid=eca398c0;cpuset=[0])
- EAL: lcore 6 is ready (tid=e7833700;cpuset=[6])
- EAL: lcore 7 is ready (tid=e7032700;cpuset=[7])
- EAL: lcore 8 is ready (tid=e6831700;cpuset=[8])
- EAL: lcore 4 is ready (tid=e8835700;cpuset=[4])
- EAL: lcore 1 is ready (tid=ea038700;cpuset=[1])
- EAL: lcore 9 is ready (tid=e6030700;cpuset=[9])
- EAL: lcore 3 is ready (tid=e9036700;cpuset=[3])
- EAL: lcore 2 is ready (tid=e9837700;cpuset=[2])
- EAL: lcore 13 is ready (tid=e402c700;cpuset=[13])
- EAL: lcore 10 is ready (tid=e582f700;cpuset=[10])
- EAL: lcore 12 is ready (tid=e482d700;cpuset=[12])
- EAL: lcore 11 is ready (tid=e502e700;cpuset=[11])
- EAL: lcore 5 is ready (tid=e8034700;cpuset=[5])
- EAL: lcore 20 is ready (tid=e0825700;cpuset=[20])
- EAL: lcore 19 is ready (tid=e1026700;cpuset=[19])
- EAL: lcore 18 is ready (tid=e1827700;cpuset=[18])
- EAL: lcore 21 is ready (tid=bbfff700;cpuset=[21])
- EAL: lcore 22 is ready (tid=bb7fe700;cpuset=[22])
- EAL: lcore 14 is ready (tid=e382b700;cpuset=[14])
- EAL: lcore 17 is ready (tid=e2028700;cpuset=[17])
- EAL: lcore 23 is ready (tid=baffd700;cpuset=[23])
- EAL: lcore 15 is ready (tid=e302a700;cpuset=[15])
- EAL: lcore 16 is ready (tid=e2829700;cpuset=[16])
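As a cross-check of the command line against the EAL log above, the following Python sketch decodes the `-c ffffff` coremask and confirms which socket the lcores named in the `-m` argument belong to. The function names are hypothetical, and `socket_of` simply encodes the layout printed in this particular log (blocks of 12 lcores alternating between socket 0 and socket 1); it is not a general rule for other machines.

```python
def coremask_to_lcores(mask_hex: str) -> list:
    """Expand a hexadecimal EAL coremask into the list of enabled lcore ids."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def socket_of(lcore: int) -> int:
    """Socket of an lcore on this server, as reported by the EAL log:
    lcores 0-11 and 24-35 are on socket 0, lcores 12-23 and 36-47 on socket 1."""
    return (lcore // 12) % 2


if __name__ == "__main__":
    enabled = coremask_to_lcores("ffffff")
    port0_lcores = range(0, 8)    # "[0-7].0" in the -m argument
    port2_lcores = range(12, 20)  # "[12-19].2" in the -m argument
    print("enabled lcores:", enabled)
    print("port 0 lcore sockets:", {socket_of(l) for l in port0_lcores})
    print("port 2 lcore sockets:", {socket_of(l) for l in port2_lcores})
```

On this layout the lcores assigned to port 0 all sit on socket 0 and those assigned to port 2 all sit on socket 1, and every one of them is inside the enabled coremask.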
- Tx Count/% Rate : Forever / 100% Forever / 100%
- PktSize/Tx Burst: 64 / 32 64 / 32
- Src/Dest Port : 1234 / 5678 1234 / 5678
- Pkt Type:VLAN ID: IPv4 / UDP:0001 IPv4 / TCP:0001
- Dst IP Address : 10.1.72.17 192.168.3.1
- Src IP Address : 10.1.72.154/24 192.168.2.1/24
- Dst MAC Address : 00:23:e9:64:c0:03 00:00:00:00:00:00
- Src MAC Address : 00:02:c9:31:0c:30 00:02:c9:31:82:00
Have I hit a hardware limitation?
Any pointers would be appreciated.
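For context on the numbers above, this short Python sketch computes the theoretical packet rate of an Ethernet link for a given frame size. It assumes the 64-byte packet size shown by pktgen includes the 4-byte FCS and that each frame carries the standard 20 bytes of preamble, start-of-frame delimiter and inter-frame gap on the wire. The link speeds passed in are assumptions too: ibstat reports a rate of 56, and 40 Gb/s is included for comparison.

```python
ETH_OVERHEAD_BYTES = 20  # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap


def line_rate_mpps(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum packet rate in Mpps for a given link speed and frame size."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire / 1e6


if __name__ == "__main__":
    for gbps in (40, 56):
        print(f"{gbps} Gb/s, 64-byte frames: {line_rate_mpps(gbps * 1e9, 64):.2f} Mpps")
```

Under these assumptions the line rate for 64-byte frames is about 59.5 Mpps at 40 Gb/s and about 83.3 Mpps at 56 Gb/s, so both the 34 Mpps TX and the 20 Mpps RX figures are well below what the wire itself allows.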
Hi,
I already sent my answer to the DPDK mailing list, but I’m adding it here in case anyone else needs it.
RSS on ConnectX-3 cards works, but it doesn’t increase the maximum packet rate of the NIC; it helps real applications by spreading the traffic among different cores.
Therefore, with a benchmark application you will see a degradation with RSS, but with a real application the performance should be better with RSS than without.
ConnectX-4 doesn’t have this limitation, and we suggest using it instead of ConnectX-3.
Best Regards,
Olga