RSS not working on Mellanox ConnectX-3 NIC

Hi,

I asked this on the DPDK users mailing list too but this may be a better forum for it.

I have a pair of Mellanox MCX354A-FCBT NICs and I’m having trouble scaling up RX performance. It appears that RSS is not working and RX speed is limited by a single queue.

According to the documentation, RSS is supported by the mlx4 driver, and stepping through the ethdev initialization code in a debugger I can see the driver set up RSS, apparently successfully. I can generate 34 Mpps from one NIC using 8 queues, but I can only ever receive 20 Mpps on the other NIC, no matter how many queues I use.
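
For scale, here is a rough sketch of the theoretical 64-byte packet rate on this link. It takes the 56 Gb/s rate that ibstat reports at face value and assumes the standard 20 bytes of per-frame Ethernet overhead (preamble, start-of-frame delimiter, inter-frame gap). The name `max_pps` is only illustrative.

```python
# Rough line-rate estimate for minimum-size Ethernet frames.
# Assumption: the full 56 Gb/s reported for the link is usable for frames.
ETH_OVERHEAD_BYTES = 7 + 1 + 12  # preamble + start-of-frame delimiter + inter-frame gap


def max_pps(link_bps: float, frame_bytes: int) -> float:
    """Theoretical packets per second for back-to-back frames of one size."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    line_rate = max_pps(56e9, 64)
    # Observed figures are the rounded ones quoted in the post.
    for label, observed in (("TX", 34e6), ("RX", 20e6)):
        share = observed / line_rate
        print(f"{label}: {observed / 1e6:.0f} Mpps is {share:.0%} "
              f"of {line_rate / 1e6:.1f} Mpps")
```

Under these assumptions both observed rates sit well below what the wire could carry, so the receive ceiling does not look like a link-speed limit.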

The generated packets have randomized source/destination IP addresses and source/destination UDP ports, so they should hash to different RX queues.
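
To illustrate that expectation, below is a minimal sketch of a Toeplitz-style RSS hash applied to flows shaped like the pktgen range settings in this post (source IP 10.1.72.154 to 10.1.72.254, fixed destination IP, source ports 1025 to 65512, destination ports 0 to 254). Several things here are assumptions rather than facts about the mlx4 PMD: the 40-byte key is the widely published Microsoft sample key, the queue is taken as `hash % n_queues` instead of going through an indirection table, and each ranged field is assumed to advance by one per packet and wrap inside its range.

```python
import ipaddress
import struct
from collections import Counter

# Assumption: widely published sample RSS key; the NIC may use a different one.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b4"
    "77cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window at every set bit of the input."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def flow_tuple(k: int) -> bytes:
    """Hash input for the k-th generated packet (hypothetical stepping model)."""
    src_ip = int(ipaddress.IPv4Address("10.1.72.154")) + k % 101
    dst_ip = int(ipaddress.IPv4Address("10.1.72.17"))
    src_port = 1025 + k % 64488
    dst_port = k % 255
    return struct.pack(">IIHH", src_ip, dst_ip, src_port, dst_port)


def queue_histogram(n_packets: int, n_queues: int) -> list:
    """Count how many packets land on each queue under hash % n_queues."""
    counts = Counter()
    for k in range(n_packets):
        counts[toeplitz_hash(RSS_KEY, flow_tuple(k)) % n_queues] += 1
    return [counts[q] for q in range(n_queues)]


if __name__ == "__main__":
    print(queue_histogram(20000, 8))
```

Running it prints one packet count per queue. If RSS really is spreading the traffic on the hardware, the per-queue RX counters should show a similar spread instead of a single busy queue.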

The NICs are connected directly to each other with a DAC cable. They are on different NUMA nodes and I’m placing TX/RX lcores on the appropriate socket for each NIC. It doesn’t matter which NIC I use as the sender; the results are exactly the same. I have tried both pktgen and my own code but didn’t see any difference.

The server has two 12-core Intel E5-2680 v3 CPUs at 2.5 GHz. The Mellanox NICs are flashed with the latest firmware and I’m using MLNX_OFED 3.3. I’m using the MLNX_DPDK 2.2 distribution, but I also tried the standard DPDK v16.04 and the result was the same.

Here’s the output of ibstat:

  CA 'mlx4_0'
      CA type: MT4099
      Number of ports: 2
      Firmware version: 2.36.5000
      Hardware version: 1
      Node GUID: 0x0002c90300310c30
      System image GUID: 0x0002c90300310c33
      Port 1:
          State: Active
          Physical state: LinkUp
          Rate: 56
          Base lid: 0
          LMC: 0
          SM lid: 0
          Capability mask: 0x0c010000
          Port GUID: 0x0202c9fffe310c30
          Link layer: Ethernet
      Port 2:
          State: Active
          Physical state: LinkUp
          Rate: 56
          Base lid: 0
          LMC: 0
          SM lid: 0
          Capability mask: 0x0c010000
          Port GUID: 0x0202c9fffe310c31
          Link layer: Ethernet
  CA 'mlx4_1'
      CA type: MT4099
      Number of ports: 2
      Firmware version: 2.36.5000
      Hardware version: 1
      Node GUID: 0x0002c90300318200
      System image GUID: 0x0002c90300318203
      Port 1:
          State: Active
          Physical state: LinkUp
          Rate: 56
          Base lid: 0
          LMC: 0
          SM lid: 0
          Capability mask: 0x0c010000
          Port GUID: 0x0202c9fffe318200
          Link layer: Ethernet
      Port 2:
          State: Active
          Physical state: LinkUp
          Rate: 56
          Base lid: 0
          LMC: 0
          SM lid: 0
          Capability mask: 0x0c010000
          Port GUID: 0x0202c9fffe318201
          Link layer: Ethernet

continued…

Below are the pktgen results. Note that the first NIC is 0000:03:00.0 and is assigned ports 0-1, and the second NIC is 0000:a1:00.0 and is assigned ports 2-3. I’m testing TX on port 0 and RX on port 2, which are connected directly. Random packets are generated by using the pktgen script found here.

  1. $ app/pktgen -c ffffff -n 4 -w 0000:03:00.0 -w 0000:a1:00.0 --socket-mem=1024,1024 -- -N -T -P -m "[0-7].0,[12-19].2"
  2. Copyright (c) <2010-2016>, Intel Corporation. All rights reserved. Powered by Intel® DPDK
  3. EAL: Detected lcore 0 as core 0 on socket 0
  4. EAL: Detected lcore 1 as core 1 on socket 0
  5. EAL: Detected lcore 2 as core 2 on socket 0
  6. EAL: Detected lcore 3 as core 3 on socket 0
  7. EAL: Detected lcore 4 as core 4 on socket 0
  8. EAL: Detected lcore 5 as core 5 on socket 0
  9. EAL: Detected lcore 6 as core 8 on socket 0
  10. EAL: Detected lcore 7 as core 9 on socket 0
  11. EAL: Detected lcore 8 as core 10 on socket 0
  12. EAL: Detected lcore 9 as core 11 on socket 0
  13. EAL: Detected lcore 10 as core 12 on socket 0
  14. EAL: Detected lcore 11 as core 13 on socket 0
  15. EAL: Detected lcore 12 as core 0 on socket 1
  16. EAL: Detected lcore 13 as core 1 on socket 1
  17. EAL: Detected lcore 14 as core 2 on socket 1
  18. EAL: Detected lcore 15 as core 3 on socket 1
  19. EAL: Detected lcore 16 as core 4 on socket 1
  20. EAL: Detected lcore 17 as core 5 on socket 1
  21. EAL: Detected lcore 18 as core 8 on socket 1
  22. EAL: Detected lcore 19 as core 9 on socket 1
  23. EAL: Detected lcore 20 as core 10 on socket 1
  24. EAL: Detected lcore 21 as core 11 on socket 1
  25. EAL: Detected lcore 22 as core 12 on socket 1
  26. EAL: Detected lcore 23 as core 13 on socket 1
  27. EAL: Detected lcore 24 as core 0 on socket 0
  28. EAL: Detected lcore 25 as core 1 on socket 0
  29. EAL: Detected lcore 26 as core 2 on socket 0
  30. EAL: Detected lcore 27 as core 3 on socket 0
  31. EAL: Detected lcore 28 as core 4 on socket 0
  32. EAL: Detected lcore 29 as core 5 on socket 0
  33. EAL: Detected lcore 30 as core 8 on socket 0
  34. EAL: Detected lcore 31 as core 9 on socket 0
  35. EAL: Detected lcore 32 as core 10 on socket 0
  36. EAL: Detected lcore 33 as core 11 on socket 0
  37. EAL: Detected lcore 34 as core 12 on socket 0
  38. EAL: Detected lcore 35 as core 13 on socket 0
  39. EAL: Detected lcore 36 as core 0 on socket 1
  40. EAL: Detected lcore 37 as core 1 on socket 1
  41. EAL: Detected lcore 38 as core 2 on socket 1
  42. EAL: Detected lcore 39 as core 3 on socket 1
  43. EAL: Detected lcore 40 as core 4 on socket 1
  44. EAL: Detected lcore 41 as core 5 on socket 1
  45. EAL: Detected lcore 42 as core 8 on socket 1
  46. EAL: Detected lcore 43 as core 9 on socket 1
  47. EAL: Detected lcore 44 as core 10 on socket 1
  48. EAL: Detected lcore 45 as core 11 on socket 1
  49. EAL: Detected lcore 46 as core 12 on socket 1
  50. EAL: Detected lcore 47 as core 13 on socket 1
  51. EAL: Support maximum 128 logical core(s) by configuration.
  52. EAL: Detected 48 lcore(s)
  53. EAL: Setting up physically contiguous memory…
  54. EAL: Ask a virtual area of 0x80000000 bytes
  55. EAL: Virtual area found at 0x7f38c0000000 (size = 0x80000000)
  56. EAL: Ask a virtual area of 0x80000000 bytes
  57. EAL: Virtual area found at 0x7f3800000000 (size = 0x80000000)
  58. EAL: Requesting 1 pages of size 1024MB from socket 0
  59. EAL: Requesting 1 pages of size 1024MB from socket 1
  60. EAL: TSC frequency is ~2494222 KHz
  61. EAL: Master lcore 0 is ready (tid=eca398c0;cpuset=[0])
  62. EAL: lcore 6 is ready (tid=e7833700;cpuset=[6])
  63. EAL: lcore 7 is ready (tid=e7032700;cpuset=[7])
  64. EAL: lcore 8 is ready (tid=e6831700;cpuset=[8])
  65. EAL: lcore 4 is ready (tid=e8835700;cpuset=[4])
  66. EAL: lcore 1 is ready (tid=ea038700;cpuset=[1])
  67. EAL: lcore 9 is ready (tid=e6030700;cpuset=[9])
  68. EAL: lcore 3 is ready (tid=e9036700;cpuset=[3])
  69. EAL: lcore 2 is ready (tid=e9837700;cpuset=[2])
  70. EAL: lcore 13 is ready (tid=e402c700;cpuset=[13])
  71. EAL: lcore 10 is ready (tid=e582f700;cpuset=[10])
  72. EAL: lcore 12 is ready (tid=e482d700;cpuset=[12])
  73. EAL: lcore 11 is ready (tid=e502e700;cpuset=[11])
  74. EAL: lcore 5 is ready (tid=e8034700;cpuset=[5])
  75. EAL: lcore 20 is ready (tid=e0825700;cpuset=[20])
  76. EAL: lcore 19 is ready (tid=e1026700;cpuset=[19])
  77. EAL: lcore 18 is ready (tid=e1827700;cpuset=[18])
  78. EAL: lcore 21 is ready (tid=bbfff700;cpuset=[21])
  79. EAL: lcore 22 is ready (tid=bb7fe700;cpuset=[22])
  80. EAL: lcore 14 is ready (tid=e382b700;cpuset=[14])
  81. EAL: lcore 17 is ready (tid=e2028700;cpuset=[17])
  82. EAL: lcore 23 is ready (tid=baffd700;cpuset=[23])
  83. EAL: lcore 15 is ready (tid=e302a700;cpuset=[15])
  84. EAL: lcore 16 is ready (tid=e2829700;cpuset=[16])

continued…

  1. EAL: PCI device 0000:03:00.0 on NUMA socket 0
  2. EAL: probe driver: 15b3:1003 librte_pmd_mlx4
  3. PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
  4. PMD: librte_pmd_mlx4: 2 port(s) detected
  5. PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:31:0c:30
  6. PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:31:0c:31
  7. EAL: PCI device 0000:a1:00.0 on NUMA socket 1
  8. EAL: probe driver: 15b3:1003 librte_pmd_mlx4
  9. PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: false)
  10. PMD: librte_pmd_mlx4: 2 port(s) detected
  11. PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:31:82:00
  12. PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:31:82:01
  13. [0-7].0 lcores: RX( 0 1 2 3 4 5 6 7 )TX( 0 1 2 3 4 5 6 7 ) ports: RX( 0 )TX( 0 )
  14. [12-19].2 lcores: RX( 12 13 14 15 16 17 18 19 )TX( 12 13 14 15 16 17 18 19 ) ports: RX( 2 )TX( 2 )
  15. Copyright (c) <2010-2016>, Intel Corporation. All rights reserved.
  16. Pktgen created by: Keith Wiles – >>> Powered by Intel® DPDK <<<
  17. Lua 5.3.2 Copyright (C) 1994-2015 Lua.org, PUC-Rio
  18. Packet Burst 32, RX Desc 512, TX Desc 512, mbufs/port 4096, mbuf cache 512

  19. === port to lcore mapping table (# lcores 24) ===
  20. lcore: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  21. port 0: D: T 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 = 8: 8
  22. port 2: D: T 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 0: 0 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 0: 0 0: 0 0: 0 0: 0 = 8: 8
  23. Total : 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 0: 0 0: 0 0: 0 0: 0 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 1: 1 0: 0 0: 0 0: 0 0: 0
  24. Display and Timer on lcore 0, rx:tx counts per port/lcore
  25. Configuring 4 ports, MBUF Size 1920, MBUF Cache Size 512
  26. Lcore:
  27. 0, RX-TX
  28. RX( 1): ( 0: 0)
  29. TX( 1): ( 0: 0)
  30. 1, RX-TX
  31. RX( 1): ( 0: 1)
  32. TX( 1): ( 0: 1)
  33. 2, RX-TX
  34. RX( 1): ( 0: 2)
  35. TX( 1): ( 0: 2)
  36. 3, RX-TX
  37. RX( 1): ( 0: 3)
  38. TX( 1): ( 0: 3)
  39. 4, RX-TX
  40. RX( 1): ( 0: 4)
  41. TX( 1): ( 0: 4)
  42. 5, RX-TX
  43. RX( 1): ( 0: 5)
  44. TX( 1): ( 0: 5)
  45. 6, RX-TX
  46. RX( 1): ( 0: 6)
  47. TX( 1): ( 0: 6)
  48. 7, RX-TX
  49. RX( 1): ( 0: 7)
  50. TX( 1): ( 0: 7)
  51. 12, RX-TX
  52. RX( 1): ( 2: 0)
  53. TX( 1): ( 2: 0)
  54. 13, RX-TX
  55. RX( 1): ( 2: 1)
  56. TX( 1): ( 2: 1)
  57. 14, RX-TX

continued…

  Tx Count/% Rate  : Forever / 100%        Forever / 100%
  PktSize/Tx Burst : 64 / 32               64 / 32
  Src/Dest Port    : 1234 / 5678           1234 / 5678
  Pkt Type:VLAN ID : IPv4 / UDP:0001       IPv4 / TCP:0001
  Dst IP Address   : 10.1.72.17            192.168.3.1
  Src IP Address   : 10.1.72.154/24        192.168.2.1/24
  Dst MAC Address  : 00:23:e9:64:c0:03     00:00:00:00:00:00
  Src MAC Address  : 00:02:c9:31:0c:30     00:02:c9:31:82:00

Have I hit a hardware limitation?

Any pointers would be appreciated.

continued…

  1. RX( 1): ( 2: 2)
  2. TX( 1): ( 2: 2)
  3. 15, RX-TX
  4. RX( 1): ( 2: 3)
  5. TX( 1): ( 2: 3)
  6. 16, RX-TX
  7. RX( 1): ( 2: 4)
  8. TX( 1): ( 2: 4)
  9. 17, RX-TX
  10. RX( 1): ( 2: 5)
  11. TX( 1): ( 2: 5)
  12. 18, RX-TX
  13. RX( 1): ( 2: 6)
  14. TX( 1): ( 2: 6)
  15. 19, RX-TX
  16. RX( 1): ( 2: 7)
  17. TX( 1): ( 2: 7)
  18. Port :
  19. 0, nb_lcores 8, private 0x8f09f0, lcores: 0 1 2 3 4 5 6 7
  20. 2, nb_lcores 8, private 0x8f5270, lcores: 12 13 14 15 16 17 18 19
  21. ** Dev Info (librte_pmd_mlx4:17) **
  22. max_vfs : 0 min_rx_bufsize : 32 max_rx_pktlen : 65536 max_rx_queues :65408 max_tx_queues:65408
  23. max_mac_addrs : 127 max_hash_mac_addrs: 0 max_vmdq_pools: 0
  24. rx_offload_capa: 0 tx_offload_capa : 0 reta_size : 0 flow_type_rss_offloads:0000000000000000
  25. vmdq_queue_base: 0 vmdq_queue_num : 0 vmdq_pool_base: 0
  26. ** RX Conf **
  27. pthreash : 0 hthresh : 0 wthresh : 0
  28. Free Thresh : 0 Drop Enable : 0 Deferred Start : 0
  29. ** TX Conf **
  30. pthreash : 0 hthresh : 0 wthresh : 0
  31. Free Thresh : 0 RS Thresh : 0 Deferred Start : 0 TXQ Flags:00000000
  32. PMD: librte_pmd_mlx4: 0x94b7e0: TX queues number update: 0 -> 8
  33. PMD: librte_pmd_mlx4: 0x94b7e0: RX queues number update: 0 -> 8
  34. Initialize Port 0 – TxQ 8, RxQ 8, Src MAC 00:02:c9:31:0c:30
  35. Create: Default RX 0:0 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  36. Create: Default RX 0:1 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  37. Create: Default RX 0:2 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  38. Create: Default RX 0:3 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  39. Create: Default RX 0:4 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  40. Create: Default RX 0:5 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  41. Create: Default RX 0:6 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  42. Create: Default RX 0:7 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  43. Create: Default TX 0:0 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  44. Create: Range TX 0:0 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  45. Create: Sequence TX 0:0 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  46. Create: Special TX 0:0 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  47. Create: Default TX 0:1 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176

continued…

  1. Create: Range TX 0:1 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  2. Create: Sequence TX 0:1 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  3. Create: Special TX 0:1 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  4. Create: Default TX 0:2 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  5. Create: Range TX 0:2 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  6. Create: Sequence TX 0:2 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  7. Create: Special TX 0:2 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  8. Create: Default TX 0:3 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  9. Create: Range TX 0:3 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  10. Create: Sequence TX 0:3 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  11. Create: Special TX 0:3 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  12. Create: Default TX 0:4 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  13. Create: Range TX 0:4 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  14. Create: Sequence TX 0:4 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  15. Create: Special TX 0:4 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  16. Create: Default TX 0:5 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  17. Create: Range TX 0:5 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  18. Create: Sequence TX 0:5 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  19. Create: Special TX 0:5 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  20. Create: Default TX 0:6 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  21. Create: Range TX 0:6 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  22. Create: Sequence TX 0:6 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  23. Create: Special TX 0:6 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  24. Create: Default TX 0:7 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  25. Create: Range TX 0:7 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  26. Create: Sequence TX 0:7 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  27. Create: Special TX 0:7 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  28. Port memory used = 324936 KB
  29. ** Dev Info (librte_pmd_mlx4:19) **
  30. max_vfs : 0 min_rx_bufsize : 32 max_rx_pktlen : 65536 max_rx_queues :65408 max_tx_queues:65408
  31. max_mac_addrs : 127 max_hash_mac_addrs: 0 max_vmdq_pools: 0
  32. rx_offload_capa: 0 tx_offload_capa : 0 reta_size : 0 flow_type_rss_offloads:0000000000000000

continued…

  1. vmdq_queue_base: 0 vmdq_queue_num : 0 vmdq_pool_base: 0
  2. ** RX Conf **
  3. pthreash : 0 hthresh : 0 wthresh : 0
  4. Free Thresh : 0 Drop Enable : 0 Deferred Start : 0
  5. ** TX Conf **
  6. pthreash : 0 hthresh : 0 wthresh : 0
  7. Free Thresh : 0 RS Thresh : 0 Deferred Start : 0 TXQ Flags:00000000
  8. PMD: librte_pmd_mlx4: 0x953870: TX queues number update: 0 -> 8
  9. PMD: librte_pmd_mlx4: 0x953870: RX queues number update: 0 -> 8
  10. Initialize Port 2 – TxQ 8, RxQ 8, Src MAC 00:02:c9:31:82:00
  11. Create: Default RX 2:0 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  12. Create: Default RX 2:1 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  13. Create: Default RX 2:2 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  14. Create: Default RX 2:3 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  15. Create: Default RX 2:4 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  16. Create: Default RX 2:5 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  17. Create: Default RX 2:6 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  18. Create: Default RX 2:7 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  19. Create: Default TX 2:0 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  20. Create: Range TX 2:0 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  21. Create: Sequence TX 2:0 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  22. Create: Special TX 2:0 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  23. Create: Default TX 2:1 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  24. Create: Range TX 2:1 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  25. Create: Sequence TX 2:1 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  26. Create: Special TX 2:1 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  27. Create: Default TX 2:2 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  28. Create: Range TX 2:2 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  29. Create: Sequence TX 2:2 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  30. Create: Special TX 2:2 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  31. Create: Default TX 2:3 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  32. Create: Range TX 2:3 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  33. Create: Sequence TX 2:3 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  34. Create: Special TX 2:3 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  35. Create: Default TX 2:4 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  36. Create: Range TX 2:4 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  37. Create: Sequence TX 2:5 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176

continued…

  1. Create: Special TX 2:5 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  2. Create: Default TX 2:6 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  3. Create: Range TX 2:6 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  4. Create: Sequence TX 2:6 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  5. Create: Special TX 2:6 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  6. Create: Default TX 2:7 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  7. Create: Range TX 2:7 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  8. Create: Sequence TX 2:7 - Memory used (MBUFs 4096 x (size 1920 + Hdr 128)) + 1581248 = 9737 KB headroom 128 2176
  9. Create: Special TX 2:7 - Memory used (MBUFs 64 x (size 1920 + Hdr 128)) + 1581248 = 1673 KB headroom 128 2176
  10. Port memory used = 324936 KB
  11. Total memory used = 649871 KB
  12. Port 0: Link Up - speed 56000 Mbps - full-duplex
  13. Port 2: Link Up - speed 56000 Mbps - full-duplex
  14. === Display processing on lcore 0
  15. WARNING: Nothing to do on lcore 8: exiting
  16. WARNING: Nothing to do on lcore 9: exiting
  17. WARNING: Nothing to do on lcore 10: exiting
  18. WARNING: Nothing to do on lcore 11: exiting
  19. WARNING: Nothing to do on lcore 20: exiting
  20. WARNING: Nothing to do on lcore 21: exiting
  21. WARNING: Nothing to do on lcore 22: exiting
  22. WARNING: Nothing to do on lcore 23: exiting
  23. === RX/TX processing lcore 1 rxcnt 1 txcnt 1 port/qid, 0/1
  24. === RX/TX processing lcore 2 rxcnt 1 txcnt 1 port/qid, 0/2
  25. === RX/TX processing lcore 3 rxcnt 1 txcnt 1 port/qid, 0/3
  26. === RX/TX processing lcore 4 rxcnt 1 txcnt 1 port/qid, 0/4
  27. === RX/TX processing lcore 5 rxcnt 1 txcnt 1 port/qid, 0/5
  28. === RX/TX processing lcore 6 rxcnt 1 txcnt 1 port/qid, 0/6
  29. === RX/TX processing lcore 7 rxcnt 1 txcnt 1 port/qid, 0/7
  30. === RX/TX processing lcore 12 rxcnt 1 txcnt 1 port/qid, 2/0
  31. === RX/TX processing lcore 13 rxcnt 1 txcnt 1 port/qid, 2/1
  32. === RX/TX processing lcore 14 rxcnt 1 txcnt 1 port/qid, 2/2
  33. === RX/TX processing lcore 15 rxcnt 1 txcnt 1 port/qid, 2/3
  34. === RX/TX processing lcore 16 rxcnt 1 txcnt 1 port/qid, 2/4
  35. === RX/TX processing lcore 17 rxcnt 1 txcnt 1 port/qid, 2/5
  36. === RX/TX processing lcore 18 rxcnt 1 txcnt 1 port/qid, 2/6
  37. === RX/TX processing lcore 19 rxcnt 1 txcnt 1 port/qid, 2/7
  38. Pktgen > load random.txt
  39. geometry 132x44

continued…

  1. mac_from_arp disable
  2. set 0 count 0
  3. set 0 size 64
  4. set 0 rate 100
  5. set 0 burst 32
  6. set 0 sport 1234
  7. set 0 dport 5678
  8. set 0 prime 1
  9. type ipv4 0
  10. range.proto 0 udp
  11. proto udp 0
  12. set ip dst 0 10.1.72.17
  13. set ip src 0 10.1.72.154/24
  14. set mac 0 00:23:e9:64:c0:03
  15. vlanid 0 1
  16. pattern 0 abc
  17. latency 0 disable
  18. mpls 0 disable
  19. mpls_entry 0 0
  20. qinq 0 disable
  21. qinqids 0 0 0
  22. gre 0 disable
  23. gre_eth 0 disable
  24. gre_key 0 0
  25. icmp.echo 0 disable
  26. pcap 0 disable
  27. range 0 enable
  28. process 0 disable
  29. capture 0 disable
  30. rxtap 0 disable
  31. txtap 0 disable
  32. vlan 0 disable
  33. src.mac start 0 00:50:56:86:10:76
  34. src.mac min 0 00:00:00:00:00:00
  35. src.mac max 0 00:00:00:00:00:00
  36. src.mac inc 0 00:00:00:00:00:00
  37. dst.mac start 0 00:23:e9:64:c0:03
  38. dst.mac min 0 00:00:00:00:00:00
  39. dst.mac max 0 00:00:00:00:00:00
  40. dst.mac inc 0 00:00:00:00:00:00
  41. src.ip start 0 10.1.72.154
  42. src.ip min 0 10.1.72.154
  43. src.ip max 0 10.1.72.254
  44. src.ip inc 0 0.0.0.1
  45. dst.ip start 0 10.1.72.17
  46. dst.ip min 0 10.1.72.17
  47. dst.ip max 0 10.1.72.17
  48. dst.ip inc 0 0
  49. src.port start 0 1025
  50. src.port min 0 1025
  51. src.port max 0 65512
  52. src.port inc 0 1
  53. dst.port start 0 0
  54. dst.port min 0 0
  55. dst.port max 0 254
  56. dst.port inc 0 1
  57. vlan.id start 0 1
  58. vlan.id min 0 1
  59. vlan.id max 0 4095
  60. vlan.id inc 0 0
  61. pkt.size start 0 64
  62. pkt.size min 0 64
  63. pkt.size max 0 1518
  64. pkt.size inc 0 0
  65. set 0 seqCnt 0
  66. Pktgen > start 0

continued…

  Flags:Port       : P-----R--------:0     P--------------:2
  Link State       : ----
  Pkts/s Max/Rx    : 0/0                   19839945/19839945
         Max/Tx    : 34199936/34135552     34199936/34135552
  MBits/s Rx/Tx    : 0/21846               12697/21846
  Broadcast        : 0                     0
  Multicast        : 0                     0
    64 Bytes       : 0                     78156990
    65-127         : 0                     0
    128-255        : 0                     0
    256-511        : 0                     0
    512-1023       : 0                     0
    1024-1518      : 0                     0
  Runts/Jumbos     : 0/0                   0/0
  Errors Rx/Tx     : 0/0                   0/0
  Total Rx Pkts    : 0                     368764259
        Tx Pkts    : 669245499             0
        Rx MBs     : 0                     236009
        Tx MBs     : 428317                0
  ARP/ICMP Pkts    : 0/0                   0/0
  Pattern Type     : abcd...               abcd...
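
As a quick sanity check on these counters, the sketch below works out what fraction of the transmitted packets the receiver actually counted. It reads the first column as the sending port and the second as the receiving port, which is an assumption about the pktgen display; the helper name is hypothetical.

```python
# Counters copied from the pktgen display above.
# Assumption: first column is the sending port, second the receiving port.
TX_PPS, RX_PPS = 34_135_552, 19_839_945
TX_TOTAL, RX_TOTAL = 669_245_499, 368_764_259


def received_fraction(rx: int, tx: int) -> float:
    """Share of transmitted packets that the receiver counted."""
    return rx / tx


if __name__ == "__main__":
    print(f"instantaneous: {received_fraction(RX_PPS, TX_PPS):.1%} received")
    print(f"cumulative:    {received_fraction(RX_TOTAL, TX_TOTAL):.1%} received")
```

Both ratios come out well under 100%, while the display shows zero RX errors, so on this reading the missing packets are not being reported as errors by the receiver.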

Hi,

I already sent my answer to the DPDK mailing list, but I’m adding it here too in case anyone else needs it.

RSS on ConnectX-3 cards works, but it doesn’t raise the maximum packet rate of the NIC. What it does is help real applications spread traffic across different cores.

So with a benchmark application you will see some degradation with RSS enabled, but with a real application the performance should be better with RSS than without it.

ConnectX-4 doesn’t have this limitation, and we suggest using it instead of ConnectX-3.

Best Regards,

Olga