How to find the maximum number of RX Queues for a NIC (ConnectX-5)?

Hi,

I am using DPDK with an mlx5 card (MCX515A-CCAT, single-port 100GbE).

I seem to be hitting a wall when trying to use more than 32 RX queues. I get good performance with 32 RX queues, but it drops very significantly when I go up to 36 queues.

Is there a limit of 32 RX Queues per port? I have not been able to find this in the documentation.

Is this limit configurable, or is there nothing I can do?

Would I be able to go higher with a ConnectX-6?

Thanks!

Hello Baptiste,

Thank you for posting your question on the Mellanox Community.

To help best answer your question, please answer the following questions:

  1. Which version of DPDK are you currently using?
  2. Which version of the Mellanox OFED are you using? You can check this with the command # ofed_info -s
  3. Which firmware version is your adapter running? To get your adapter's firmware version, run the following commands:

# mst start

# mst status

Then use the MST device from the output in the command # flint -d <mst_device> q

For example:

# flint -d /dev/mst/mt4119_pciconf0 q

Thanks and regards,

~Mellanox Technical Support

Hello Baptiste,

Thank you for posting your question on the Mellanox Community.

When using more than 32 RX queues on the NIC, the probability of a WQE miss on the RX buffer increases. To answer your question, this also applies to the ConnectX-6.
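As a side note on the question in your title: the maximum number of RX queues the PMD advertises for a port can be checked from DPDK itself, for example with testpmd's port info command (port 0 below is just a placeholder for your port id); the output should include a "Max possible RX queues" line.

testpmd> show port info 0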

To determine whether the performance decrease is due to hardware or software, you should check the out_of_buffer counter.

This counter counts the number of times the NIC wanted to scatter a packet but no receive WQE was available. When it stays at ~0, the software is not the bottleneck. You can find more information on the counters here:

https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters
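For example, assuming the kernel netdev of the port is enp3s0f0 (substitute your own interface name; depending on the driver version the counter may be reported as rx_out_of_buffer), you can read it with ethtool before and after a test run:

# ethtool -S enp3s0f0 | grep out_of_buffer

If the value does not increase during the test, the NIC always had a receive WQE available, which means the software posting RX buffers is not the bottleneck.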

This behavior can also be seen with fewer queues (up to 32) if the system is not tuned according to the benchmark reports, which can be found on the DPDK website (see the mlx5 NIC performance report for DPDK 20.11).

For best performance, please test using the settings documented in that report.

Another thing to note is that as of DPDK 18.05 and Mellanox OFED 4.3, support has been added for Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ). MPRQ can further save PCIe bandwidth by posting a single large buffer for multiple packets: instead of posting one buffer per packet, one large buffer is posted to receive multiple packets. An MPRQ buffer consists of multiple fixed-size strides, and each stride receives one packet. MPRQ can improve throughput for small-packet traffic. You can test this feature for better performance by setting the parameter mprq_en=1 (an example command line follows the link below).

For more information on this parameter, please see section 32.5.3 on this page:

https://dpdk-power-docs.readthedocs.io/en/latest/nics/mlx5.html
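For example, with testpmd the parameter is passed as a device argument (the PCI address 0000:03:00.0 and the core/queue counts below are placeholders; depending on your DPDK version the device option is -w (whitelist, older releases) or -a (allow, newer releases)):

# testpmd -l 0-8 -n 4 -w 0000:03:00.0,mprq_en=1 -- --rxq=36 --txq=36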

You can also potentially improve performance further by improving the CQE compression ratio, using the following commands:

sudo mcra mlx5_0 0x815e0.0 0xcff0f3ff

sudo mcra mlx5_0 0x81600.0 0xcff0f3ff

sudo mcra mlx5_0 0x815e8.31 0

sudo mcra mlx5_0 0x81608.31 0

In the above commands, mlx5_0 is used as an example; you can get your actual RDMA device names by using the commands:

mst start

mst status -v

These settings remain active only until the machine is rebooted. Please make sure you have MFT installed (it is installed together with the Mellanox OFED) and that the CQE compression mode is set to AGGRESSIVE. You can set this with the command mlxconfig -d <PCIe_address> s CQE_COMPRESSION=1
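For example, reusing the MST device name from the flint example above (mlxconfig also accepts the PCIe address directly), you can query the current value and then set it:

# mlxconfig -d /dev/mst/mt4119_pciconf0 q | grep CQE_COMPRESSION

# mlxconfig -d /dev/mst/mt4119_pciconf0 s CQE_COMPRESSION=1

Note that, unlike the mcra settings above, a change made with mlxconfig only takes effect after a firmware reset or reboot.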

For general tuning recommendations with our adapters please see the following tuning guide:

https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters

Further analysis of this would require an engineering investigation. If you wish to pursue this further, please open a support case (a valid support contract is required) through the Mellanox support portal found here:

https://support.mellanox.com/s/

Thanks and regards,

~Mellanox Technical Support

Hi Abigail,

Thanks a lot for your reply.

Why would the probability of a miss increase beyond 32 RX queues? And could that account for such a significant drop? Our performance drops by more than 20% when going from 32 queues to 33 queues.

I currently have no access to the benchmark machine, but I will check the out_of_buffer stats next week.

We have seen that MPRQ support was added. Unfortunately, we need the RSS hash result, and since we have compression enabled, it seems the hash is not fully supported together with MPRQ, so we have not run any tests with MPRQ yet. I will try to run a test with our benchmark code.

I will try to tune the compression ratios. We already have aggressive compression enabled.

I will go over the different documents and see if we can find something, but I have already been through them before and it seems our tuning is correct. I just can't seem to get above 88 Mpps.

Best regards

Baptiste

Hi Abigail,

Thanks for your answer

  1. I am using DPDK 19.11.

  2. We are using the in-tree drivers, not OFED.

  3. The firmware version is 16.20.1010.

Best regards

Baptiste