Thank you for posting your question on the Mellanox Community.
When more than 32 Rx queues are used on the NIC, the probability of a WQE miss on the Rx buffer increases. To answer your question, this also applies to the ConnectX-6.
To determine whether the performance decrease is due to hardware or software, check the out_of_buffer counter.
This counter counts the number of times the NIC wanted to scatter a packet but no receive WQE was available. When it stays at ~0, the software is posting receive buffers fast enough and is not the bottleneck. You can find more information on counters here:
This behavior can also appear with fewer queues (up to 32) if the system is not tuned according to the benchmark reports, which can be found on the DPDK website. Here is the report for DPDK 20.11:
For best performance please test using the settings used in the report.
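As a rough way to watch the out_of_buffer counter mentioned above, a small shell sketch can sample it twice and print the difference. The sysfs path in the usage comment is an assumption (the device name mlx5_0, port 1, and the hw_counters location depend on your system and driver version), and counter_delta is a hypothetical helper name, not a Mellanox tool.

```shell
#!/bin/sh
# counter_delta FILE [SECONDS]
# Hypothetical helper: read a numeric counter from FILE, wait SECONDS
# (default 1), read it again, and print how much it grew.
counter_delta() {
    file=$1
    interval=${2:-1}
    before=$(cat "$file") || return 1
    sleep "$interval"
    after=$(cat "$file") || return 1
    echo $((after - before))
}

# Example usage (assumed path; adjust device and port for your system):
# counter_delta /sys/class/infiniband/mlx5_0/ports/1/hw_counters/out_of_buffer 5
```

A delta that stays near zero under load suggests that software is keeping up. A steadily growing delta points at the receive side running out of posted WQEs.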
Another thing to note is that as of DPDK 18.05 and Mellanox OFED 4.3, support has been added for Striding RQ. Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ) can further save PCIe bandwidth by posting a single large buffer for multiple packets. Instead of posting one buffer per packet, one large buffer is posted to receive multiple packets. An MPRQ buffer consists of multiple fixed-size strides, and each stride receives one packet. MPRQ can improve throughput for small-packet traffic. You can test this feature for better performance by setting the parameter mprq_en=1.
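As an illustration, mprq_en is a device argument of the mlx5 PMD, so it is appended to the PCI address on the DPDK command line. The sketch below only builds and prints such a command. The PCI address, core list, and queue counts are placeholder assumptions, the helper names are hypothetical, and older DPDK releases use -w rather than -a to select the device.

```shell
#!/bin/sh
# mprq_devarg PCI_ADDR
# Hypothetical helper: append the mlx5 PMD option mprq_en=1 to a PCI address.
mprq_devarg() {
    printf '%s,mprq_en=1\n' "$1"
}

# testpmd_cmd PCI_ADDR NQUEUES
# Print (do not run) a testpmd command line using that device argument.
# The core list (-l) and memory channel count (-n) are placeholder values.
testpmd_cmd() {
    printf 'dpdk-testpmd -l 0-3 -n 4 -a %s -- --rxq=%s --txq=%s -i\n' \
        "$(mprq_devarg "$1")" "$2" "$2"
}

# Example usage (placeholder PCI address):
# testpmd_cmd 0000:3b:00.0 4
```

Printing the command first makes it easy to compare a run with and without mprq_en=1 while keeping every other setting identical.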
For more information on this parameter, please see section 32.5.3 on this page:
You can also potentially improve performance further by improving the CQE compression ratio with the following commands:
sudo mcra mlx5_0 0x815e0.0 0xcff0f3ff
sudo mcra mlx5_0 0x81600.0 0xcff0f3ff
sudo mcra mlx5_0 0x815e8.31 0
sudo mcra mlx5_0 0x81608.31 0
In the above commands, mlx5_0 is used as an example. You can find your actual RDMA device names with the command:
mst status -v
These settings remain active until the machine is rebooted. Please make sure you have MFT installed (it is installed with Mellanox OFED) and that the CQE compression mode is set to AGGRESSIVE. You can set this with the command:
mlxconfig -d <PCIe_address> s CQE_COMPRESSION=1
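To confirm the CQE compression mode, the current value can be read back with mlxconfig's query command. The sketch below assumes the query output lists each parameter as a name followed by its value (for example AGGRESSIVE(1)). The exact layout may differ between MFT versions, and cqe_mode is a hypothetical helper name.

```shell
#!/bin/sh
# cqe_mode
# Hypothetical helper: read `mlxconfig ... q` output on stdin and print the
# value column of the CQE_COMPRESSION line, if one is present.
cqe_mode() {
    awk '$1 == "CQE_COMPRESSION" { print $2 }'
}

# Example usage (replace <PCIe_address> with your device):
# mlxconfig -d <PCIe_address> q | cqe_mode
```

Since mlxconfig changes normally take effect only after a reboot or firmware reset, while the mcra register writes are lost on reboot, it likely makes sense to set CQE_COMPRESSION first and apply the mcra commands afterwards.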
For general tuning recommendations with our adapters, please see the following tuning guide:
Further analysis of this would require engineering investigation. If you wish to pursue this further, please contact support with a valid support contract through the Mellanox support portal found here:
Thanks and regards,
~Mellanox Technical Support