rx-out-of-buffer

Hi Community,

I’d like to better understand a problem we have, which seems to be linked to the fact that DPDK’s xstats/ethtool -S shows a lot of “rx-out-of-buffer” packets. I found the performance counter document, but it does not say much about why this could happen or which buffer we’re talking about. I quote: “Number of times receive queue had no software buffers allocated for the adapter’s incoming traffic.” As rx_nombufs (DPDK stats) is 0, I guess it does not mean that there are not enough software buffers. Are these some internal MLX buffers? What can be done to prevent that?
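For anyone who wants to track this counter outside of DPDK, here is a minimal sketch that pulls values out of text in the "name: value" layout that ethtool -S prints. The sample text and its numbers are made up for illustration, and parse_counters is a hypothetical helper name, not part of any Mellanox or DPDK tool.

```python
import re

# Hypothetical excerpt in the "name: value" layout printed by `ethtool -S`.
# The values are invented for illustration only.
SAMPLE = """\
NIC statistics:
     rx_packets: 1200000
     rx_out_of_buffer: 3456
     rx_discards_phy: 0
"""


def parse_counters(text):
    """Return a {name: int} dict built from lines shaped like `name: value`."""
    counters = {}
    for line in text.splitlines():
        # The header line ("NIC statistics:") has no numeric value and is skipped.
        match = re.match(r"\s*([\w.\-]+):\s*(-?\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


counters = parse_counters(SAMPLE)
print(counters["rx_out_of_buffer"])
```

On a real system the text would come from running ethtool -S against the interface; that step is left out here so the snippet runs anywhere.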

Thanks,

Tom

No. Even if Mellanox force-accepted two answers, they like good stats apparently.

If you have any clue, don’t hesitate to share.

My fear is that they will not disclose this info, like any internals about the NICs. There are some internal buffers somewhere that we run out of. The queues’ “imissed” counters queried via DPDK clearly report 0 loss, so it’s not the queues that lack buffers, or if it is, there is a bug in the reporting. Like those infinite flow tables that work like magic, they probably won’t say anything…

Hi Tom,

Thank you for posting your question on the Mellanox Community.

Based on the information provided, the following Mellanox Community document explains the ‘rx_out_of_buffer’ ethtool/xstat statistic.

You can improve the rx_out_of_buffer behavior by tuning the node and by modifying the ring size on the adapter (ethtool -g shows the current ring sizes; ethtool -G changes them).

Also make sure you follow the DPDK performance recommendations from the following link → https://doc.dpdk.org/guides/nics/mlx5.html#performance-tuning

If you still experience performance issues after these recommendations, please do not hesitate to open a Mellanox Support Case by emailing support@mellanox.com

Thanks and regards,

~Mellanox Technical Support

Hmm. That’s not fair.

On a lighter note, looks like question-accepted-stats and mlx5-ethtool-counter-stats – both are misleading

Were you able to find answers to your query? I have the same question

Thanks, but how is it possible for rx_out_of_buffer to increase when not a single queue reports any “imissed” (the DPDK counter that says how many packets could not be received because of a lack of buffers)? Something does not add up here.

We use DPDK, so ethtool -C will not impact the performance, as those settings would be overridden by DPDK. We did disable flow control and set the max read request. But my question here is not about performance (we have a ticket with support for that); it is specifically about rx_out_of_buffer. How can that number increase while the rings themselves do not report any misses?
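To make the mismatch concrete, here is a small sketch of the comparison being described: take two snapshots of the counters and see which ones moved. The snapshot values are made up, counter_deltas is a hypothetical helper, and the dictionary keys are just labels for the ethtool/xstats counter and the DPDK basic stats fields discussed in this thread.

```python
def counter_deltas(before, after):
    """Per-counter increase between two snapshots (dicts of name -> int)."""
    return {name: after[name] - before[name] for name in after if name in before}


# Hypothetical snapshots taken a few seconds apart; the numbers are invented.
before = {"rx_out_of_buffer": 1000, "imissed": 0, "rx_nombuf": 0}
after = {"rx_out_of_buffer": 1750, "imissed": 0, "rx_nombuf": 0}

deltas = counter_deltas(before, after)
# Counters that increased during the interval.
growing = [name for name, delta in deltas.items() if delta > 0]

print(deltas)
print("counters that moved:", growing)
```

With numbers like these, only rx_out_of_buffer moves while the software-side counters stay flat, which is exactly the situation that does not add up.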

Thanks for the answer! But I don’t see what you refer to as “the following Mellanox Community document”, so I still don’t know what it is. If you are referring to the line “Number of times receive queue had no software buffers allocated for the adapter’s incoming traffic.”, then the tuning you mention will not change the problem, because the rings are never full and the CPU is not busy. So what buffer does “rx-out-of-buffer” count if it’s not the ring buffers?

Hi Tom,

My apologies for not providing the link to the Mellanox Community document. The link is → Understanding mlx5 ethtool Counter https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters

The “rx_out_of_buffer” counter from ‘ethtool -S’ indicates RX packet drops due to lack of receive buffers. The lack of receive buffers can be related to a system tuning issue or system capability.

What happens when you turn off ‘interrupt coalescence’ on the NIC with the following command → # ethtool -C adaptive-rx off rx-usecs 0 rx-frames 0

Also make sure you disable flow control on the NIC and set the PCI Max Read Request to ‘4096’. Link to document → Understanding PCIe Configuration for Maximum Performance https://community.mellanox.com/s/article/understanding-pcie-configuration-for-maximum-performance

Thanks and regards,

~Mellanox Technical Support

Hey Tom,

Just checking if you and/or Mellanox were able to fix this issue ?

Q1. When you say “The lack of receive buffers can be related to a system tuning issue or system capability”, do you mean the capability of the Mellanox NIC (a Mellanox ConnectX-5 100G EN in my case) or something else?

I have followed every tuning requirement Mellanox suggests, but I am still getting a very high rx_out_of_buffer count, and it keeps increasing if I keep the session on…

Q2. If it helps, do you want me to open a new ticket with all the details? Please let me know. It is related to

Thanks,

Arvind

Not really. But with a more powerful Skylake machine, the rx_out_of_buffer counter decreased and we could achieve near 100G. It still increases, though, and the software queues never contain more than 50 out of 4000 descriptors… So the device is congested. But according to Mellanox’s software (NEO Host), the PCIe bus is not the problem. And according to the description, this counter is not caused by some internal congestion. So the last thing in between memory and the PCIe is the CPU architecture itself.
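For a sense of scale, here is a rough back-of-the-envelope sketch of how long a 4000-descriptor ring can absorb line-rate traffic at 100G. The frame sizes are assumptions (the thread does not say what the traffic looks like), the 20 bytes of per-frame overhead are the standard Ethernet preamble, start-of-frame delimiter and inter-frame gap, and the function names are hypothetical.

```python
# Preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12), in bytes.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(line_rate_bps, frame_bytes):
    """Maximum frame rate on the wire for a given frame size (FCS included)."""
    return line_rate_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def ring_fill_time_us(descriptors, line_rate_bps, frame_bytes):
    """Microseconds of line-rate traffic that fit in `descriptors` ring entries,
    assuming one descriptor per frame and nothing being drained meanwhile."""
    return descriptors / packets_per_second(line_rate_bps, frame_bytes) * 1e6


LINE_RATE = 100e9   # 100 Gbit/s, as in the thread
RING_SIZE = 4000    # descriptors, as in the thread

# Assumed frame sizes: minimum and maximum standard Ethernet frames.
for frame in (64, 1518):
    pps = packets_per_second(LINE_RATE, frame)
    fill = ring_fill_time_us(RING_SIZE, LINE_RATE, frame)
    print(f"{frame}-byte frames: {pps / 1e6:.1f} Mpps, ring fills in {fill:.1f} us")
```

Under these assumptions the whole ring covers tens to hundreds of microseconds of traffic, so a ring that never holds more than 50 descriptors is nowhere near its limit when the counter increases.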