rx_fifo_errors and rx_dropped errors using VMA while CPU usage is below 40%

Hi,

I’m getting rx_fifo_errors and rx_dropped errors when receiving UDP packets. I have 8 applications, each receiving ~8000-byte UDP packets from 7 different pieces of hardware with different IP addresses. The packet and data rates are identical for each application, totalling 440k packets/s and 29 Gbit/s respectively. The packets are all transmitted synchronously, at a rate of 2 × 8000-byte packets every 1.5 ms from each of the 56 hardware cards.
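As a quick consistency check, the totals above can be broken down per application. This back-of-envelope sketch uses only the 440k packets/s, 29 Gbit/s and 8-application figures from this post. It assumes that the 29 Gbit/s figure counts UDP payload and that the load is split evenly across the applications.

```python
# Back-of-envelope breakdown of the aggregate receive load described above.
# Assumption: 29 Gbit/s is UDP payload, split evenly across the 8 applications.

TOTAL_PKTS_PER_SEC = 440_000
TOTAL_BITS_PER_SEC = 29e9
NUM_APPS = 8


def per_app_load(total_pps: float, total_bps: float, apps: int) -> tuple[float, float]:
    """Return (packets/s, Gbit/s) handled by each application."""
    return total_pps / apps, total_bps / apps / 1e9


def mean_packet_bytes(total_pps: float, total_bps: float) -> float:
    """Average payload per packet implied by the two aggregate rates."""
    return total_bps / 8 / total_pps


if __name__ == "__main__":
    pps, gbps = per_app_load(TOTAL_PKTS_PER_SEC, TOTAL_BITS_PER_SEC, NUM_APPS)
    print(f"per application: {pps:,.0f} packets/s, {gbps:.3f} Gbit/s")
    mean = mean_packet_bytes(TOTAL_PKTS_PER_SEC, TOTAL_BITS_PER_SEC)
    print(f"implied mean packet size: {mean:,.0f} bytes")
```

This gives 55,000 packets/s and about 3.6 Gbit/s per application. The implied mean is roughly 8,240 bytes per packet, which is in line with the ~8000-byte packets described.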

In this mode, rx_dropped and rx_fifo_errors increase by a few tens of packets per second. Attached is a dump of what ethtool shows. vma_stats shows no dropped packets. Each application is bound with numactl to NUMA node 1 (which is where the NIC is attached). top shows each core on that node running at < 40% CPU. The switch shows no dropped packets.
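One way to put a number on "a few tens of packets per second" is to sample the kernel's per-interface counters twice and divide the difference by the interval. This sketch assumes the standard counters under /sys/class/net/<iface>/statistics, where rx_dropped and rx_fifo_errors are both present. The function names are illustrative and are not part of any tool. The sysfs argument exists only so that the code can be pointed at a test directory.

```python
# Sample two kernel NIC counters and report how fast they grow.
# Reads the standard per-interface counters under /sys/class/net/<iface>/statistics.
from pathlib import Path

COUNTERS = ("rx_dropped", "rx_fifo_errors")


def read_counters(iface: str, sysfs: Path = Path("/sys/class/net")) -> dict[str, int]:
    """Return the current value of each counter in COUNTERS for `iface`."""
    stats_dir = sysfs / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def rates(before: dict[str, int], after: dict[str, int], seconds: float) -> dict[str, float]:
    """Per-second increase of each counter between two samples."""
    return {name: (after[name] - before[name]) / seconds for name in before}
```

For example, take one sample with read_counters("enp132s0") (the interface name from the ethtool output later in the thread), take another ten seconds later, and pass both samples to rates(first, second, 10.0).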

The libvma configuration is shown below. I had the same problem without libvma (i.e. with vanilla Linux kernel packet processing).

Can anyone give me some hints on where to look to reduce the number of lost packets?

Many thanks in advance,

Keith

export VMA_MTU=9000 # don't need to set; should be intelligent, but we'll set it anyway for now
export VMA_RX_BUFS=32768 # number of buffers, each of 1x MTU. Default is 200000 = 1 GB!
export VMA_RX_WRE=4096 # number of work requests
export VMA_RX_POLL=0 # don't waste CPU time polling; we don't need to
export VMA_TX_BUFS=256 # don't need many of these, so make it small
export VMA_TX_WRE=32 # don't need to TX, so make this small to save memory
export VMA_INTERNAL_THREAD_AFFINITY=15
export VMA_MEM_ALLOC_TYPE=0
export VMA_THREAD_MODE=0 # all socket processing is single-threaded
export VMA_CQ_AIM_INTERRUPTS_RATE_PER_SEC=200
export VMA_CQ_KEEP_QP_FULL=0 # this does packet drops according to the docs??
export VMA_SPEC=throughput
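The size of the VMA receive buffer pool follows directly from the first two settings. This sketch assumes, as the comment on VMA_RX_BUFS says, that each buffer holds about one MTU. VMA's actual per-buffer overhead is not modelled.

```python
# Rough size of the VMA receive buffer pool configured above.
# Assumption (from the config comment): each VMA_RX_BUFS element holds about one MTU.

VMA_RX_BUFS = 32_768
VMA_MTU = 9_000


def rx_pool_bytes(num_bufs: int, mtu: int) -> int:
    """Approximate bytes used by the RX buffer pool, ignoring per-buffer overhead."""
    return num_bufs * mtu


if __name__ == "__main__":
    size = rx_pool_bytes(VMA_RX_BUFS, VMA_MTU)
    print(f"{size:,} bytes ~= {size / 2**20:.0f} MiB")
```

With 32768 buffers at MTU 9000, this prints roughly 281 MiB.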

ban115@tethys:~$ lspci -v | grep -A 10 ellanox
84:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
	Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
	Flags: bus master, fast devsel, latency 0, IRQ 74, NUMA node 1
	Memory at c9800000 (64-bit, non-prefetchable) [size=1M]
	Memory at c9000000 (64-bit, prefetchable) [size=8M]
	Expansion ROM at [disabled]
	Capabilities:
	Kernel driver in use: mlx4_core
	Kernel modules: mlx4_core
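The NUMA locality that lspci reports can also be read from sysfs. This is handy for scripts that pin applications automatically. The following is a minimal sketch. It assumes the usual /sys/class/net/<iface>/device/numa_node file, which holds -1 when the platform does not report locality. The sysfs argument exists only so that the function can be pointed at a test directory.

```python
# Look up which NUMA node a network interface's PCI device is attached to.
# /sys/class/net/<iface>/device links to the PCI device; its numa_node file
# holds the same value lspci shows as "NUMA node" (or -1 if unknown).
from pathlib import Path


def nic_numa_node(iface: str, sysfs: Path = Path("/sys/class/net")) -> int:
    return int((sysfs / iface / "device" / "numa_node").read_text())
```

On this machine, nic_numa_node("enp132s0") would be expected to return 1, matching "NUMA node 1" in the lspci output.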

ban115@tethys:~$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 15968 MB
node 0 free: 129 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 16114 MB
node 1 free: 2106 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
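You can parse the CPU lists above to pick node-local CPUs for numactl or VMA affinity settings. This sketch parses the "node N cpus:" lines of numactl --hardware output and checks whether CPU 15 is on node 1. It assumes that VMA_INTERNAL_THREAD_AFFINITY=15 is read as a CPU number. VMA may also accept a bitmask, so check its documentation before relying on this.

```python
# Parse `numactl --hardware` output into {node: [cpu, ...]} and check that a
# chosen CPU (here 15, assumed to be the VMA internal thread's CPU) is on node 1.
import re

NODE_CPUS = re.compile(r"^node (\d+) cpus:(.*)$")


def node_cpus(numactl_output: str) -> dict[int, list[int]]:
    nodes: dict[int, list[int]] = {}
    for line in numactl_output.splitlines():
        m = NODE_CPUS.match(line.strip())
        if m:
            nodes[int(m.group(1))] = [int(c) for c in m.group(2).split()]
    return nodes


SAMPLE = """available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 1 cpus: 1 3 5 7 9 11 13 15
"""

if __name__ == "__main__":
    cpus = node_cpus(SAMPLE)
    print(cpus[1])        # [1, 3, 5, 7, 9, 11, 13, 15]
    print(15 in cpus[1])  # True
```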

If you are seeing the same behaviour without VMA, why complicate the problem? Start by tuning the system and see if that helps; adding more components will not help with troubleshooting. After tuning, I would suggest checking ‘netstat -s’/‘nstat’ for UDP error counters and ‘netstat -unp’ for the receive queue size.
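The UDP counters that netstat -s and nstat summarise come from /proc/net/snmp. In that file, a "Udp:" header line is followed by a "Udp:" line of values. The following is a minimal parser, offered as a sketch. RcvbufErrors is the counter to watch for socket receive-buffer overflows.

```python
# Parse the kernel's UDP counters (the ones `netstat -s` and `nstat` report)
# from the text of /proc/net/snmp. "UdpLite:" lines are skipped because they
# do not start with "Udp:".


def udp_counters(snmp_text: str) -> dict[str, int]:
    rows = [line.split()[1:] for line in snmp_text.splitlines() if line.startswith("Udp:")]
    header, values = rows[0], rows[1]
    # RcvbufErrors counts datagrams dropped because a socket's receive buffer
    # was full; those drops are also included in InErrors.
    return dict(zip(header, map(int, values)))
```

Call udp_counters(open("/proc/net/snmp").read()) before and after a test run, then compare RcvbufErrors and InErrors.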

The tuning guides are available on the Mellanox site: Performance Tuning for Mellanox Adapters, https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters

You might also check how many send/receive queues (channels) are currently configured on the interface with ethtool -l <interface>, and try limiting them to 16:

ethtool -L <interface> rx 16 tx 16

number of channels: how many queues should be created

ring size: how many entries (descriptors) each queue holds
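To see how many queues the kernel actually created, ethtool -l <interface> reports the channel counts. The queues are also visible in sysfs as one rx-N or tx-N directory per queue under /sys/class/net/<iface>/queues. The following sketch counts them; the sysfs parameter is there only for testing. The RX count should match the rx channel number. TX counts may differ from ethtool's figure on some drivers.

```python
# Count the RX and TX queues the kernel created for an interface by listing
# /sys/class/net/<iface>/queues, which holds one rx-N / tx-N directory per queue.
from pathlib import Path


def queue_counts(iface: str, sysfs: Path = Path("/sys/class/net")) -> dict[str, int]:
    names = [p.name for p in (sysfs / iface / "queues").iterdir()]
    return {
        "rx": sum(n.startswith("rx-") for n in names),
        "tx": sum(n.startswith("tx-") for n in names),
    }
```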

Generally, you shouldn’t change the defaults, as they are based on the vendor’s experience (any vendor); however, sometimes it is worth experimenting with these settings. For example, setting the number of receive queues to the number of CPUs on the host might not be a bad idea, because a larger number of queues causes more context switches, which can degrade performance.
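To make the queue-versus-CPU trade-off concrete, this hypothetical sketch spreads N receive queues round-robin over the eight node-1 CPUs from the numactl output. It then reports how many queues each CPU would service. It models only the arithmetic. Actual placement is controlled through IRQ affinity, which this code does not touch.

```python
# Spread N receive queues round-robin over a set of CPUs and count how many
# queues each CPU ends up servicing. Hypothetical mapping only; it does not
# read or change real IRQ affinity.
from collections import Counter


def queues_per_cpu(num_queues: int, cpus: list[int]) -> Counter:
    return Counter(cpus[q % len(cpus)] for q in range(num_queues))


NODE1_CPUS = [1, 3, 5, 7, 9, 11, 13, 15]  # node 1 CPUs from the numactl output

if __name__ == "__main__":
    for n in (32, 16, 8):
        print(n, "queues ->", max(queues_per_cpu(n, NODE1_CPUS).values()), "per CPU")
```

With 32 queues, each node-1 CPU would service 4 queues. With 16 queues this drops to 2, and with 8 queues it drops to 1.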

The same applies to queue size: setting it to the maximum increases the amount of memory used by the queue, which might cause page swapping and, again, degraded performance.
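To get a rough sense of the memory involved: RX buffer memory scales with channels × ring entries × bytes per entry. This sketch plugs in the 32 RX channels and the 8192-entry RX ring reported later in the thread. It assumes one roughly MTU-sized (9000-byte) buffer per entry. Real drivers, mlx4 included, fragment and round buffers differently, so treat the result as an order-of-magnitude estimate.

```python
# Rough estimate of receive-ring buffer memory:
#   channels x ring entries x bytes per entry.
# Assumption: each RX descriptor points at roughly one MTU-sized buffer.


def rx_ring_bytes(channels: int, ring_entries: int, bytes_per_entry: int) -> int:
    return channels * ring_entries * bytes_per_entry


if __name__ == "__main__":
    for channels in (32, 16):
        size = rx_ring_bytes(channels, 8192, 9000)
        print(f"{channels} channels x 8192 entries: ~{size / 1e9:.2f} GB")
```

That comes to about 2.36 GB for 32 channels and 1.18 GB for 16 channels. You can compare these figures with the free memory that numactl reports for node 1.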

Bottom line: there is no single recipe, only sensible defaults. Every change needs to be validated by running benchmarks that closely mimic the behaviour of the real-time application, or by running the application itself.

Do you still have dropped packets after changing these parameters?

I would also recommend checking the Red Hat network performance tuning guide if you are using kernel TCP/UDP. It is not really applicable to VMA, as VMA bypasses the kernel.

Hi Alkx,

Thanks for your reply. I’ve done all the performance tuning steps from the site you recommended. I tried VMA because I expected someone to say “Have you tried VMA?”. Also, vma_stats seems to give more visibility into the various buffer sizes (and errors) than the kernel does.

I monitor /proc/net/udp. With VMA off, it shows no drops and rarely more than a few MB in the UDP receive buffer (I think this is equivalent to netstat -unp).
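The per-socket view in /proc/net/udp can be summarised with a small parser. In each data line, the fifth field is "tx_queue:rx_queue" in hex, giving bytes of buffer memory in use. The last field is the socket's drop counter. This is a sketch that assumes the IPv4 file layout.

```python
# Report receive-queue usage and drop count for each UDP socket, given the
# text of /proc/net/udp. Field 5 is "tx_queue:rx_queue" (hex bytes) and the
# last field is the per-socket drop counter.


def udp_sockets(proc_text: str) -> list[dict[str, int]]:
    sockets = []
    for line in proc_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 13:
            continue
        _tx_queue, rx_queue = fields[4].split(":")
        sockets.append({
            "local_port": int(fields[1].split(":")[1], 16),
            "rx_queue_bytes": int(rx_queue, 16),
            "drops": int(fields[-1]),
        })
    return sockets
```

Feed it the text of /proc/net/udp and keep the entries with non-zero rx_queue_bytes or drops. This gives roughly the view described above.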

Thanks for the tip on ethtool -L; I hadn’t seen that before, and I wonder why it isn’t in the tuning guides. I’ll experiment with it and see if things improve. My current settings are below.

Also:

  • What’s the difference between the ‘rings’ (ethtool -g) and ‘channels’ (ethtool -L)?

  • Why does reducing the number of channels help?

ban115@tethys:~$ /sbin/ethtool -g enp132s0
Ring parameters for enp132s0:
Pre-set maximums:
RX:             8192
RX Mini:        0
RX Jumbo:       0
TX:             8192
Current hardware settings:
RX:             8192
RX Mini:        0
RX Jumbo:       0
TX:             512

ban115@tethys:~$ /sbin/ethtool -L enp132s0
no channel parameters changed, aborting
current values: tx 8 rx 32 other 0 combined 0