Hello,
we are building a custom carrier board with a Microchip LAN7430 Ethernet controller and have noticed high packet loss (26%) when receiving packets. The packets are not dropped by hardware.
Nvidia Jetson sending packets:
iperf3 -c 192.168.1.100 -u -b 1000M
Connecting to host 192.168.1.100, port 5201
[ 5] local 192.168.1.170 port 46940 connected to 192.168.1.100 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 112 MBytes 942 Mbits/sec 81326
[ 5] 1.00-2.00 sec 112 MBytes 940 Mbits/sec 81151
[ 5] 2.00-3.00 sec 111 MBytes 935 Mbits/sec 80677
[ 5] 3.00-4.00 sec 111 MBytes 935 Mbits/sec 80739
[ 5] 4.00-5.00 sec 111 MBytes 935 Mbits/sec 80690
[ 5] 5.00-6.00 sec 112 MBytes 942 Mbits/sec 81358
[ 5] 6.00-7.00 sec 112 MBytes 941 Mbits/sec 81188
[ 5] 7.00-8.00 sec 113 MBytes 950 Mbits/sec 81984
[ 5] 8.00-9.00 sec 112 MBytes 940 Mbits/sec 81121
[ 5] 9.00-10.00 sec 112 MBytes 943 Mbits/sec 81411
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec 0.000 ms 0/811645 (0%) sender
[ 5] 0.00-10.04 sec 1.09 GBytes 929 Mbits/sec 0.015 ms 6493/811641 (0.8%) receiver
iperf Done.
Nvidia Jetson receiving packets:
(Client command: iperf3 -c 192.168.1.170 -u -b 1000M)
iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.100, port 42178
[ 5] local 192.168.1.170 port 5201 connected to 192.168.1.100 port 55236
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 79.0 MBytes 663 Mbits/sec 0.019 ms 20649/77861 (27%)
[ 5] 1.00-2.00 sec 82.4 MBytes 692 Mbits/sec 0.010 ms 21750/81451 (27%)
[ 5] 2.00-3.00 sec 82.5 MBytes 692 Mbits/sec 0.019 ms 21116/80842 (26%)
[ 5] 3.00-4.00 sec 82.5 MBytes 692 Mbits/sec 0.025 ms 21471/81179 (26%)
[ 5] 4.00-5.00 sec 82.4 MBytes 691 Mbits/sec 0.031 ms 21464/81132 (26%)
[ 5] 5.00-6.00 sec 82.4 MBytes 692 Mbits/sec 0.018 ms 21344/81041 (26%)
[ 5] 6.00-7.00 sec 82.4 MBytes 691 Mbits/sec 0.032 ms 21072/80764 (26%)
[ 5] 7.00-8.00 sec 82.5 MBytes 692 Mbits/sec 0.029 ms 21776/81487 (27%)
[ 5] 8.00-9.00 sec 80.9 MBytes 678 Mbits/sec 0.039 ms 20953/79518 (26%)
[ 5] 9.00-10.00 sec 82.4 MBytes 692 Mbits/sec 0.019 ms 21694/81400 (27%)
[ 5] 10.00-10.04 sec 3.33 MBytes 682 Mbits/sec 0.017 ms 942/3357 (28%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.04 sec 823 MBytes 687 Mbits/sec 0.017 ms 214231/810032 (26%) receiver
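To make the receiver log easier to read, here is a small sketch that subtracts lost from total datagrams for each one-second interval. It only re-uses the lines pasted above; the regular expression and the function name `delivered_per_interval` are my own and not part of iperf3.

```python
import re

# Receiver-side interval lines copied from the iperf3 server output above.
RECEIVER_LINES = """\
[ 5] 0.00-1.00 sec 79.0 MBytes 663 Mbits/sec 0.019 ms 20649/77861 (27%)
[ 5] 1.00-2.00 sec 82.4 MBytes 692 Mbits/sec 0.010 ms 21750/81451 (27%)
[ 5] 2.00-3.00 sec 82.5 MBytes 692 Mbits/sec 0.019 ms 21116/80842 (26%)
[ 5] 3.00-4.00 sec 82.5 MBytes 692 Mbits/sec 0.025 ms 21471/81179 (26%)
[ 5] 4.00-5.00 sec 82.4 MBytes 691 Mbits/sec 0.031 ms 21464/81132 (26%)
[ 5] 5.00-6.00 sec 82.4 MBytes 692 Mbits/sec 0.018 ms 21344/81041 (26%)
[ 5] 6.00-7.00 sec 82.4 MBytes 691 Mbits/sec 0.032 ms 21072/80764 (26%)
[ 5] 7.00-8.00 sec 82.5 MBytes 692 Mbits/sec 0.029 ms 21776/81487 (27%)
[ 5] 8.00-9.00 sec 80.9 MBytes 678 Mbits/sec 0.039 ms 20953/79518 (26%)
[ 5] 9.00-10.00 sec 82.4 MBytes 692 Mbits/sec 0.019 ms 21694/81400 (27%)
"""

# Captures: interval start, interval end, lost datagrams, total datagrams.
LINE_RE = re.compile(r"(\d+\.\d+)-(\d+\.\d+)\s+sec.*?(\d+)/(\d+)\s+\(")


def delivered_per_interval(text):
    """Return (start, end, delivered datagrams) for every interval line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            start, end = float(match.group(1)), float(match.group(2))
            lost, total = int(match.group(3)), int(match.group(4))
            rows.append((start, end, total - lost))
    return rows


rows = delivered_per_interval(RECEIVER_LINES)
for start, end, delivered in rows:
    print(f"{start:5.2f}-{end:5.2f} s: {delivered} datagrams delivered")
```

In eight of the ten intervals the delivered count lands between 59,668 and 59,726 datagrams per second, while the sender offers roughly 81,000. That looks like a fairly constant ceiling on the receive path rather than random loss, although the log alone does not tell where that ceiling sits.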
During the transfer, mpstat reports a softirq load (%soft) of 50%.
I worked with the Microchip support team for several weeks, and we tried different approaches. I received the newest driver updates backported to kernel 5.15, and we adjusted the RX FIFO queue size (inside the driver) for receiving packets. None of the changes resulted in any improvement.
The support team's theory is that the software running on the CPU is not fast enough at consuming the RX FIFO contents, and that interrupt handling, DMA handling and the presence of other PCIe devices contribute to the problem. In theory, one core should be able to handle the load. Handling it on multiple cores is not implemented, as described here. How can the interrupt load balancing be improved?
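For anyone who wants to experiment with moving the interrupt off CPU0: the generic Linux interface is a hexadecimal CPU bitmask written to /proc/irq/<irq>/smp_affinity. The sketch below only builds that mask and prints the command; it does not write anything. The helper name `cpu_mask` is made up for illustration, IRQ 284 is the busiest lan743x vector in the /proc/interrupts output in this post, and I have not confirmed that this platform accepts an affinity change for that vector.

```python
def cpu_mask(cpus):
    """Return the hex bitmask with bit N set for every CPU number N in cpus."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


# Assumption: 284 is the IRQ number taken from the /proc/interrupts output in
# this post; it can differ between boots and systems.
IRQ = 284

# Print (not execute) the command that would pin the IRQ to CPU1.
print(f"echo {cpu_mask([1])} > /proc/irq/{IRQ}/smp_affinity")
```

Note that this only moves a single interrupt to a different core. It does not spread one RX queue across several cores, and a running irqbalance daemon may overwrite the setting.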
Other tests we did:
- dropwatch -lkas during UDP transfer (nothing unusual noticed)
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
8 drops at net_tx_action+0 (0xffffc485aed8a4b0) [software]
72 drops at udp_queue_rcv_one_skb+420 (0xffffc485aee85790) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
51 drops at udp_queue_rcv_one_skb+420 (0xffffc485aee85790) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
15 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
30 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
2 drops at net_tx_action+0 (0xffffc485aed8a4b0) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
6 drops at net_tx_action+0 (0xffffc485aed8a4b0) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
45 drops at netlink_broadcast_filtered+364 (0xffffc485aee184d4) [software]
- iperf with different UDP client transmission rates (-b option adjusted in iperf)
Client transmission rate [Mbits/s] | Packet loss Nvidia [%]
400 | 0.043
500 | 0.91
600 | 6
700 | 11
800 | 18
900 | 24
1000 | 27
- iperf with different block sizes (-l option) (-b set to 1000M)
- the /proc/interrupts value shown is the difference between the counts at the start and the end of the iperf transfer on CPU0
- iperf run over a 30 s interval
- the test shows a correlation between the number of MSI interrupts and packet loss caused by RX FIFO overflow
Example of raw /proc/interrupts output:
grep lan743x /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
282: 1 0 0 0 0 0 PCI-MSI 940048384 Edge lan743x
283: 57601 0 0 0 0 0 PCI-MSI 940048385 Edge lan743x
284: 558754 0 0 0 0 0 PCI-MSI 940048386 Edge lan743x
285: 0 0 0 0 0 0 PCI-MSI 940048387 Edge lan743x
286: 0 0 0 0 0 0 PCI-MSI 940048388 Edge lan743x
287: 0 0 0 0 0 0 PCI-MSI 940048389 Edge lan743x
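In case someone wants to reproduce the start/end difference, here is a rough sketch of how such a snapshot can be parsed and subtracted. The snapshot text is the output above; `parse_interrupts` and `delta` are illustrative helper names, and the parser assumes the device name is the last field on the line, as it is here.

```python
# Snapshot copied from the grep output above (header line included).
SNAPSHOT = """\
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
282: 1 0 0 0 0 0 PCI-MSI 940048384 Edge lan743x
283: 57601 0 0 0 0 0 PCI-MSI 940048385 Edge lan743x
284: 558754 0 0 0 0 0 PCI-MSI 940048386 Edge lan743x
285: 0 0 0 0 0 0 PCI-MSI 940048387 Edge lan743x
286: 0 0 0 0 0 0 PCI-MSI 940048388 Edge lan743x
287: 0 0 0 0 0 0 PCI-MSI 940048389 Edge lan743x
"""


def parse_interrupts(text, name="lan743x"):
    """Map IRQ number -> list of per-CPU counts for lines ending in `name`."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # the header line has one column per CPU
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or fields[-1] != name:
            continue
        irq = int(fields[0].rstrip(":"))
        counts[irq] = [int(value) for value in fields[1:1 + ncpu]]
    return counts


def delta(before, after):
    """Per-IRQ, per-CPU difference between two parsed snapshots."""
    result = {}
    for irq, row in after.items():
        old = before.get(irq, [0] * len(row))
        result[irq] = [a - b for a, b in zip(row, old)]
    return result


counts = parse_interrupts(SNAPSHOT)
per_cpu = [sum(column) for column in zip(*counts.values())]
print(per_cpu)  # lan743x interrupts summed per CPU
```

For this snapshot every lan743x interrupt is counted on CPU0 and none on CPU1 to CPU5. Calling `delta` on one snapshot taken before and one taken after an iperf run gives the kind of difference quoted in the block size results.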
block size 32
- /proc/interrupts difference 744419
- packet loss: 61%, 4373763 of 7174918 dropped
block size 214
- /proc/interrupts difference 708314
- packet loss: 61%, 4393145 of 7174918 dropped
block size 982
- /proc/interrupts difference 534999
- packet loss: 41%, 1429124 of 3466840 dropped
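The rate sweep and the block size sweep above can be cross-checked with a little arithmetic. This is only a sketch using the numbers from this post. It assumes the nominal -b value was actually reached by the sender (at 1000M the sender log shows about 940 Mbits/sec, so that row is overstated) and that every counted interrupt belongs to the receive path. The function names are my own.

```python
# (nominal -b rate in Mbit/s, reported loss in %) from the rate sweep above.
RATE_SWEEP = [
    (400, 0.043), (500, 0.91), (600, 6), (700, 11),
    (800, 18), (900, 24), (1000, 27),
]

# (block size, /proc/interrupts difference, dropped, total datagrams)
# from the block size sweep above.
BLOCK_SWEEP = [
    (32, 744419, 4373763, 7174918),
    (214, 708314, 4393145, 7174918),
    (982, 534999, 1429124, 3466840),
]


def delivered_rate(rate_mbit, loss_pct):
    """Rate that survives the reported loss, based on the nominal send rate."""
    return rate_mbit * (1 - loss_pct / 100)


def datagrams_per_interrupt(irqs, dropped, total):
    """Delivered datagrams divided by the interrupt count difference."""
    return (total - dropped) / irqs


for rate, loss in RATE_SWEEP:
    print(f"-b {rate:4d}M: about {delivered_rate(rate, loss):6.1f} Mbit/s delivered")

for size, irqs, dropped, total in BLOCK_SWEEP:
    ratio = datagrams_per_interrupt(irqs, dropped, total)
    print(f"block size {size:3d}: {ratio:.2f} delivered datagrams per interrupt")
```

With these numbers, all three block sizes come out at roughly 3.8 to 3.9 delivered datagrams per interrupt, independent of datagram size. That would fit a per-packet or per-interrupt cost on the receive side more than a bandwidth limit, but it is an interpretation and not something the measurements prove.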
System Information
- Custom Carrier Board
- JetPack 6.2 (L4T 36.4.3), custom Ubuntu image starting from Ubuntu base 22.04.5
- PCIe Devices: NVMe SSD, M.2 Intel Wi-Fi Card, LAN7430 Ethernet Controller
- LAN7430 uses MSI-X interrupts, device tree