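Two Thor dev boards directly connected via QSFP28: iperf3 throughput is intermittent

Hardware: two Thor devkits + T5000, 10G x4, directly connected with QSFP28

Software: L4T-R38.4.0

Symptom: when running iperf3, intervals of 0 bps appear over and over. During a run of consecutive 0 bps intervals, ping replies pile up and then arrive in a burst. See the boxed areas in the screenshot.

Question: how can this be resolved?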

Try parallelism:

iperf3 -c <SERVER_IP> -P 16 -t 30 -O 5

Bidirectional test:
iperf3 -c <SERVER_IP> -P 8 -t 30 -O 5 --bidir

-P, --parallel  #         number of parallel client streams to run
-t 30 runs 30 seconds.
-O 5 ignores the first 5 seconds (warmup).
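To compare runs objectively, the 0 bps intervals can be counted instead of eyeballed. The following is a minimal sketch, not part of iperf3: count_zero_intervals is a hypothetical helper name, and it assumes the usual iperf3 text output in which a stalled interval is printed as "0.00 Bytes  0.00 bits/sec".

```shell
# Hypothetical helper: count 0 bps intervals in iperf3 text output on stdin.
# Assumes stalled intervals are printed as "... sec  0.00 Bytes  0.00 bits/sec".
# If [SUM] lines are present (parallel run with -P), only those are counted;
# otherwise every per-stream interval line is counted.
count_zero_intervals() {
    awk '
        / sec / && / 0\.00 bits\/sec/ {
            if ($0 ~ /\[SUM\]/) sum++; else stream++
        }
        /\[SUM\]/ { has_sum = 1 }
        END { print (has_sum ? sum + 0 : stream + 0) }
    '
}
```

Usage would look like: iperf3 -c <SERVER_IP> -P 16 -t 30 -O 5 | count_zero_intervals. Running it before and after each tuning change gives a number to compare.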

Another thing that might matter: what is your MTU, and is it identical on all Ethernet devices?

ip link show mgbe0_0
ip link show mgbe1_0
ip link show mgbe2_0
ip link show mgbe3_0

If you are using bonding.ko:

cat /proc/net/bonding/bond0
ip link show bond0

If your actual MTU does not match your desired MTU setting: do you need MACsec? MACsec reduces the MTU by 34 bytes.

nvidia-oot/drivers/net/ethernet/nvidia/nvethernet/macsec.h
/**
 * @brief MACSEC SECTAG + ICV + 2B 
          ethertype adds up to 34B
 */
#define MACSEC_TAG_ICV_LEN              34U

If you don’t need macsec you could try:

sudo tee /etc/modprobe.d/nvethernet.conf <<'EOF'
options nvethernet macsec_enable=0
EOF

Reboot, then run the ip link show commands again to see whether the actual MTU matches the desired MTU.
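The actual-versus-desired comparison can also be scripted. This is only a sketch: check_mtu is a hypothetical helper, and it assumes the kernel exposes each interface's MTU at /sys/class/net/<iface>/mtu. The sysfs root is passed as an argument so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: compare each interface's MTU with the desired value.
# $1 = sysfs root (normally /sys/class/net), $2 = desired MTU, rest = interface names.
# A shortfall of exactly 34 bytes is flagged because it equals MACSEC_TAG_ICV_LEN.
check_mtu() {
    root=$1; want=$2; shift 2
    for ifc in "$@"; do
        got=$(cat "$root/$ifc/mtu" 2>/dev/null) || { echo "$ifc: missing"; continue; }
        diff=$((want - got))
        if [ "$diff" -eq 0 ]; then
            echo "$ifc: ok ($got)"
        elif [ "$diff" -eq 34 ]; then
            echo "$ifc: $got, 34 bytes short (matches MACSEC_TAG_ICV_LEN)"
        else
            echo "$ifc: $got, expected $want"
        fi
    done
}
```

For example: check_mtu /sys/class/net 1500 mgbe0_0 mgbe1_0 mgbe2_0 mgbe3_0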

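  1. Switching iperf3 to multiple parallel streams reduced how often 0 bps occurs, but did not eliminate it.
  2. Adding the --bidir option to iperf3 still produces 0 bps intervals, though slightly less often than with parallel streams alone.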
  3. ictrek@M114:~$ ip link show mgbe0_0
    3: mgbe0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1466 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4c:bb:47:38:05:7b brd ff:ff:ff:ff:ff:ff
    ictrek@M114:~$ ip link show mgbe1_0
    8: mgbe1_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1466 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4c:bb:47:38:05:7c brd ff:ff:ff:ff:ff:ff
    ictrek@M114:~$ ip link show mgbe2_0
    10: mgbe2_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1466 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4c:bb:47:38:05:7d brd ff:ff:ff:ff:ff:ff
    ictrek@M114:~$ ip link show mgbe3_0
    11: mgbe3_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1466 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4c:bb:47:38:05:7e brd ff:ff:ff:ff:ff:ff
  4. Bonding is not in use: ~$ cat /proc/net/bonding/bond0
    ip link show bond0
    cat: /proc/net/bonding/bond0: No such file or directory
    Device "bond0" does not exist.
  5. After disabling MACsec, the MTU became 1500, exactly 34 (MACSEC_TAG_ICV_LEN) more than before. The 0 bps intervals still reproduce in this configuration.

I don’t know what the cause is, but these might help.

Disable Hardware Offloads:

sudo ethtool -K mgbe0_0 tso off gso off gro off lro off

Run your iperf3 test again. If the 0 bps drops disappear, you’ve found the culprit. You can then reenable them one by one to see which specific offload engine is failing.
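The one-by-one re-enabling can be laid out as a fixed sequence of steps. The sketch below only prints the commands so they can be reviewed first; offload_bisect_cmds is a hypothetical helper name, and the feature list is simply the four offloads from the command above.

```shell
# Hypothetical helper: print (do not run) the ethtool commands for an
# offload bisection on one interface. Step 1 turns everything off; each
# later step re-enables one more offload. Rerun iperf3 after every step:
# the first step at which the 0 bps intervals return points at the suspect.
offload_bisect_cmds() {
    ifc=$1
    echo "ethtool -K $ifc tso off gso off gro off lro off"
    for feat in tso gso gro lro; do
        echo "ethtool -K $ifc $feat on"
    done
}
```

For example, offload_bisect_cmds mgbe0_0 prints five commands; run each with sudo, testing with iperf3 in between. Some offloads may be reported as fixed by the driver, in which case ethtool will refuse to change them.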


At high speeds, the default DMA ring buffer sizes might be too small. If the CPU is momentarily busy and doesn’t service the network interrupt fast enough, the ring buffer fills up, packets are tail-dropped, and TCP stalls out.
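To get a feel for the time scale, here is a rough back-of-the-envelope sketch. It assumes one full-size frame per RX descriptor, frames arriving back to back at line rate, and a CPU that services nothing in the meantime; ring_fill_ms is a hypothetical helper, and 1538 is the on-wire size of a 1500-byte-MTU frame (1500 payload + 14 header + 4 FCS + 8 preamble + 12 inter-frame gap).

```shell
# Hypothetical helper: milliseconds until an RX ring overflows if the CPU
# services nothing. Assumes one frame per descriptor at full line rate.
# $1 = ring size (descriptors), $2 = link speed in Gb/s, $3 = on-wire bytes per frame.
ring_fill_ms() {
    awk -v n="$1" -v gbps="$2" -v wire="$3" \
        'BEGIN { printf "%.2f\n", n * wire * 8 / (gbps * 1e9) * 1000 }'
}

# ring_fill_ms 4096 10 1538    prints 5.04
# ring_fill_ms 16384 10 1538   prints 20.16
```

Under those assumptions, a 4096-entry ring on one 10G lane absorbs only about 5 ms of unserviced traffic, while 16384 entries absorb about 20 ms.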

First, check your current and maximum ring buffer sizes:
ethtool -g mgbe0_0

On my Thor the pre-set maximum is RX: 16384, so you could try values up to 16384 to see whether that helps with the 0 bps intervals:

for i in {0..3}; do sudo ethtool -G mgbe${i}_0 rx 16384; done
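To confirm the change took effect on every interface, the maximum and current RX values can be pulled out of the ethtool -g output. A small sketch, with ring_rx as a hypothetical helper name; it assumes the section headings "Pre-set maximums:" and "Current hardware settings:" that ethtool prints.

```shell
# Hypothetical helper: read `ethtool -g <iface>` output on stdin and print
# "<pre-set max RX> <current RX>". Only lines whose first field is exactly
# "RX:" are used, so "RX Mini:", "RX Jumbo:" and similar lines are ignored.
ring_rx() {
    awk '
        /^Pre-set maximums:/          { sec = "max" }
        /^Current hardware settings:/ { sec = "cur" }
        $1 == "RX:"                   { rx[sec] = $2 }
        END { print rx["max"], rx["cur"] }
    '
}
```

For example: ethtool -g mgbe0_0 | ring_rx. If the two numbers differ after the ethtool -G command, the new ring size was not applied.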


Interrupt coalescing delays hardware interrupts so the CPU can process packets in batches. If the delay is too long, latency spikes; if it’s too short, the CPU is overwhelmed with IRQs. Check the current settings:
ethtool -c mgbe0_0

Try modifying the rx-usecs (the time to wait before generating an interrupt) to a slightly higher or lower value to see if it stabilizes the TCP stream. A good starting test is disabling adaptive RX and setting a fixed microsecond delay:

sudo ethtool -C mgbe0_0 adaptive-rx off rx-usecs 50
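As a rough guide for choosing values to try, the trade-off can be put into numbers. This sketch looks at the timer alone: it ignores rx-frames (which may fire the interrupt earlier) and assumes full-size frames at line rate. coalesce_estimate is a hypothetical helper, and 1538 is the on-wire size of a 1500-byte-MTU frame.

```shell
# Hypothetical helper: for a given rx-usecs, print the upper bound on
# timer-driven interrupts per second and the number of full-size frames
# that arrive at line rate within one rx-usecs window.
# $1 = rx-usecs, $2 = link speed in Gb/s, $3 = on-wire bytes per frame.
coalesce_estimate() {
    awk -v usecs="$1" -v gbps="$2" -v wire="$3" \
        'BEGIN { printf "%d %d\n", 1e6 / usecs, usecs * gbps * 1000 / (wire * 8) }'
}

# coalesce_estimate 50 10 1538    prints 20000 40
# coalesce_estimate 512 10 1538   prints 1953 416
```

So on one 10G lane, rx-usecs 50 means at most about 20000 interrupts per second with roughly 40 frames each, while rx-usecs 512 means about 1953 interrupts per second with roughly 416 frames each.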


Try setting Thor to maximum performance profile:

sudo nvpmodel -m 0
sudo jetson_clocks


I don’t know if this could help, but you could try enabling pause frames in devicetree, as shown in the post.

https://forums.developer.nvidia.com/t/jetson-thor-mgbe-tx-pause-frames/360491/5

Thanks, I'll look into it.

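Below is the ethtool -g and -c output from my side. The pre-set maximum RX is also 16384, but the RX value under "Current hardware settings" is only 4096, and I don't know why. "Adaptive RX: n/a TX: n/a" suggests that neither adaptive setting is enabled. None of these network parameters were modified on purpose; everything was built directly from the source code.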

$ sudo ethtool -g mgbe0_0
[sudo] password for ictrek:
Ring parameters for mgbe0_0:
Pre-set maximums:
RX: 16384
RX Mini: n/a
RX Jumbo: n/a
TX: 4096
TX push buff len: n/a
Current hardware settings:
RX: 4096
RX Mini: n/a
RX Jumbo: n/a
TX: 4096
RX Buf Len: n/a
CQE Size: n/a
TX Push: off
RX Push: off
TX push buff len: n/a
TCP data split: n/a

$ sudo ethtool -c mgbe0_0
Coalesce parameters for mgbe0_0:
Adaptive RX: n/a TX: n/a
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a

rx-usecs: 512
rx-frames: 64
rx-usecs-irq: n/a
rx-frames-irq: n/a

tx-usecs: 256
tx-frames: 16
tx-usecs-irq: n/a
tx-frames-irq: n/a

rx-usecs-low: n/a
rx-frame-low: n/a
tx-usecs-low: n/a
tx-frame-low: n/a

rx-usecs-high: n/a
rx-frame-high: n/a
tx-usecs-high: n/a
tx-frame-high: n/a

CQE mode RX: n/a TX: n/a

tx-aggr-max-bytes: n/a
tx-aggr-max-frames: n/a
tx-aggr-time-usecs n/a