We are running Thor with the four MGBE ports in 10GbE mode, streaming data to a link partner. In a previous thread (Jetson Thor - MGBE TX Pause Frames), we resolved TX Pause Frame support via a device tree edit — thanks again to @whitesscott and @WayneWWW. With pause frames working, we never over-drive the link partner. That problem is solved.
The next issue we’re chasing is large inter-packet gaps on the TX path. We are streaming at approximately 8 Gbps per port. Our packets are ~8 KB, so nominally they depart every ~8 µs — about 125K packets per second. Most of the time they do, often faster. However, we occasionally observe inter-packet gaps of up to 6 ms. Our link partner has enough buffering to tolerate about 1 ms of gap, so these 6 ms gaps cause data loss on the receiving end.
To be clear: the source data is available well ahead of time in a DRAM buffer. There is no producer-side latency race. The gaps appear to be introduced somewhere in the TX path between our userspace application and the wire — likely Linux scheduling, the kernel network stack, or the MGBE driver/DMA engine.
We’d appreciate any guidance on:

- What tools or instrumentation are available on Thor to diagnose where these gaps originate? (e.g., ethtool stats, MGBE driver debug, tracing, etc.)
- Are there known tuning knobs — sysctl settings, IRQ affinity, NAPI parameters, TX coalescing, ring buffer sizes — that can reduce worst-case TX latency on the MGBE?
- Is there a kernel-bypass or zero-copy TX path available for the MGBE on Thor? (AF_XDP, DPDK, Holoscan, or similar?)
- Any known issues with Linux scheduler interaction and the MGBE driver that could explain multi-millisecond stalls?
We are running jetson-r38.4.0. Any suggestions or pointers would be greatly appreciated.
Thank you in advance! Shep
Something here might be helpful.
# Disable flow control entirely on the Jetson to see if the gaps disappear. You may need to disable it on the switch/receiver too.
ethtool -A mgbe0_0 rx off tx off
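Before turning flow control off, it may be worth confirming whether the 6 ms gaps actually coincide with pause frames arriving from the link partner; note also that disabling it removes the back-pressure fix from your earlier thread, so treat it as a diagnostic step only. A quick check (the pause counter names vary by driver, hence the loose grep):

```shell
IFACE=mgbe0_0
if [ -d "/sys/class/net/$IFACE" ]; then
    # Negotiated flow-control state for this port
    ethtool -a "$IFACE"
    # Pause-related counters; exact names are driver-specific, so match loosely
    ethtool -S "$IFACE" | grep -i pause
fi
```

If the pause counters jump at the same moment the receiver reports a gap, the hardware/bus-stall pattern described further down becomes the prime suspect.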
# If the TX ring buffer is too small, or if interrupt coalescing holds off TX
# completion interrupts for too long, the MGBE driver will run out of TX descriptors.
# Query the current and maximum ring buffer sizes:
ethtool -g mgbe0_0
# Increase the TX ring buffer to its maximum supported size:
ethtool -G mgbe0_0 tx <max_value>
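Rather than hard-coding `<max_value>`, the pre-set maximum can be read straight out of `ethtool -g` and applied; a small sketch, assuming the standard ethtool output layout where the first `TX:` line is the pre-set hardware maximum:

```shell
IFACE=mgbe0_0   # port under test

# First "TX:" line from `ethtool -g` is the pre-set hardware maximum
tx_max() { awk '/^TX:/ {print $2; exit}'; }

if [ -d "/sys/class/net/$IFACE" ]; then
    MAX=$(ethtool -g "$IFACE" | tx_max)
    echo "Raising TX ring on $IFACE to $MAX descriptors"
    ethtool -G "$IFACE" tx "$MAX"
fi
```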
# Tune interrupt coalescing. Disable "Adaptive TX" and set a hard, low limit for TX usecs and frames.
ethtool -C mgbe0_0 adaptive-tx off tx-usecs 50 tx-frames 32
# Lock the Jetson clocks to maximum performance to eliminate frequency scaling and deep idle states:
sudo jetson_clocks
# Try disabling offloads to force the stack to push discrete packets.
ethtool -K mgbe0_0 tso off gso off sg off
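To correlate driver behavior with the receiver-side gaps, it can also help to poll the NIC statistics once a second and print only the counters that changed. A sketch (the interface name and one-minute sampling window are placeholders to adjust):

```shell
IFACE=mgbe0_0
prev=$(mktemp); cur=$(mktemp)
if [ -d "/sys/class/net/$IFACE" ]; then
    ethtool -S "$IFACE" > "$prev"
    for i in $(seq 1 60); do             # one-second samples for a minute
        sleep 1
        ethtool -S "$IFACE" > "$cur"
        echo "--- $(date +%T) ---"
        diff "$prev" "$cur" | grep '^>'  # counters that moved this second
        mv "$cur" "$prev"
    done
fi
```

Watch for pause, drop, or queue-stop counters incrementing in the same second the link partner reports a gap.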
Ftrace
Prepare the ftrace environment
sudo su
cd /sys/kernel/debug/tracing
# Reset and set tracer
echo 0 > tracing_on
echo > trace
echo function_graph > current_tracer
# Clear old filters
echo > set_ftrace_filter
# Add core network TX boundaries and queue controls
echo 'dev_hard_start_xmit' >> set_ftrace_filter
echo 'netif_tx_stop_queue' >> set_ftrace_filter
echo 'netif_tx_wake_queue' >> set_ftrace_filter
# Add everything in the nvethernet module
echo ':mod:nvethernet' >> set_ftrace_filter
# Set the 5 ms trap (5000 microseconds)
echo 5000 > tracing_thresh
# Start tracing
echo 1 > tracing_on
# Let your 8 Gbps stream run while tracing is on. Wait until your receiving link partner reports the ~6 ms gap. Immediately stop the trace to prevent the ring buffer from wrapping:
echo 0 > tracing_on
cat trace > /home/$SUDO_USER/nvethernet_stall_trace.txt
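Once the trace file is saved, the offending call chains can be pulled out quickly: with `tracing_thresh` set, function_graph only logs calls slower than the threshold, and it flags long durations with markers such as `!` or `#`:

```shell
TRACE=/home/$SUDO_USER/nvethernet_stall_trace.txt
if [ -f "$TRACE" ]; then
    # Lines carrying a duration marker are the ones that tripped the 5 ms trap
    grep -nE '[!#+] +[0-9]+\.[0-9]+ us' "$TRACE" | head -40
fi
```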
Open the trace file; one of these patterns should appear.
A Hardware/Bus Stall
This pattern shows up if the delay occurs at the very bottom of the stack, while the driver is physically handing the packet descriptors to the MGBE hardware.
Your trace may look something like this:
# tracer: function_graph
# CPU DURATION FUNCTION CALLS
# | | | | | | |
4) ! 6005.12 us | dev_hard_start_xmit() {
4) ! 6002.88 us | nve_start_xmit();
4) ! 6005.50 us | }
Impression: The kernel is perfectly healthy, but the nve_start_xmit function (which writes to the PCIe/memory-mapped registers of the MGBE) was physically blocked from completing for 6 milliseconds.
This may be the hardware honoring an incoming 802.3x Pause Frame from your link partner,
or a severe DMA/SMMU translation stall on Thor's memory controller.
Ring Buffer Starvation / Queue Stall
If the TX ring buffer fills up because the hardware isn't draining it fast enough (or the kernel isn't cleaning up transmitted packets fast enough),
the driver will forcefully stop the kernel's network queue.
# tracer: function_graph
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) 1.22 us | netif_tx_stop_queue();
... (6 milliseconds of absolutely nothing on this CPU) ...
2) 0.95 us | netif_tx_wake_queue();
Note: You might have to look at the absolute timestamps on the far left of the full trace to see the 6 ms time jump between the stop and the wake.
The driver ran out of TX descriptors. The kernel stopped sending. It took 6 ms for the hardware to finally trigger a TX Completion Interrupt (or for the CPU to wake up and process the SoftIRQ) to clean out the old packets and wake the queue.
Interrupt coalescing may be set too high, the TX ring buffer may be too small, or the CPU core handling the IRQ may have gone into a deep C-state and taken too long to wake up.
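For this pattern, two quick checks are which core services the MGBE interrupts and whether that core is allowed into deep idle. The IRQ number below is hypothetical, so read the real one off /proc/interrupts first:

```shell
# Which CPUs are fielding the MGBE interrupts?
grep -i mgbe /proc/interrupts || true

# Affinity masks are hex bitmaps: CPU 3 -> 1<<3 = 0x8
printf 'mask for CPU 3: %x\n' $((1 << 3))

# Pin the TX IRQ (e.g. IRQ 120, hypothetical) to CPU 3:
# echo 8 | sudo tee /proc/irq/120/smp_affinity

# Deep-idle states on that core (1 = state disabled, 0 = allowed)
cat /sys/devices/system/cpu/cpu3/cpuidle/state*/disable 2>/dev/null || true
```

Pinning the IRQ to a core the streaming app does not run on, and keeping that core out of deep idle, attacks both halves of this failure mode.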
CPU Preemption / OS Jitter
If the delay happens higher up in the core network stack before it even reaches the nvethernet driver, you will see something like this:
# tracer: function_graph
# CPU DURATION FUNCTION CALLS
# | | | | | | |
5) ! 6150.00 us | dev_hard_start_xmit() {
5) 2.15 us | nve_start_xmit();
5) ! 6155.10 us | }
In this example, nve_start_xmit() itself took only 2 microseconds, but its parent dev_hard_start_xmit() took over 6 ms.
This means the CPU was preempted right in the middle of executing the network stack:
the kernel scheduler pulled the core away from your network thread to run something else and did not give it back for 6 ms.
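If this is the pattern in the trace, a common mitigation is to pin the sender to an isolated core and run it under a real-time scheduling policy so CFS cannot preempt it for milliseconds at a stretch. A sketch, where ./stream_app is a placeholder for your sender and the core must be reserved on the kernel command line (e.g. isolcpus=5 nohz_full=5):

```shell
APP=./stream_app                 # placeholder for your sender binary
if [ -x "$APP" ]; then
    # SCHED_FIFO priority 50 (valid range 1-99), pinned to isolated CPU 5
    sudo chrt -f 50 taskset -c 5 "$APP"
fi
```

Real-time priorities need to be used with care: a runaway SCHED_FIFO thread can starve the rest of the core, so keep housekeeping off the isolated CPU.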