Hardware: Jetson AGX Thor Dev Kit
JetPack Version: R38.2.1 (latest as of Sep 2025)
Network Setup: 4x25GbE via QSFP28 breakout cable → MikroTik CRS518 switch → Mellanox ConnectX-5 100G
Issue: All 4 ports successfully negotiate 25Gbps, but actual throughput is limited to 8-10 Gbps (peaks at 12.6 Gbps) per port instead of expected ~23-24 Gbps.
Current Status
✅ What’s Working
- Link establishment: All 4 mgbe ports show Speed: 25000Mb/s, Link detected: yes
- Hardware offloads: TSO/GSO/GRO all enabled (verification commands sketched below this list)
- IRQ distribution: Manually pinned to separate CPU cores (0-3)
- Switch configuration: MikroTik ports at 25G, FEC off, flow control off, zero errors/drops
- Cables: Mellanox MCP1600-C001 1m DAC breakout cables
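For reference, this is roughly how we verify the offload and IRQ-affinity claims above (interface name and IRQ numbers 272-275 are from our system, as shown in /proc/interrupts further down; adjust to match yours):
# Confirm TSO/GSO/GRO are enabled
$ ethtool -k mgbe0_0 | grep -E "tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload"
# Pin the four mgbe0_0 IRQs to cores 0-3 (run as root)
$ for i in 0 1 2 3; do echo $i > /proc/irq/$((272 + i))/smp_affinity_list; done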
❌ The Problem
- Single stream TCP: 8-10 Gbps average (peaks at 12.6 Gbps)
- Multi-stream (-P 4): 10.1 Gbps aggregate (marginal improvement)
- Multi-stream (-P 8): 6.3 Gbps aggregate (WORSE with more parallelism)
Diagnostic Data
System Information
# JetPack version
$ cat /etc/nv_tegra_release
R38 (release), REVISION: 2.1, GCID: 42061081, BOARD: generic, EABI: aarch64, DATE: Wed Sep 10 19:49:31 UTC 2025
# Kernel
$ uname -r
6.8.12-tegra
# Link status (all 4 ports identical)
$ ethtool mgbe0_0
Speed: 25000Mb/s
Duplex: Full
Link detected: yes
Supported FEC modes: Not reported
Performance Tests
# Single stream TCP (best result)
$ iperf3 -c 10.0.0.75 -t 30 -i 5
[ 5] 0.00-5.00 sec 5.06 GBytes 8.68 Gbits/sec
[ 5] 5.00-10.00 sec 7.34 GBytes 12.6 Gbits/sec # PEAK
[ 5] 10.00-15.00 sec 7.01 GBytes 12.0 Gbits/sec
[ 5] 15.00-20.00 sec 4.62 GBytes 7.93 Gbits/sec
[ 5] 0.00-30.00 sec 32.5 GBytes 9.32 Gbits/sec # AVG
# Multi-stream makes it WORSE
$ iperf3 -c 10.0.0.75 -t 15 -P 8
[SUM] 0.00-15.00 sec 11.1 GBytes 6.34 Gbits/sec # Lower than single stream!
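If useful, we can also provide two quick variants that separate TX from RX and TCP from UDP effects (standard iperf3 flags; -b 25G simply requests line rate):
# Reverse direction (Thor receiving instead of sending)
$ iperf3 -c 10.0.0.75 -t 30 -R
# UDP at requested line rate, to take TCP congestion control out of the picture
$ iperf3 -c 10.0.0.75 -t 30 -u -b 25G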
Key Statistics
$ ethtool -S mgbe0_0 | grep -E "err|drop" | grep -v ": 0"
mmc_rx_crc_error: 1
mmc_rx_packet_smd_err_cnt: 37
mgbe_payload_cs_err: 1739 # ← KNOWN ISSUE from Orin forums
# CPU load during 9 Gbps test (NOT CPU-bound!)
%Cpu(s): 0.6 us, 1.9 sy, 0.0 ni, 90.6 id, 0.6 hi, 6.3 si
# CPU is 90% IDLE, only 7% interrupt load
# IRQ distribution (properly spread after manual pinning)
$ cat /proc/interrupts | grep mgbe0_0
272: 1127517 0 0 0 mgbe0_0.vm0 # CPU0
273: 1040871 342164 0 0 mgbe0_0.vm1 # CPU1
274: 474624 0 669726 0 mgbe0_0.vm2 # CPU2
275: 174643 0 0 279539 mgbe0_0.vm3 # CPU3
What We’ve Ruled Out
✅ NOT a CPU bottleneck: CPU 90% idle, IRQs only 7%
✅ NOT interrupt saturation: IRQs distributed across 4 cores
✅ NOT the switch: Zero drops, zero errors, proper config verified
✅ NOT FEC mismatch: Both sides report FEC off/unsupported
✅ NOT MTU: Thor at 8966, switch at 9000 (within tolerance)
✅ NOT TCP buffers: Already at 128MB (net.core.rmem_max=134217728; sysctl values sketched below this list)
✅ NOT cable quality: All 8 SFP28 ports on switch show identical 25G links
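For reference, the socket buffer tuning currently in place looks roughly like this. net.core.rmem_max is the exact value quoted above; the companion wmem/tcp_rmem/tcp_wmem lines are representative of the matching settings rather than a verbatim copy of our config:
# Approximate TCP/socket buffer tuning (128 MB maximums)
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728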
Suspected Root Causes
1. mgbe_payload_cs_err Counter
The error counter is at 1739 (didn’t increment during testing, but non-zero). NVIDIA forums link this to:
- Orin thread: “Reduced bandwidth at 10Gbps on Orin and mgbe_payload_cs_err correlation”
- MACsec driver issues causing latent packet corruption
- Suggested fix: Rebuild kernel with OSI_MACSEC_EN disabled
Question: Is there a known fix for Thor/r38.2.1? Should we expect a driver update?
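In case it helps correlate the counter with the throughput dips, this is how we sample it during a run (simple sketch using the counter names reported by ethtool -S):
# Sample the suspect counters once per second while iperf3 is running
$ watch -n 1 'ethtool -S mgbe0_0 | grep -E "payload_cs_err|crc_error|smd_err"'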
2. Multi-Lane (x4) Validation Status
Forums mention Thor x4 mode is “not fully validated yet” (only x1 officially supported). However:
- Official documentation clearly states 4x25GbE support
- All 4 ports DO negotiate and establish 25G links
- Throughput cap suggests partial functionality
Question: What’s the official status of 4x25GbE breakout mode on Thor dev kit? Should we expect ~10 Gbps as current ceiling?
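To help distinguish a per-port limit from a shared x4 bottleneck, we can also drive all four ports in parallel. The sketch below assumes each mgbe port sits on its own subnet (local 10.0.N.10, peer 10.0.N.75) and the ConnectX-5 side runs four iperf3 servers on ports 5201-5204; the addressing is illustrative, not our exact layout:
# Run all four 25G ports simultaneously and compare aggregate vs. single-port results
$ for n in 0 1 2 3; do iperf3 -c 10.0.$n.75 -p $((5201 + n)) -B 10.0.$n.10 -t 30 --logfile port$n.log & done; wait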
3. Variable Throughput Pattern
Throughput cycles between 7 and 12.6 Gbps in a sawtooth pattern, suggesting TCP congestion backoff. But:
- No packet loss detected on switch or Thor
- Retransmits are minimal (48-58 per 30s test)
- CPU has plenty of headroom
Question: Is there a known queueing/driver contention issue in early Thor MGBE driver?
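To characterize the sawtooth, we snapshot the TCP state of the test flow while it runs (sketch below; assumes the peer at 10.0.0.75 and the default congestion control):
# Congestion control algorithm in use
$ sysctl net.ipv4.tcp_congestion_control
# Per-connection cwnd, pacing rate, and retransmit counters for the test flow
$ ss -tin dst 10.0.0.75
If cwnd collapses while retransmit counts stay near zero, that would point at driver/queue stalls rather than genuine congestion.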
Attempted Fixes (No Significant Impact)
- ✅ Manual IRQ pinning: echo N > /proc/irq/XXX/smp_affinity_list → improved from 8.4 to 10.4 Gbps (24% gain, not the 2-3x expected)
- ✅ Stopped irqbalance service
- ✅ Enabled hardware offloads (TSO/GSO/GRO)
- ✅ Increased ring buffers to max: ethtool -G mgbe0_0 rx 16384
- ✅ Increased TCP buffers to 128MB
- ❌ Interrupt coalescing: interface must be down to change settings, and even then they are rejected with “Invalid argument” (attempted commands sketched below this list)
- ❌ Force 10G mode: ethtool -s mgbe0_0 speed 10000 → “link settings update failed”
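For reproducibility, the coalescing attempt looked roughly like this; the rx-usecs/rx-frames values shown are examples, and the combinations we tried were all rejected:
# Current coalescing settings
$ ethtool -c mgbe0_0
# Attempted change with the interface down; driver returns “Invalid argument”
$ ip link set mgbe0_0 down
$ ethtool -C mgbe0_0 rx-usecs 64 rx-frames 32
$ ip link set mgbe0_0 up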
Questions for NVIDIA
- Is 4x25GbE breakout mode fully supported on Thor dev kit (r38.2.1)? If not, what’s the expected timeline for validation?
- Is there a known fix for mgbe_payload_cs_err limiting throughput? Should we rebuild kernel without MACsec?
- Why does multi-streaming DEGRADE performance? (-P 8 gives 6.3 Gbps vs 10 Gbps single stream)
- What’s the expected line-rate performance for current Thor firmware? Is ~10 Gbps per port the current ceiling?
- Are there any Thor-specific MGBE driver patches coming in future JetPack releases?
Environment Details
- Thor: Jetson AGX Thor Dev Kit, JetPack r38.2.1, kernel 6.8.12-tegra
- Switch: MikroTik CRS518-16XS-2XQ, RouterOS 7.x
- Cables: Mellanox MCP1600-C001 1m DAC QSFP28 breakout (4x25G)
- Target: Mellanox ConnectX-5 Ex (100G port, tested stable)
- Test tool: iperf3 3.16 (multi-threaded)
Full diagnostics available: Can provide complete ethtool output, dmesg logs, or additional tests as needed.
Any guidance would be greatly appreciated - we’re trying to set up a mesh network for distributed AI inference and need full 25G performance. We’re currently evaluating whether to wait for a driver fix or fall back to 10G aggregate.
Thanks!