Rcu: INFO: rcu_preempt self-detected stall on CPU, unable to access the system, system freeze

Still working on this.

Hi,

Sorry in advance; I just want to clarify this again.

I think we may have already discussed this before, but I still want to double-check.

What is the exact use case on your side? Running netplan in a loop seems to be just a method you provided to us to reproduce this issue.

Hi WayneWWW,

Yes, that’s just a quicker way to reproduce the issue. We don’t need the CoE or video-related functionalities — we just want to use it as a regular 100G network interface for sending and receiving data.

However, we found that when bonding two 100G interfaces with a static IP configuration and connecting them via optical fiber, the system fails to boot properly or becomes stuck due to an RCU stall after a reboot or power-up. When the RCU stall occurs, unplugging the fiber cable allows the system to recover, which made us suspect the issue might be related to mgbe.

Since NetworkManager automatically manages network interfaces during startup — configuring the bond and assigning the static IP — that’s the only part we think might be related to mgbe during boot. We also noticed that after the system has fully booted, connecting the fiber cable sometimes causes RCU stalls again, similar to what happens during boot. As long as the link comes up and our bonded interface operates at 100G, the issue appears.

The operations performed by NetworkManager after the link comes up are essentially the same as running `netplan apply`, which is why we used a loop to repeatedly run `netplan apply` to reproduce the issue more quickly.
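In case it helps, here is a minimal sketch of the kind of loop we use (the function name and iteration count are arbitrary; on the device the command would be `sudo netplan apply`):

```shell
#!/bin/sh
# Reproducer sketch: run a command repeatedly and stop as soon as it
# fails. On the device we use "sudo netplan apply" as the command;
# "echo tick" below is only a placeholder so the sketch is runnable.
repro_loop() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        i=$((i + 1))
        echo "iteration $i: $*"
        "$@" || { echo "command failed on iteration $i"; return 1; }
    done
}

# Example (replace "echo tick" with "sudo netplan apply" on the device):
repro_loop 3 echo tick
```

When the stall hits, the loop usually never returns from one of the iterations, which is what makes it a quick reproducer.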

In the future, we'll have multiple boards with interfaces bonded to 100G, connected to a router and used as 100G NICs. Has there been any progress or a feasible solution discussed for this issue so far?

What is the reproduction rate if you follow the normal use case instead of using the stress tool?

Hi WayneWWW,

I usually work more with the custom boards on my side. As long as two interconnected devices with four mgbe links are set up, the probability of the system freezing during boot is over 90%, and both devices enter the RCU stall state simultaneously. If I connect the fiber after the system has finished booting, the probability is lower. In that case, there might still be RCU logs printed, but the command line doesn't necessarily freeze, or sometimes only one device freezes.

As for the devkit, we bond it to 100G and connect directly to the switch via QSFP. Normally, we don't deliberately reboot, power-cycle, or frequently plug/unplug the QSFP optical module. The system runs continuously, mainly used by the algorithm team. Overall, the devkit seems to have a lower probability of issues compared to the custom board, but I am not sure.
In my opinion, the custom board also outputs UPHY directly, and the issue seems to originate from **nvethernet.ko**. After bonding, interrupts or other operations appear to time out, blocking one core and preventing the whole system from scheduling properly.

If I don't use bonding, the RCU issue doesn't seem to occur; however, we need bonding. Additionally, if I bond 4 links but only connect 3 fibers (i.e., fewer than the number of bonded members), the issue also seems less likely to appear. However, in that case, the maximum bandwidth is only 75G. In version 7.1, when testing 100G iperf TCP speed, it could only reach a bit over 30G. If only 3 links are connected now, it's hard to say for sure that the issue won't occur, but the TCP iperf throughput would drop below 30Gbps.

Hi wpceswpces,

If you are on JetPack 7.1 and have the apt package nvidia-l4t-init-nvgpu_38.4.0, then the following will NOT affect you.

If you have any earlier JetPack version, /etc/systemd/nvpower.sh could be contributing to your intractable network problem.

A quick confirmation is:

grep nvpower /etc/udev/rules.d/*

If it returns nothing, you are good and the device is on JetPack 7.1.

If it returns:

/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="net", KERNEL=="mgbe0_0", RUN+="/bin/bash /etc/systemd/nvpower.sh --mgbe"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="net", KERNEL=="mgbe1_0", RUN+="/bin/bash /etc/systemd/nvpower.sh --mgbe"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="net", KERNEL=="mgbe2_0", RUN+="/bin/bash /etc/systemd/nvpower.sh --mgbe"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="net", KERNEL=="mgbe3_0", RUN+="/bin/bash /etc/systemd/nvpower.sh --mgbe"

Then you should run:
apt install nvidia-l4t-init

This is because JetPack 7, up to and including nvidia-l4t-init-nvgpu_38.2.0-20250821174705_arm64.deb, ships the above udev rules, and when the carrier is 0 (no link), /etc/systemd/nvpower.sh forces the interface down:

function udev_mgbe_handler()
{
        # INTERFACE environment variable is initialized by udev based on
        # the interface detected by the kernel
        inf=$INTERFACE

        if [[ -e "/sys/class/net/$inf/carrier" ]]; then
                connected=$(cat "/sys/class/net/$inf/carrier")
                if [[ "$connected" -eq 0 ]]; then
                        ip link set dev "$inf" down
                fi
        fi
}
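The same carrier test can be exercised by hand; here is a standalone sketch of that logic (the carrier file path is an argument so it can be tried against any interface, e.g. /sys/class/net/mgbe0_0/carrier on the device; the helper name is mine):

```shell
#!/bin/sh
# Standalone version of the udev handler's carrier test.
# check_carrier <carrier-file> prints "down" when carrier is 0,
# "up" when it is nonzero, and "absent" when the file does not exist.
check_carrier() {
    f=$1
    if [ -e "$f" ]; then
        if [ "$(cat "$f")" -eq 0 ]; then
            echo down
        else
            echo up
        fi
    else
        echo absent
    fi
}
```

On the device, `check_carrier /sys/class/net/mgbe0_0/carrier` printing "down" is exactly the case in which the rule runs `ip link set dev mgbe0_0 down`.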

Hi whitesscott,

I am currently using Jetpack 7.1, but I have not installed nvidia-l4t-init-nvgpu. There are no network power management rules under my /etc/udev/rules.d/.

(base) root@tegra-ubuntu:~# grep nvpower /etc/udev/rules.d/*
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="pci", DEVPATH=="/devices/platform/bus@0/d0b0000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0", ACTION=="bind", RUN+="/bin/bash /etc/systemd/nvpower.sh --gpu"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="platform", DEVPATH=="/devices/platform/bus@0/17000000.gpu", ACTION=="bind|change", RUN+="/bin/bash /etc/systemd/nvpower.sh --gpu"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="platform", DEVPATH=="/devices/platform/bus@0/*/8188050000.vic", ACTION=="bind|change", RUN+="/bin/bash /etc/systemd/nvpower.sh --vic"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="platform", DEVPATH=="/devices/platform/bus@0/*/15340000.vic", ACTION=="bind|change", RUN+="/bin/bash /etc/systemd/nvpower.sh --vic"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="platform", DEVPATH=="/devices/platform/bus@0/*/15480000.nvdec", ACTION=="bind|change", RUN+="/bin/bash /etc/systemd/nvpower.sh --nvdec"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="platform", DEVPATH=="/devices/platform/bus@0/*/154c0000.nvenc", ACTION=="bind|change", RUN+="/bin/bash /etc/systemd/nvpower.sh --nvenc"
/etc/udev/rules.d/99-tegra-devices.rules:SUBSYSTEM=="hwmon", DEVPATH=="/devices/platform/bus@0/c600000.i2c/i2c-2/2-0044/hwmon/*", ACTION=="bind|change", RUN+="/bin/bash /etc/systemd/nvpower.sh --ina238"


Hi wpceswpces,

My apologies for restating things you may already know. Your various logs show that CPU 4 is not reaching a quiescent state for an expedited RCU grace period. In the stall detector doc, expedited-stall messages mean a CPU failed to respond to the reschedule/IPI-based expedited mechanism, and the usual causes are a CPU looping in an RCU read-side section, or looping with interrupts/preemption/bottom halves disabled, or being trapped in a timer/interrupt/scheduler/low-level kernel path. The doc also says diagnosis usually comes from the stack dump of the stuck CPU, not from the RCU text alone.

tree_exp.h reinforces that reading. During an expedited grace period, RCU explicitly sends IPIs with smp_call_function_single(cpu, rcu_exp_handler, NULL, 0) to get each CPU to report a quiescent state. If that CPU is idle or preemptible enough, it reports quickly; otherwise the CPU is marked as still blocking the expedited grace period and RCU prints the exact sort of message you are seeing.

set_speed_work_func() is a real delayed-work path in source/nvidia-oot/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c, and it does exactly the kind of work your stall stack shows: call OSI_CMD_SET_SPEED, then call phy_print_status(phydev), then adjust MAC clocks, then call netif_carrier_on().

The driver header kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.h hard-codes ETHER_COMMON_IRQ_DEFAULT_CPU to 4, and the private struct stores common_isr_cpu_id / common_isr_cpu_mask. Your stall also happens on CPU 4.
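To cross-check which CPU the mgbe interrupts actually land on without rebuilding anything, /proc/interrupts can be parsed and then /proc/irq/&lt;N&gt;/smp_affinity_list inspected for each match. A sketch (the helper name and the sample line format in the test are mine):

```shell
#!/bin/sh
# Sketch: list IRQ numbers whose name matches a pattern, from a
# /proc/interrupts-style file. On the device:
#   mgbe_irqs /proc/interrupts mgbe
# then inspect /proc/irq/<N>/smp_affinity_list for each number printed.
mgbe_irqs() {
    file=$1 pattern=$2
    awk -v pat="$pattern" '$0 ~ pat { sub(":", "", $1); print $1 }' "$file"
}
```

The per-CPU count columns in /proc/interrupts also show directly whether the common interrupt load is concentrated on CPU 4.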



Here is one combined patch documenting the three changes I made to nvidia-oot/drivers/net/ethernet/nvidia/nvethernet/ether_linux.{c,h}. The two files are attached. I will have time tomorrow to compile and test the resulting nvethernet.ko.

  1. Change ETHER_COMMON_IRQ_DEFAULT_CPU in ether_linux.h from 4U to 8U.
  2. Add a module parameter to gate phy_print_status() at runtime.
  3. Make set_speed_work_func() honor that parameter and fall back to netdev_dbg() when disabled.

After compilation and module load, the runtime toggle can be used as follows:

cat /sys/module/nvethernet/parameters/phy_print_status_enable
echo 0 | sudo tee /sys/module/nvethernet/parameters/phy_print_status_enable
echo 1 | sudo tee /sys/module/nvethernet/parameters/phy_print_status_enable

diff --git a/kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.h b/kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.h
index 000000000000..111111111111 100644
--- a/kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.h
+++ b/kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.h
@@ -95,7 +95,11 @@
 /**
  * @brief CPU to handle ethernet common interrupt
  */
-#define ETHER_COMMON_IRQ_DEFAULT_CPU	4U
+/*
+ * Test/debug change:
+ * avoid concentrating the common Ethernet IRQ on CPU4, which matched
+ * the observed stalled CPU in the boot-time nvethernet set_speed path.
+ */
+#define ETHER_COMMON_IRQ_DEFAULT_CPU	8U

 /**
  * @addtogroup MAC address DT string
diff --git a/kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c b/kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c
index 222222222222..333333333333 100644
--- a/kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c
+++ b/kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c
@@ -1,5 +1,18 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+/*
+ * Test/debug knob:
+ * Control whether set_speed_work_func() emits phy_print_status() messages.
+ *
+ * Default is enabled to preserve current behavior. Set to 0 at runtime to
+ * suppress the status printk path while keeping the rest of link bring-up
+ * unchanged.
+ */
+static bool nvethernet_phy_print_status_enable = true;
+module_param_named(phy_print_status_enable,
+		   nvethernet_phy_print_status_enable, bool, 0644);
+MODULE_PARM_DESC(phy_print_status_enable,
+		 "Enable phy_print_status() from set_speed_work_func()");
+
 static void set_speed_work_func(struct work_struct *work)
 {
 	struct delayed_work *dwork = to_delayed_work(work);
@@ -1234,9 +1247,16 @@ static void set_speed_work_func(struct work_struct *work)
 	}

 	/* Set MGBE MAC_DIV/TX clk rate */
 	pdata->speed = speed;
-	phy_print_status(phydev);
+	if (nvethernet_phy_print_status_enable) {
+		phy_print_status(phydev);
+	} else {
+		netdev_dbg(dev,
+			   "set_speed_work: phy_print_status suppressed speed=%d duplex=%d interface=%d\n",
+			   pdata->speed, phydev->duplex, pdata->interface);
+	}
+
 	mac_clk = (pdata->osi_core->mac == OSI_MAC_HW_MGBE_T26X) ?
 		  pdata->mac_clk : pdata->mac_div_clk;
 	if (pdata->osi_core->mac_ver == OSI_EQOS_MAC_5_40) {
 		ether_set_eqos_tx_clk(pdata->mac_clk, pdata->speed);
 	} else {

ether_linux.c.txt (222.1 KB)

ether_linux.h.txt (28.1 KB)



Documents

cd Linux_for_Tegra/source/kernel/kernel-noble

Documentation/RCU/stallwarn.rst 
This document first discusses what sorts of issues RCU's CPU stall
detector can locate, and then discusses kernel parameters and Kconfig
options that can be used to fine-tune the detector's operation.  Finally,
this document explains the stall detector's "splat" format.

What Causes RCU CPU Stall Warnings?
Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
RCU-preempt Expedited Grace Periods
One way to prevent your real-time application from getting hit with these IPIs is to build your kernel with ``CONFIG_NO_HZ_FULL=y``. RCU would then perceive the CPU running your application as being idle, and it would be able to safely detect that state without needing to IPI the CPU.

Documentation/RCU/Design/Requirements/Requirements.rst
#. If a CPU is either idle or executing in usermode, and RCU believes it
   is non-idle, the scheduling-clock tick had better be running.
   Otherwise, you will get RCU CPU stall warnings. Or at best, very long
   (11-second) grace periods, with a pointless IPI waking the CPU from
   time to time.

Documentation/RCU/Design/Data-Structures/Data-Structures.rst
The ``rcu_node`` structures form the combining tree that propagates
quiescent-state information from the leaves to the root and also that
propagates grace-period information from the root down to the leaves.
They provides local copies of the grace-period state in order to allow
this information to be accessed in a synchronized manner without
suffering the scalability limitations that would otherwise be imposed by
global locking. In ``CONFIG_PREEMPT_RCU`` kernels, they manage the lists
of tasks that have blocked while in their current RCU read-side critical
section. In ``CONFIG_PREEMPT_RCU`` with ``CONFIG_RCU_BOOST``, they
manage the per-\ ``rcu_node`` priority-boosting kernel threads
(kthreads) and state. Finally, they record CPU-hotplug state in order to
determine which CPUs should be ignored during a given grace period.

The following is included to document it:

/sys/module/rcupdate/parameters 
rcu_cpu_stall_cputime:0
rcu_cpu_stall_ftrace_dump:0
rcu_cpu_stall_suppress:0
rcu_cpu_stall_suppress_at_boot:0
rcu_cpu_stall_timeout:21
rcu_exp_cpu_stall_timeout:0
rcu_expedited:0
rcu_exp_stall_task_details:N
rcu_normal:0
rcu_normal_after_boot:0
rcu_task_collapse_lim:10
rcu_task_contend_lim:100
rcu_task_enqueue_lim:1
rcu_task_ipi_delay:0
rcu_task_lazy_lim:32
rcu_tasks_lazy_ms:-1
rcu_task_stall_info:2500
rcu_task_stall_info_mult:3
rcu_task_stall_timeout:150000
rcu_tasks_trace_lazy_ms:-1
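While debugging, these parameters can be adjusted at runtime; for example, lengthening rcu_cpu_stall_timeout gives the stuck CPU more time before the splat fires. A sketch (the helper name is mine; the directory is an argument only so the logic can be tried anywhere, since the real files under /sys/module/rcupdate/parameters need root):

```shell
#!/bin/sh
# Sketch: show and set an RCU stall parameter by writing its sysfs file.
# On the device (as root), the directory is /sys/module/rcupdate/parameters.
set_rcu_param() {
    dir=$1 name=$2 value=$3
    echo "$name: $(cat "$dir/$name") -> $value"
    echo "$value" > "$dir/$name"
}

# Example on the device, lengthening the stall timeout to 60 seconds:
#   set_rcu_param /sys/module/rcupdate/parameters rcu_cpu_stall_timeout 60
```

Setting rcu_cpu_stall_ftrace_dump to 1 the same way makes the next stall dump the ftrace buffer, which can help correlate the splat with driver activity.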

Hi whitesscott,

It looks like a very thorough analysis. I'll try to study it carefully and digest the details. Unfortunately, I don't have two spare boards available at the moment; they've been lent out to others. I'll try this patch once I have the boards back.

Hi whitesscott,

Updating with the latest situation: I just tried changing ETHER_COMMON_IRQ_DEFAULT_CPU to 8 as you suggested, and now RCU gets stuck on CPU core 8. Can we therefore confirm that the point where RCU is blocked is caused by the interrupt in nvethernet?
I then disabled phy_print_status_enable and found that RCU still gets stuck, which indicates the stall is not caused by the phy_print_status(phydev) call.

[   37.268823] rcu: INFO: rcu_preempt self-detected stall on CPU
[   37.268826] rcu:     8-....: (5248 ticks this GP) idle=98e4/1/0x4000000000000000 softirq=984/984 fqs=1838
[   37.268829] rcu:     (t=5250 jiffies g=169 q=37992 ncpus=14)
[   64.492684] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 8-.... } 5298 jiffies s: 825 root: 0x100/.
[  100.280532] rcu: INFO: rcu_preempt self-detected stall on CPU
[  100.280533] rcu:     8-....: (21001 ticks this GP) idle=98e4/1/0x4000000000000000 softirq=984/984 fqs=7186
[  100.280535] rcu:     (t=21003 jiffies g=169 q=43039 ncpus=14)
[  127.980395] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 8-.... } 21170 jiffies s: 825 root: 0x100/.
[  127.980405] rcu: blocking rcu_node structures (internal RCU debug):

(base) root@tegra-ubuntu:~# cat /sys/module/nvethernet/parameters/phy_print_status_enable
N

Here is the dmesg

[   35.468954] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   35.468960] rcu:     (detected by 2, t=5252 jiffies, g=153, q=90193 ncpus=14)
[   35.468965] rcu: All QSes seen, last rcu_preempt kthread activity 5238 (4294901161-4294895923), jiffies_till_next_fqs=1, root ->qsmask 0x0
[   35.468969] rcu: rcu_preempt kthread starved for 5238 jiffies! g153 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=8
[   35.468974] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[   35.468975] rcu: RCU grace-period kthread stack dump:
[   35.468977] task:rcu_preempt     state:R  running task     stack:0     pid:16    tgid:16    ppid:2      flags:0x00000008
[   35.468985] Call trace:
[   35.468987]  __switch_to+0xe0/0x108
[   35.469000]  __schedule+0x368/0xbe4
[   35.469006]  preempt_schedule+0x48/0x60
[   35.469012]  _raw_spin_unlock_irqrestore+0x34/0x44
[   35.469018]  __timer_delete_sync+0x90/0xf8
[   35.469029]  schedule_timeout+0xac/0x1b4
[   35.469032]  rcu_gp_fqs_loop+0x128/0x6c0
[   35.469039]  rcu_gp_kthread+0x220/0x26c
[   35.469044]  kthread+0x110/0x114
[   35.469050]  ret_from_fork+0x10/0x20
[   35.469058] rcu: Stack dump where RCU GP kthread last ran:
[   35.469060] Sending NMI from CPU 2 to CPUs 8:
[   35.469094] NMI backtrace for cpu 8
[   35.510321] CPU: 8 PID: 1912 Comm: irq/287-mgbe3_0 Tainted: G        W  OE      6.8.12-tegra #1
[   35.510325] Hardware name: NVIDIA NVIDIA Jetson AGX Thor Developer Kit/Jetson, BIOS 202512.0-39e87081 12/31/2025
[   35.510327] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[   35.510329] pc : netif_schedule_queue+0x1c/0x54
[   35.510336] lr : netif_schedule_queue+0x18/0x54
[   35.510338] sp : ffff800080043e30
[   35.510339] x29: ffff800080043e30 x28: ffff0000f2000000 x27: ffffb62753f88a20
[   35.510342] x26: ffffb62753654b30 x25: ffffb62709f8c86c x24: ffffb62709f8c86c
[   35.510345] x23: 0000000000000002 x22: 0000000000000004 x21: ffff00008fa80000
[   35.510347] x20: 0000000000000000 x19: ffff000089838800 x18: 0000000000000000
[   35.510350] x17: ffff49f80079b000 x16: ffffb627525e5874 x15: 000001382a1bdda6
[   35.510352] x14: 0000000a16488be2 x13: 00003d28ed5197ec x12: 000000000000018c
[   35.510355] x11: 0000000000000040 x10: ffffb62753fa4b50 x9 : ffffb62753fa4b48
[   35.510357] x8 : ffff000082001940 x7 : 0000000000000000 x6 : 0000000000000001
[   35.510360] x5 : ffffb62753f869c0 x4 : 00000000fffee969 x3 : 0000000000000500
[   35.510363] x2 : 0000000000000004 x1 : ffff0000f2000000 x0 : 0000000000000001
[   35.510366] Call trace:
[   35.510368]  netif_schedule_queue+0x1c/0x54
[   35.510370]  netif_unfreeze_queues+0x48/0x8c
[   35.510374]  netif_tx_unlock+0x18/0x30
[   35.510376]  ether_restart_lane_bringup_task+0x9c/0x178 [nvethernet]
[   35.510387]  tasklet_action_common.isra.0+0xec/0x338
[   35.510392]  tasklet_hi_action+0x28/0x34
[   35.510394]  handle_softirqs+0x120/0x36c
[   35.510397]  __do_softirq+0x14/0x20
[   35.510399]  ____do_softirq+0x10/0x1c
[   35.510402]  call_on_irq_stack+0x24/0x4c
[   35.510404]  do_softirq_own_stack+0x1c/0x28
[   35.510406]  irq_exit_rcu+0xbc/0xcc
[   35.510408]  el1_interrupt+0x38/0x68
[   35.510411]  el1h_64_irq_handler+0x18/0x24
[   35.510413]  el1h_64_irq+0x68/0x6c
[   35.510414]  finish_task_switch.isra.0+0x74/0x24c
[   35.510418]  __schedule+0x36c/0xbe4
[   35.510421]  preempt_schedule+0x48/0x60
[   35.510423]  affine_move_task+0x160/0x4b0
[   35.510427]  __set_cpus_allowed_ptr_locked+0x170/0x1dc
[   35.510429]  __set_cpus_allowed_ptr+0x68/0xc0
[   35.510432]  set_cpus_allowed_ptr+0x34/0x60
[   35.510434]  irq_thread_check_affinity+0x78/0xd0
[   35.510439]  irq_thread+0xb8/0x24c
[   35.510441]  kthread+0x110/0x114
[   35.510443]  ret_from_fork+0x10/0x20
[   39.916937] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {

Based on the stack information from CPU8, it is suspected to be related to ether_restart_lane_bringup_task.

Hi wpceswpces,

Recap: The RCU stall looked like a generic rcu_preempt cpu stall problem, but your attached logs narrow it down to the Ethernet path. A clue was that the stuck task was in nvethernet link-speed work, and the stack passed through phy_print_status() and the printk path, so we tested whether logging was the trigger. We made phy_print_status() runtime-controllable with a module parameter and disabled it, but the stall still happened, so that ruled out phy_print_status() as the primary cause.

Next, we changed ETHER_COMMON_IRQ_DEFAULT_CPU from CPU 4 to CPU 8, and the stall moved from CPU 4 to CPU 8; the problem follows the CPU handling the relevant nvethernet interrupt/work path. This suggests the RCU warning is a symptom of CPU starvation caused by the Ethernet interrupt/softirq/tasklet path, not an RCU bug in itself.
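As a quicker variant of that experiment, the IRQ can often be moved at runtime through procfs instead of rebuilding the module (an assumption on my part: the driver may re-pin the IRQ itself, as the irq_thread_check_affinity frame in your trace suggests, so treat this only as a short-lived test; the helper name is mine):

```shell
#!/bin/sh
# Sketch: move an IRQ to another CPU by writing its affinity file.
# On the device the file would be /proc/irq/<N>/smp_affinity_list
# (root required); the path is an argument here for illustration.
move_irq() {
    affinity_file=$1 cpu=$2
    echo "$cpu" > "$affinity_file" && cat "$affinity_file"
}

# Example on the device, moving IRQ 287 to CPU 8:
#   move_irq /proc/irq/287/smp_affinity_list 8
```

If the stall follows the CPU chosen this way too, that is the same evidence as the recompile, obtained in seconds.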


The stall trace you provided showed the stuck context was no longer mainly set_speed_work_func(), but irq/287-mgbe3_0, with the key function on the stack being ether_restart_lane_bringup_task(). I reviewed ether_restart_lane_bringup_task() as implemented in nvidia-oot/drivers/net/ethernet/nvidia/nvethernet/osd.c. It mainly stops TX/carrier and schedules set_speed_work on the disable path, or starts TX queues on the enable path. That made it look less like the tasklet body itself was looping, and more like it was being triggered repeatedly or at a bad time.

Following the callback chain showed that the real trigger comes from mgbe_core.c. In mgbe_handle_link_change_and_fpe_intrs(), the driver reacts to MAC link-status interrupts. On LOCAL_FAULT, it disables the link-status interrupt, marks the lane down, and calls restart_lane_bringup(…, OSI_DISABLE). On LINK_OK, it calls restart_lane_bringup(…, OSI_ENABLE) and re-enables the link-status interrupt. That tightens the leading theory: repeated LOCAL_FAULT/LINK_OK transitions during boot drive repeated lane-restart handling, which can starve the CPU handling that path and then produce the RCU stall.



Some further troubleshooting that may be warranted:

  1. In attached nvidia-oot/drivers/net/ethernet/nvidia/nvethernet/nvethernetrm/osi/core/mgbe_core.c,

    mgbe_core.c.txt (165.4 KB)

    I added instrumentation to mgbe_handle_link_change_and_fpe_intrs() so it logs only the key transition points, ratelimited: LOCAL_FAULT → restart_lane_bringup(DISABLE), LINK_OK → restart_lane_bringup(ENABLE), and LINK_OK ignored. The idea is that the next reproducer should show whether the system is oscillating between fault and recovery, or getting stuck mostly in local faults.

  2. The RCU stall now looks less like a generic RCU problem and more like a symptom of nvethernet MAC link-status interrupt handling, especially the lane-restart path triggered by LOCAL_FAULT / LINK_OK, likely amplified by force-restart-lane-bringup = <1> on all four 25G ports.

I checked my Thor device tree and found that all four mgbe nodes have nvidia,force-restart-lane-bringup = <1>;. The osd.c code explicitly uses that flag on T26x MGBE to allow the restart-lane logic to run on link changes. That makes this DT property one of the strongest configuration suspects so far. We should also test disabling that property on all four ports; if doing so reduces or eliminates the fault/recovery storm, that would help narrow down the diagnosis.

In
hardware/nvidia/t264/nv-public/nv-platform/tegra264-p4071-0000.dtsi
or
dtc -I dtb -O dts -o bpmp.dts /generic/tegra264-bpmp-3834-0008-4071-xxxx.dtb # your board dtb.
Change the 4 instances of:

			nvidia,force-restart-lane-bringup = <1>;
to
			nvidia,force-restart-lane-bringup = <0>;
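Since the property appears on all four nodes, the edit to the decompiled .dts can be scripted (a sketch; the helper name is mine, and the .dtb would be recompiled afterwards with `dtc -I dts -O dtb -o out.dtb edited.dts`):

```shell
#!/bin/sh
# Sketch: flip every force-restart-lane-bringup cell from <1> to <0>
# in a decompiled .dts, in place, then print how many <0> cells exist
# so the result can be eyeballed (expect 4 on this board).
disable_restart_lane() {
    dts=$1
    sed -i 's/nvidia,force-restart-lane-bringup = <1>;/nvidia,force-restart-lane-bringup = <0>;/' "$dts"
    grep -c 'force-restart-lane-bringup = <0>;' "$dts"
}
```

After reflashing or replacing the .dtb, the presence of the changed value can be confirmed on the running system under /proc/device-tree.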
  3. Check the UPHY differential pair signal integrity and the 0.8V/1.8V power rails for the MGBE block. If voltage drops during bringup, the lane-bringup logic can hang.