Network crash after wake on LAN with direct cable

jordimarine · July 21, 2021, 8:39am

Hi All,

I encountered the same issue as in this post[1], but we need to use a direct wire instead of working with a switch.
Any help will be appreciated.

We can always reproduce the problem.

Setup:
Direct or crossover Ethernet cable from Xavier NX to a PC
Fixed IP on both sides.

Commands Nvidia:
sudo ethtool -s eth0 wol g
sudo systemctl suspend

Command Pc:
sudo etherwake -i eno1 [MAC]
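
For reference, here is a minimal sketch in C of the payload that a tool like etherwake sends: six 0xFF bytes followed by the target MAC repeated 16 times (102 bytes). This is the pattern that "wol g" (wake on MagicPacket) arms the NIC to match. The function name and constants are made up for this illustration and are not taken from the etherwake source. Actually sending the payload (for example in a raw Ethernet frame or a UDP broadcast) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6
#define MAGIC_REPEATS 16
#define MAGIC_PAYLOAD_LEN (MAC_LEN + MAGIC_REPEATS * MAC_LEN) /* 102 bytes */

/*
 * Fill buf with a Wake-on-LAN magic payload for the given MAC address:
 * six 0xFF bytes followed by the MAC repeated 16 times.
 * Returns the payload length, or 0 if buf is too small.
 * (Hypothetical helper, for illustration only.)
 */
size_t build_magic_payload(const uint8_t mac[MAC_LEN],
                           uint8_t *buf, size_t buf_len)
{
    assert(mac != NULL && buf != NULL);

    if (buf_len < MAGIC_PAYLOAD_LEN)
        return 0;

    /* Synchronization stream: 6 x 0xFF */
    memset(buf, 0xFF, MAC_LEN);

    /* Target MAC, repeated 16 times */
    for (size_t i = 0; i < MAGIC_REPEATS; i++)
        memcpy(buf + MAC_LEN + i * MAC_LEN, mac, MAC_LEN);

    return MAGIC_PAYLOAD_LEN;
}
```

The sleeping NIC only has to match this byte pattern, which is why the wake-up itself succeeds here even though the resume path fails afterwards.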

The Nvidia board wakes up, but the network lights go off, and we can read the message below in dmesg. Also, after 2 minutes the PC reboots.

[ 158.643764] eqos 2490000.ether_qos: WoL Failed to reset MAC
[ 158.644047] dpm_run_callback(): eqos_resume_noirq+0x0/0x1d0 returns -19
[ 158.644282] PM: Device 2490000.ether_qos failed to resume noirq: error -19

[1] Issue with Wake-on-LAN on Xavier NX - #9 by WayneWWW

WayneWWW · July 21, 2021, 9:23am

Does WoL work when you connect both devices to a switch?

jordimarine · July 21, 2021, 9:29am

Yes it works.

WayneWWW · July 21, 2021, 10:05am

Hi, I just tried with my local setup, which is the same as what you described.

But our NX can still be woken up from the host side. Is your NX never able to be woken by the host? I mean, if you cold boot the device and try it 10 times, will it fail to wake up all 10 times?

jordimarine · July 21, 2021, 10:13am

It wakes up each time, but after the wake-up:

The network stops working immediately (LAN lights go off, no answer from ssh)
I can read the mentioned message in dmesg (using HDMI and keyboard)
The system reboots after a few minutes

WayneWWW · July 21, 2021, 10:20am

Can you share the panic log? If it reboots, there was probably a kernel panic.

jordimarine · July 21, 2021, 10:55am

Thank you for your support.
I am sending the kernel log and syslog of a crash. The reboot was at 12:35.
kernelLog.txt (379.3 KB)

syslog.txt (170.4 KB)

WayneWWW · July 21, 2021, 11:17am

Hi,

The syslog will not record the kernel panic. Please use the UART log to monitor it.

jordimarine · July 22, 2021, 9:03am

We don’t have the interface yet; I will come back with the log when I get it.
Thank you.

jordimarine · July 22, 2021, 11:17am

This is the kernel panic information.
nvidiaWOLConsole.txt (51.9 KB)
[ 62.538111] cache: parent cpu1 should not be sleeping
[ 62.550725] cache: parent cpu2 should not be sleeping
[ 62.582729] cache: parent cpu3 should not be sleeping
[ 62.612976] cache: parent cpu4 should not be sleeping
[ 62.648782] cache: parent cpu5 should not be sleeping
[ 68.354126] eqos 2490000.ether_qos: WoL Failed to reset MAC
[ 68.354391] dpm_run_callback(): eqos_resume_noirq+0x0/0x1d8 returns -19
[ 68.354633] PM: Device 2490000.ether_qos failed to resume noirq: error -19
[ 68.397817] tegra_cec 3960000.tegra_cec: Resuming
[ 68.398052] tegra_cec 3960000.tegra_cec: tegra_cec_init started
[ 69.408235] tegra_cec 3960000.tegra_cec: tegra_cec_init Done.
[ 243.040291] INFO: task whoopsie:5649 blocked for more than 120 seconds.
[ 243.040473] Not tainted 4.9.201-tegra #1
[ 243.040556] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
[ 243.041003] Kernel panic - not syncing: hung_task: blocked tasks
[ 243.041125] CPU: 0 PID: 683 Comm: khungtaskd Not tainted 4.9.201-tegra #1
[ 243.041246] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[ 243.041356] Call trace:
[ 243.041417] [] dump_backtrace+0x0/0x198
[ 243.041544] [] show_stack+0x24/0x30
[ 243.041649] [] dump_stack+0xa0/0xc8
[ 243.041779] [] panic+0x12c/0x2a8
[ 243.041880] [] watchdog+0x320/0x3d0
[ 243.041976] [] kthread+0xec/0xf0
[ 243.042066] [] ret_from_fork+0x10/0x30
[ 243.042176] SMP: stopping secondary CPUs
[ 243.042462] Kernel Offset: disabled
[ 243.042719] Memory Limit: none
[ 243.042947] trusty-log panic notifier - trusty version Built: 08:40:57 Feb 1.
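
The panic itself comes from the hung-task watchdog (khungtaskd): a task stayed blocked for more than 120 seconds, and the "hung_task: blocked tasks" panic is only raised when the hung_task_panic sysctl is non-zero. Below is a small sketch in C to read both settings on the board. It assumes the standard /proc/sys/kernel paths are present on this kernel, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, for illustration only.)
 */
int read_sysctl_long(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the two hung-task settings that are relevant to the log above. */
void print_hung_task_settings(void)
{
    static const char *const paths[] = {
        "/proc/sys/kernel/hung_task_timeout_secs",
        "/proc/sys/kernel/hung_task_panic",
    };

    for (size_t i = 0; i < sizeof paths / sizeof paths[0]; i++) {
        long value;

        if (read_sysctl_long(paths[i], &value) == 0)
            printf("%s = %ld\n", paths[i], value);
        else
            printf("%s: not readable\n", paths[i]);
    }
}
```

Setting hung_task_panic to 0 would presumably only hide the reboot; the Ethernet resume failure (error -19) would still be there.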

Cruise_Tang · October 20, 2021, 5:52am

Hi jordimarine, I am seeing the same error as you. Is there any update on this topic?

WayneWWW · October 20, 2021, 6:52am

We are working on a fix.

marinelkh11 · October 20, 2021, 8:26am

We are not using WoL because of this issue.
A fix would be great.

Cruise_Tang · October 21, 2021, 2:27am

Hi WayneWWW, can you reproduce it on your side yet, or have you got more information about this issue?

static INT eqos_car_reset(struct eqos_prv_data *pdata)
{
	/* timeout: 500000 polls x 10 usec delay = about 5 sec */
	ULONG retry_cnt = (500 * 1000);
	ULONG vy_count = 0;
	ULONG dma_bmr;
	/* deassert rst line */
	if (!IS_ERR_OR_NULL(pdata->eqos_rst)) {
		dev_err(&pdata->pdev->dev, "deassert rst line: not null\n");
		reset_control_reset(pdata->eqos_rst);
	} else {
		dev_err(&pdata->pdev->dev, "deassert rst line was null\n");
	}
	/* add delay of 10 usec */
	udelay(10);

	while (vy_count < retry_cnt) {
		DMA_BMR_RD(dma_bmr);    // read the DMA bus mode register
		if (GET_VALUE(dma_bmr,  // SWR field cleared: reset has completed
			DMA_BMR_SWR_LPOS, DMA_BMR_SWR_HPOS) == 0) {
			return Y_SUCCESS;
		}
		vy_count++;
		udelay(10);
	}
	return -Y_FAILURE;
}

static int eqos_resume_noirq(struct device *dev){
...
	if (device_may_wakeup(&ndev->dev)) {
		disable_irq_wake(pdata->phydev->irq);
		/* issue CAR reset to device */

		ret = hw_if->car_reset(pdata);
		if (ret < 0) {
			dev_err(&pdata->pdev->dev, "WoL Failed to reset MAC, try again\n");
			// return -ENODEV;  
			
			ret = hw_if->car_reset(pdata);
			if (ret < 0) {
				dev_err(&pdata->pdev->dev, "WoL Failed to reset MAC\n");
				return -ENODEV; // why not return timeout here?
			}
		}
		eqos_start_dev(pdata);
...

I think the error may be returned by the function ‘eqos_car_reset’; it looks like a PHY reset timeout. However, I don’t know why it times out because I have no documentation about it, and I don’t know whether software or hardware causes the issue.

JJ.C · June 21, 2022, 2:08am

Hi,
We can reproduce this issue on L4T 32.6.1. May I know which release will fix this?

WayneWWW · July 29, 2022, 11:31am

Please apply this patch to the kernel.

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 963cd9b..8a61c57 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -122,13 +122,29 @@
 
 static int rtl8211f_ack_interrupt(struct phy_device *phydev)
 {
-	int err;
+	int err, ret;
 
 	phy_write(phydev, RTL8211F_PAGE_SELECT, 0xa43);
 	err = phy_read(phydev, RTL8211F_INSR);
 	/* restore to default page 0 */
 	phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
 
+	/* ack the WOL interrupt and toggle the WOL specific registers
+	 * to enable PME pin for WOL trigger events for next time
+	 * until disabled from ethtool ioctl
+	 */
+	if (err & RTL8211F_WOL_ENABLE_PMEB_EVENT) {
+		ret = rtl8211f_wol_settings(phydev, false);
+		if (ret < 0)
+			return ret;
+
+		ret = rtl8211f_wol_settings(phydev, true);
+		if (ret < 0)
+			return ret;
+
+		return 0;
+	}
+
 	return (err < 0) ? err : 0;
 }

system · August 24, 2022, 1:16am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
