AGX Orin network interface reboots unexpectedly

Hello,

We are facing an issue where the Orin network interface eth0 reboots unexpectedly. The interface usually takes several seconds to recover. It does not happen frequently, but it is critical because all network traffic stops during the reboot.

Please advise on what is happening; any help will be appreciated.

The following is an excerpt from syslog; please refer to the attached full log for details.

May 9 10:41:07 O2-004-foxorin1 systemd[1]: Starting Cleanup of Temporary Directories…
May 9 10:41:07 O2-004-foxorin1 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
May 9 10:41:07 O2-004-foxorin1 systemd[1]: Finished Cleanup of Temporary Directories.
May 9 10:41:08 O2-004-foxorin1 nvfancontrol[563]: NVFAN ERROR: FAN1: Cannot turn the fan on even the PWM is set to 255, please check if the fan is faulty.
May 9 10:43:44 O2-004-foxorin1 nvfancontrol[563]: message repeated 116 times: [ NVFAN ERROR: FAN1: Cannot turn the fan on even the PWM is set to 255, please check if the fan is faulty.]
May 9 10:44:38 O2-004-foxorin1 xone-agent-run[2124]: time=“2023-05-09T10:44:38+08:00” level=info msg=“[monitoring 2023-05-09 10:44:16 — 2023-05-09 10:44:38]”
May 9 10:45:02 O2-004-foxorin1 nvfancontrol[563]: NVFAN ERROR: FAN1: Cannot turn the fan on even the PWM is set to 255, please check if the fan is faulty.
May 9 10:46:10 O2-004-foxorin1 kernel: [ 1250.572673] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1001, return: -19
May 9 10:46:10 O2-004-foxorin1 kernel: [ 1250.583418] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1003, return: -19
May 9 10:46:26 O2-004-foxorin1 kernel: [ 1265.932404] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1001, return: -19
May 9 10:46:26 O2-004-foxorin1 kernel: [ 1265.943366] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1003, return: -19
May 9 10:46:35 O2-004-foxorin1 kernel: [ 1275.148242] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1001, return: -19
May 9 10:46:35 O2-004-foxorin1 kernel: [ 1275.159121] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1003, return: -19
May 9 10:47:27 O2-004-foxorin1 kernel: [ 1327.371875] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1001, return: -19
May 9 10:47:30 O2-004-foxorin1 kernel: [ 1330.439906] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1003, return: -19
May 9 10:47:34 O2-004-foxorin1 kernel: [ 1334.445941] nvethernet 6810000.ethernet eth0: Link is Down
May 9 10:47:39 O2-004-foxorin1 kernel: [ 1339.564055] nvethernet 6810000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx/tx
May 9 10:48:56 O2-004-foxorin1 nvfancontrol[563]: message repeated 155 times: [ NVFAN ERROR: FAN1: Cannot turn the fan on even the PWM is set to 255, please check if the fan is faulty.]
May 9 10:49:38 O2-004-foxorin1 xone-agent-run[2124]: time=“2023-05-09T10:49:38+08:00” level=info msg=“[monitoring 2023-05-09 10:49:16 — 2023-05-09 10:49:38]”
May 9 10:50:14 O2-004-foxorin1 nvfancontrol[563]: NVFAN ERROR: FAN1: Cannot turn the fan on even the PWM is set to 255, please check if the fan is faulty.
May 9 10:54:08 O2-004-foxorin1 nvfancontrol[563]: message repeated 155 times: [ NVFAN ERROR: FAN1: Cannot turn the fan on even the PWM is set to 255, please check if the fan is faulty.]
May 9 10:54:29 O2-004-foxorin1 xone-agent-run[2124]: time=“2023-05-09T10:54:29+08:00” level=info msg=“[monitoring 2023-05-09 10:54:16 — 2023-05-09 10:54:29]”
May 9 10:55:26 O2-004-foxorin1 nvfancontrol[563]: NVFAN ERROR: FAN1: Cannot turn the fan on even the PWM is set to 255, please check if the fan is faulty.
May 9 10:56:30 O2-004-foxorin1 kernel: [ 1869.982295] nvethernet 6810000.ethernet eth0: Link is Down
May 9 10:56:31 O2-004-foxorin1 kernel: [ 1871.101448] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1001, return: -19
May 9 10:56:31 O2-004-foxorin1 kernel: [ 1871.112173] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1003, return: -19
May 9 10:56:34 O2-004-foxorin1 kernel: [ 1874.097298] nvethernet 6810000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx/tx

syslog.6 (705.5 KB)
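
To quantify how long each outage lasts, the Link is Down / Link is Up pairs in the excerpt above can be matched up using the kernel timestamps in square brackets. The following is only a minimal sketch, assuming the syslog line format shown above; the function name `link_outages` and the `SAMPLE` list are illustrative and not part of any NVIDIA tool.

```python
import re

# Matches kernel lines such as:
#   kernel: [ 1334.445941] nvethernet 6810000.ethernet eth0: Link is Down
LINK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+nvethernet \S+ (\w+): Link is (Down|Up)"
)


def link_outages(lines):
    """Return a list of (iface, down_ts, up_ts, seconds) for each Down -> Up pair.

    Timestamps are the kernel uptime values printed in square brackets.
    """
    down_at = {}   # iface -> timestamp of the most recent "Link is Down"
    outages = []
    for line in lines:
        match = LINK_RE.search(line)
        if not match:
            continue
        ts = float(match.group(1))
        iface = match.group(2)
        state = match.group(3)
        if state == "Down":
            down_at[iface] = ts
        elif iface in down_at:
            start = down_at.pop(iface)
            outages.append((iface, start, ts, ts - start))
    return outages


# Lines copied from the syslog excerpt in this post.
SAMPLE = [
    "May 9 10:47:30 O2-004-foxorin1 kernel: [ 1330.439906] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1003, return: -19",
    "May 9 10:47:34 O2-004-foxorin1 kernel: [ 1334.445941] nvethernet 6810000.ethernet eth0: Link is Down",
    "May 9 10:47:39 O2-004-foxorin1 kernel: [ 1339.564055] nvethernet 6810000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx/tx",
    "May 9 10:56:30 O2-004-foxorin1 kernel: [ 1869.982295] nvethernet 6810000.ethernet eth0: Link is Down",
    "May 9 10:56:34 O2-004-foxorin1 kernel: [ 1874.097298] nvethernet 6810000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx/tx",
]

for iface, down_ts, up_ts, seconds in link_outages(SAMPLE):
    print(f"{iface}: down at {down_ts:.3f}, up at {up_ts:.3f}, outage {seconds:.1f} s")
```

To run it over the full attached log instead of the sample, pass the open file object to `link_outages`, since it accepts any iterable of lines.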

Is there any application running that could trigger the reboot?

Is this on an NV devkit or a custom board?

As far as I know, the reboot is not initiated by us, and no application would trigger it.

Another clue is that the reboot affects only eth0; during the reboot, network traffic over the loopback interface still works.

It is an NV devkit.

Please try to find out how to reproduce this issue in a stable way so that we can look into this. Thanks.
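
One way to collect data while looking for a reproduction is to timestamp each link flap so it can be correlated with workload or temperature. The sketch below is only an illustration, assuming the standard Linux sysfs attribute `/sys/class/net/<iface>/carrier_changes` is exposed for eth0 on this system; the helper names `read_counter` and `watch` are hypothetical.

```python
import time
from pathlib import Path


def read_counter(iface, name="carrier_changes", root="/sys/class/net"):
    """Read an integer sysfs attribute for the given interface."""
    return int((Path(root) / iface / name).read_text().strip())


def watch(iface="eth0", interval=1.0, iterations=None, root="/sys/class/net"):
    """Poll carrier_changes and record a timestamp whenever it changes.

    iterations=None polls forever; a number limits the polling rounds.
    Returns a list of (wall_clock_time, old_count, new_count).
    """
    last = read_counter(iface, root=root)
    events = []
    rounds = 0
    while iterations is None or rounds < iterations:
        time.sleep(interval)
        current = read_counter(iface, root=root)
        if current != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {iface}: carrier_changes {last} -> {current}")
            events.append((stamp, last, current))
            last = current
        rounds += 1
    return events
```

Calling `watch("eth0")` on the device and leaving it running alongside the normal workload would log the wall-clock time of each link transition, which can then be compared against the syslog entries.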