Jetson AGX Xavier suddenly reboots

I am using a Jetson AGX Xavier (JetPack 4.3.0) with an Intel Dual Band Wireless-AC 8265.
The Wi-Fi network works, but sometimes the Jetson AGX Xavier suddenly reboots.

I investigated the conditions under which this problem reproduces.
I think file transfer (using scp) over Wi-Fi is involved.
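To make the reproduction more systematic, one option is to repeat the transfer in a loop and timestamp each attempt, so the last attempt before a reboot can be lined up with the syslog. A minimal sketch, run on the sending host; the helper name `stress_transfer` and the scp user, address and file name in the comment are hypothetical placeholders:

```shell
#!/bin/sh
# Hypothetical helper: run a transfer command COUNT times and timestamp each
# attempt, so the last attempt before a reboot can be matched to the syslog.
# Usage (placeholders; adjust user, address and file):
#   stress_transfer 100 scp bigfile.bin nvidia@192.168.0.10:/tmp/
stress_transfer() {
  count="$1"
  shift
  i=1
  while [ "$i" -le "$count" ]; do
    # Same month/day/time layout as /var/log/syslog (e.g. "Jun  8 18:52:47").
    printf '%s attempt %d: ' "$(date '+%b %e %H:%M:%S')" "$i"
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
      # Stop at the first failure; a dropped connection usually means the board went down.
      echo "failed (exit $rc)"
      return "$rc"
    fi
    echo "ok"
    i=$((i + 1))
  done
}
```

The loop stops at the first failed transfer, because a dropped scp connection is the likely sign that the board has rebooted.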

The /var/log/syslog output is as follows.

Jun  8 18:52:47 jetson-test dhclient[11785]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3 (xid=0x6291e63d)
Jun  8 18:52:47 jetson-test avahi-daemon[4763]: Joining mDNS multicast group on interface eth2.IPv6 with address fe80::4d6b:6716:9731:e1c.
Jun  8 18:52:47 jetson-test avahi-daemon[4763]: New relevant interface eth2.IPv6 for mDNS.
Jun  8 18:52:47 jetson-test avahi-daemon[4763]: Registering new address record for fe80::4d6b:6716:9731:e1c on eth2.*.
Jun  8 18:52:47 jetson-test avahi-daemon[4763]: Joining mDNS multicast group on interface eth1.IPv6 with address fe80::aebc:2a29:348e:ee4e.
Jun  8 18:52:47 jetson-test avahi-daemon[4763]: New relevant interface eth1.IPv6 for mDNS.
Jun  8 18:52:47 jetson-test avahi-daemon[4763]: Registering new address record for fe80::aebc:2a29:348e:ee4e on eth1.*.
Jun  8 18:52:48 jetson-test dhclient[11787]: DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 3 (xid=0x647cac78)
Jun  8 18:52:50 jetson-test dhclient[11785]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6 (xid=0x6291e63d)
Jun  8 18:52:51 jetson-test dhclient[11787]: DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 5 (xid=0x647cac78)
Jun  8 18:52:56 jetson-test dhclient[11785]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13 (xid=0x6291e63d)
Jun  8 18:52:56 jetson-test dhclient[11787]: DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 9 (xid=0x647cac78)
Jun  8 18:53:05 jetson-test dhclient[11787]: DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 17 (xid=0x647cac78)
Jun  8 18:53:09 jetson-test dhclient[11785]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 10 (xid=0x6291e63d)
Jun  8 18:53:19 jetson-test dhclient[11785]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 11 (xid=0x6291e63d)
Jun  8 18:53:22 jetson-test dhclient[11787]: DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 8 (xid=0x647cac78)
Jun  8 18:53:30 jetson-test dhclient[11785]: DH
Jun  8 18:58:14 jetson-test systemd-modules-load[2431]: Inserted module 'bluedroid_pm'
Jun  8 18:58:14 jetson-test systemd-modules-load[2431]: Module 'nvhost_vi' is builtin
Jun  8 18:58:14 jetson-test systemd-modules-load[2431]: Inserted module 'nvgpu'
Jun  8 18:58:14 jetson-test systemd-sysctl[2820]: Couldn't write '1' to 'net/ipv4/tcp_syncookies', ignoring: No such file or directory
Jun  8 18:58:14 jetson-test systemd-sysctl[2820]: Couldn't write '1' to 'kernel/yama/ptrace_scope', ignoring: No such file or directory
Jun  8 18:58:14 jetson-test systemd-sysctl[2820]: Couldn't write 'fq_codel' to 'net/core/default_qdisc', ignoring: No such file or directory
Jun  8 18:58:14 jetson-test systemd-udevd[3217]: Network interface NamePolicy= disabled on kernel command line, ignoring.
  • Jun 8 18:53:30 jetson-test dhclient[11785]: DH : the log line is cut off here (crash)
  • Jun 8 18:58:14 jetson-test systemd-modules-load[2431]: Inserted module 'bluedroid_pm' : reboot and initialization

However, I don’t know what triggers this reboot.

Hi,

Is the Wi-Fi module the only peripheral on your carrier board? Is a camera running?

Also, could you dump the serial console log and check whether there is any kernel panic when the reboot happens?

Thank you for your reply.

Is the Wi-Fi module the only peripheral on your carrier board? Is a camera running?

No. Only the Wi-Fi module is connected.

Also, could you dump the serial console log and check whether there is any kernel panic when the reboot happens?

No, because I don’t have a USB-to-TTL serial cable.


I also transferred data over Ethernet with Wi-Fi disconnected. In that case, the problem did not reproduce.
So I suspect this is a problem with the iwlwifi driver.
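Before attributing the problem to iwlwifi, it may be worth confirming which kernel driver the Wi-Fi interface is actually bound to, since the interface names on this board are not obvious (the syslog above only shows eth1 and eth2). A small sketch that resolves the driver symlink in sysfs; `iface_driver` is a hypothetical helper name, and the interface name must be taken from `ip link` on the board:

```shell
#!/bin/sh
# Hypothetical helper: print the kernel driver bound to a network interface
# by resolving the sysfs "device/driver" symlink.
# The second argument exists only so the function can be tried on a fake tree.
# Usage on the Jetson: iface_driver wlan0   (interface name is an assumption)
iface_driver() {
  iface="$1"
  sysfs_net="${2:-/sys/class/net}"
  link="$sysfs_net/$iface/device/driver"
  if [ -L "$link" ]; then
    # The symlink target ends in the driver name, e.g. .../drivers/iwlwifi
    basename "$(readlink "$link")"
  else
    echo "no driver found for $iface" >&2
    return 1
  fi
}
```

For the Intel 8265 this would be expected to print `iwlwifi`, while the Ethernet port used for the comparison should report a different driver.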

I tried changing the following iwlwifi parameters:

options iwlwifi 11n_disable=8 power_level=5

As a result, the problem did not reproduce.
I will keep testing a little longer to see what happens.
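For reference, a common way to make such options persistent is a file under /etc/modprobe.d/, which modprobe reads whenever the module is loaded; the values actually in effect can then be read back from sysfs. A minimal sketch, assuming iwlwifi is loaded as a module and exposes these parameters as readable files under /sys/module/iwlwifi/parameters (the file name iwlwifi.conf and the helper name are hypothetical). `modinfo -p iwlwifi` lists what each parameter means:

```shell
#!/bin/sh
# One-time setup, shown as comments because it changes the system:
#   echo 'options iwlwifi 11n_disable=8 power_level=5' | sudo tee /etc/modprobe.d/iwlwifi.conf
#   sudo reboot    # so the module is reloaded with the new options
#
# Hypothetical helper: print the iwlwifi parameters currently in effect.
# The argument exists only so the function can be tested off-target.
show_iwlwifi_params() {
  param_dir="${1:-/sys/module/iwlwifi/parameters}"
  for p in 11n_disable power_level; do
    if [ -r "$param_dir/$p" ]; then
      printf '%s=%s\n' "$p" "$(cat "$param_dir/$p")"
    else
      printf '%s=unavailable\n' "$p"
    fi
  done
}
```

Checking the sysfs values after each reboot helps confirm that a "not reproduced" result was really obtained with the intended settings.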

Files that are missing from “/proc/sys” correspond to kernel features that were not enabled (they are not real files but an interface to a driver: no driver running, no file). As an example, check this:
zcat /proc/config.gz | egrep 'SYN_COOKIES'
Unless CONFIG_SYN_COOKIES is either “=m” or “=y”, the file for this will be missing.
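The same check can be wrapped in a small function that reports whether an option is built in, built as a module, or not set. A sketch; the helper name is hypothetical, and /proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC:

```shell
#!/bin/sh
# Hypothetical helper: report how a kernel option is configured.
# Usage: kconfig_state SYN_COOKIES [path to a gzip-compressed kernel config]
kconfig_state() {
  opt="CONFIG_$1"
  cfg="${2:-/proc/config.gz}"
  [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
  # Print only the value after "CONFIG_FOO=" for the exact option name.
  val=$(gzip -dc "$cfg" | sed -n "s/^${opt}=//p" | head -n 1)
  case "$val" in
    y) echo "built-in" ;;
    m) echo "module" ;;
    "") echo "not set" ;;   # also covers "# CONFIG_FOO is not set"
    *) echo "$val" ;;       # non-boolean options, e.g. numbers or strings
  esac
}
```

For example, `kconfig_state SYN_COOKIES` on the Jetson would show directly why /proc/sys/net/ipv4/tcp_syncookies is or is not present.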

Incidentally, these files are normally set from “/etc/sysctl.conf”. Each key in that file is converted to a path by replacing the dots with “/” and prepending “/proc/sys”; for example, net.ipv4.tcp_syncookies becomes /proc/sys/net/ipv4/tcp_syncookies.
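To see in advance which settings will trigger "Couldn't write ... No such file or directory" on a given kernel, that key-to-path mapping can be applied to every key in a config file. A minimal sketch, assuming simple "key = value" lines (keys that contain literal dots or slashes, such as some interface names, are not handled); the helper name is hypothetical, and it can be pointed at /etc/sysctl.conf or any file under /etc/sysctl.d/:

```shell
#!/bin/sh
# Hypothetical helper: list sysctl keys whose /proc/sys entry does not exist.
# Usage: missing_sysctls /etc/sysctl.conf
# The second argument exists only so the function can be tested off-target.
missing_sysctls() {
  conf="${1:-/etc/sysctl.conf}"
  root="${2:-/proc/sys}"
  # Skip comment lines (# or ;) and blank lines, then split each line at '='.
  grep -Ev '^[[:space:]]*([#;]|$)' "$conf" | while IFS='=' read -r key _; do
    key=$(printf '%s' "$key" | tr -d ' \t')
    # net.ipv4.tcp_syncookies -> <root>/net/ipv4/tcp_syncookies
    path="$root/$(printf '%s' "$key" | tr '.' '/')"
    [ -e "$path" ] || echo "$key: $path is missing"
  done
}
```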

If you compile that feature into the kernel (either as a module or integrated), the file will appear. However, these features have nothing to do with the reboot, so you can ignore them unless you need the feature; ptrace_scope, for example, may matter if you work with ptrace, but otherwise you can ignore it. That said, I suppose it is possible that if dhclient expects a feature that is not present, dhclient could crash. Unless those log messages occur right at the moment of the crash, I doubt they are the cause. If you run “dmesg --follow” via a serial console and then check the last thing in the logs, perhaps those messages can be ruled out, or it may turn out that the lack of an expected kernel feature really does cause a serious crash/reboot.

No, because I don’t have a USB-to-TTL serial cable.

If this is the Xavier devkit, you don’t need such a cable, because the micro-USB port on the Xavier devkit is a UART port. Connect a micro-USB cable and open /dev/ttyUSB3 on the host; that should dump the log.

Please dump the log first; otherwise we cannot tell what the cause is.

I’ll use a micro-USB to USB Type-A cable.

Could you tell me how to use the serial console?
I could not find this information in the following document:
https://docs.nvidia.com/jetson/l4t/index.html

Please use a serial console tool to open /dev/ttyUSB3 on your host.
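As a concrete example of such a tool, `screen /dev/ttyUSB3 115200` or `minicom -D /dev/ttyUSB3 -b 115200` opens the port (115200 baud is the usual Jetson debug console rate). When the goal is to catch the last messages before a reboot, it can also help to log everything to a file on the host with host-side timestamps. A minimal sketch for a Linux host; `capture_console` is a hypothetical helper, and /dev/ttyUSB3 may be numbered differently on another host:

```shell
#!/bin/sh
# Hypothetical helper: copy a serial console to a log file, prefixing each line
# with the host time, so messages just before a reboot survive on the host.
# Usage: capture_console /dev/ttyUSB3 xavier-console.log   (stop with Ctrl-C)
capture_console() {
  dev="$1"
  log="$2"
  cr=$(printf '\r')
  if [ -c "$dev" ]; then
    # Raw 115200 8N1 input; assumes the usual Jetson debug-UART settings.
    stty -F "$dev" 115200 cs8 -cstopb -parenb raw -echo
  fi
  while IFS= read -r line; do
    line=${line%"$cr"}   # drop the carriage return the console sends
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
  done < "$dev" | tee -a "$log"
}
```

Because the log lives on the host, it is not lost when the Jetson resets, and the host timestamps can be compared with the syslog times.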

I checked the serial console log:

[  292.671432] watchdog: watchdog0: watchdog did not stop!
[  292.685817] systemd-shutdow: 44 output lines suppressed due to ratelimiting
[  293.636553] tegra-xusb 3610000.xhci: Host not halted after 16000 microseconds.
[  293.636809] tegra-xusb 3610000.xhci: Host controller not halted, aborting reset.
[  293.722935] reboot: Restarting system
Shutdown state requested 1
Rebooting system ...

[0000.053] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.062] I> MB1 (prd-version: 1.5.1.2-t194-41334769-9ec1833d)
[0000.067] I> Boot-mode: Coldboot
[0000.070] I> Chip revision : A02P
[0000.073] I> Bootrom patch version : 15 (correctly patched)
[0000.078] I> ATE fuse revision : 0x200
[0000.082] I> Ram repair fuse : 0x0
[0000.085] I> Ram Code : 0x2
[0000.088] I> rst_source : 0xb
[0000.090] I> rst_level : 0x1
[0000.094] I> Boot-device: eMMC
[0000.109] I> sdmmc DDR50 mode
[0000.113] W> No valid slot number is found in scratch register
[0000.118] W> Return default slot: _a
[0000.122] I> Active Boot chain : 0
[0000.125] I> Boot-device: eMMC
[0000.128] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.135] I> Temperature = 34000
[0000.138] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.142] W> Skipping boost for clk: BPMP_APB
[0000.146] W> Skipping boost for clk: AXI_CBB
[0000.150] W> Skipping boost for clk: AON_CPU_NIC
[0000.154] W> Skipping boost for clk: CAN1
[0000.158] W> Skipping boost for clk: CAN2
[0000.162] I> Boot-device: eMMC
[0000.165] I> Boot-device: eMMC
[0000.175] I> Sdmmc: HS400 mode enabled
[0000.179] I> ECC region[0]: Start:0x0, End:0x0
[0000.183] I> ECC region[1]: Start:0x0, End:0x0
[0000.187] I> ECC region[2]: Start:0x0, End:0x0
[0000.191] I> ECC region[3]: Start:0x0, End:0x0
[0000.196] I> ECC region[4]: Start:0x0, End:0x0
[0000.200] I> Non-ECC region[0]: Start:0x80000000, End:0x100000000
[0000.205] I> Non-ECC region[1]: Start:0x0, End:0x0
[0000.210] I> Non-ECC region[2]: Start:0x0, End:0x0
[0000.214] I> Non-ECC region[3]: Start:0x0, End:0x0
[0000.219] I> Non-ECC region[4]: Start:0x0, End:0x0
[0000.224] E> FAILED: Thermal config
[0000.232] E> FAILED: MEMIO rail config
[0000.250] I> Boot-device: eMMC
[0000.260] I> sdmmc bdev is already initialized
[0000.327] I> MB1 done

Hi dandelion1124,

Unfortunately, this log starts too late: the reboot had already begun. We need the log from before this part. Do you have it?
Actually, it looks like a normal reboot process. Also, the timestamp is around 293 seconds; did the reboot happen about 5 minutes after you booted the device?
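Since the useful part is whatever precedes the reboot, a saved console capture can be cut down to the lines just before the first reboot marker. A small sketch using GNU grep; the helper name is hypothetical, and the two markers are taken from the log above ("reboot: Restarting system" for an orderly kernel restart, the "I> MB1 (" bootloader banner as a fallback when the board resets without a kernel message):

```shell
#!/bin/sh
# Hypothetical helper: print the N lines preceding the first reboot marker in a
# saved serial console log, plus the marker line itself.
# Start the capture after the board is up; otherwise the first MB1 banner
# in the file is just the initial boot.
# Usage: before_reboot xavier-console.log 200
before_reboot() {
  log="$1"
  n="${2:-50}"
  grep -m 1 -B "$n" -E 'reboot: Restarting system|I> MB1 \(' "$log"
}
```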

Hi dandelion1124,

Is this still an issue you need support for? Is there any status update?

For now, I cannot reproduce it. When the problem occurs again, I’ll attach the serial console log.