Thor occasionally stuck during boot

I usually access the Jetson Thor via SSH, and I noticed that occasionally (about once every 3 or 4 startups) the fan keeps spinning at 100% instead of slowing down during the boot sequence.

When I attach a screen to see what is going on, I get this:

Then I press the reset button and usually it boots up normally.

Do you have any suggestions for preventing this?

It still happens quite frequently. Is this common?
Do you need some specific info to find the cause?

No, this is not common.

Are you using the NVIDIA devkit or a custom board? Or are you not sure what my question means?

I’m using the Jetson Thor Developer Kit.

I’ve been trying to debug this by enabling journald, but the board gets stuck before journald can record the event. It seems more like a firmware panic than a kernel panic. I also tried enabling reboot on panic, but I’m not confident it will help.

Any suggestions on how to debug this?
I’m on JetPack 7.1. Could a complete reinstall through SDK Manager help?

Here’s another transcript of a startup error.

[  26.332512] rtk_btusb: Pri:15, Patch length 0xd9b9
[  26.332512] rtk_btusb: opcode 0x0008
[  26.332513] rtk_btusb: Unknown Opcode. Ignore
[  26.332514] rtk_btusb: buf_len = 0xd9b9
[  26.332517] rtk_btusb: len = 0xd9b9
[  27.711908] rtk_btusb: fw: exists, config file: exists
[  27.712184] rtk_btusb: load_firmware done
[  27.712201] rtk_btusb: download_data start
[  28.178159] rtk_btusb: download_data done
[  28.178515] rtk_btusb: HCI reset..
[  28.194017] rtk_btusb: read_ver_rsp->lmp_subver = 0xcb71
[  28.194667] rtk_btusb: read_ver_rsp->hci_rev = 0x40b
[  28.194671] rtk_btusb: patch_entry->lmp_sub = 0x8852
[  28.195382] rtk_btusb: Rtk patch end
[  28.199973] rtk_btusb: chip type value: 0x78
[  28.203450] rtk_btusb: btusb_open set HCI UP RUNNING
[  28.203827] rtk_btcoex: Open BTCOEX
[  28.211797] rtk_btusb: btusb_open end
[  28.216995] rtk_btusb: ISO handle range (handle >= 0x010)
[  28.224885] rtk_btcoex: rtk_vendor_cmd_to_fw: opcode 0xfc1b
[  28.239847] rtk_btcoex: BTCOEX hci_rev 0x40b
[  28.240180] rtk_btcoex: BTCOEX lmp_subver 0xcb71
[  28.271842] rtk_btusb: btusb_notify: hci evt 3
[  30.322362] rtk_btusb: btusb_flush add delay
[  30.332657] rtk_btusb: btusb_close
[  30.344070] rtk_btcoex: Close BTCOEX
[  30.344348] rtk_btcoex: x

[  32.438249] 818930000.rtcpu:hsp-vm1: camrtc_hsp_rx_full_notify: receive CAMRTC_HSP_PANIC message!
[  32.453058] 818930000.rtcpu:hsp-vm1: response 0x43000000: response timeout
[  32.453317] 818930000.rtcpu:hsp-vm1: PM_SUSPEND failed: 0xffffff92
[  32.455351] tegra186-cam-rtcpu 818930000.rtcpu: RTCPU suspend failed, resetting it
[  32.473036] [RCE] VM0 deactivated
[  32.473924] 818930000.rtcpu:hsp-vm1: camrtc_hsp_rx_full_notify: receive CAMRTC_HSP_PANIC message!
[  34.899352] [RCE] ERROR: camera-ip/isp5/isp.c:7210 [isp5_pm_handler] "ERROR: Failed to turn isp power off"
[  35.948022] usb 1-4.2: new full-speed USB device number 4 using tegra-xusb
[  36.045415] input: MOSART Semi. Trust Deskset as /devices/platform/bus@0/a800a10000.usb/usb1/1-4/1-4.2/1-4.2:1.0/0003:145F:02FF.0001/input/input1
[  36.045415] hid-generic 0003:145F:02FF.0001: input,hidraw0: USB HID v1.10 Keyboard [MOSART Semi. Trust Deskset] on usb-a800a10000.usb-4.2/input0
[  36.051746] probe of 0003:145F:02FF.0001 returned 0 after 10559 usecs
[  36.051306] probe of 1-4.2:1.0 returned 0 after 114276 usecs
[  36.060975] input: MOSART Semi. Trust Deskset Mouse as /devices/platform/bus@0/a800a10000.usb/usb1/1-4/1-4.2/1-4.2:1.1/0003:145F:02FF.0002/input/input2
[  36.070970] input: MOSART Semi. Trust Deskset Consumer Control as /devices/platform/bus@0/a800a10000.usb/usb1/1-4/1-4.2/1-4.2:1.1/0003:145F:02FF.0002/input/input3
[  36.145495] input: MOSART Semi. Trust Deskset System Control as /devices/platform/bus@0/a800a10000.usb/usb1/1-4/1-4.2/1-4.2:1.1/0003:145F:02FF.0002/input/input4
[  36.146867] input: MOSART Semi. Trust Deskset as /devices/platform/bus@0/a800a10000.usb/usb1/1-4/1-4.2/1-4.2:1.1/0003:145F:02FF.0002/input/input5
[  36.159657] hid-generic 0003:145F:02FF.0002: input,hidraw1: USB HID v1.10 Mouse [MOSART Semi. Trust Deskset] on usb-a800a10000.usb-4.2/input1
[  36.172013] probe of 0003:145F:02FF.0002 returned 0 after 121389 usecs
[  36.172209] probe of 1-4.2:1.1 returned 0 after 258261 usecs

[  36.325065] tegra_mc 810802000.memory-controller: sync_state() pending due to 818120000.host1x
[  36.325475] tegra186-emc 810802000.memory-controller:external-memory-controller@810800000: sync_state() pending due to 818120000.host1x
[  36.329285] tegra_mc 810802000.memory-controller: sync_state() pending due to bus@0:aaconnect@9000000
[  36.341854] tegra186-emc 810802000.memory-controller:external-memory-controller@810800000: sync_state() pending due to 8808c00000.display
[  36.355248] tegra_mc 810802000.memory-controller: sync_state() pending due to 8808c00000.display
[  36.367352] tegra186-emc 810802000.memory-controller:external-memory-controller@810800000: sync_state() pending due to 0000:01:00.0
[  36.388751] tegra_mc 810802000.memory-controller: sync_state() pending due to 0000:01:00.0

[  38.488231] 818930000.rtcpu:hsp-vm1: camrtc_hsp_rx_full_notify: receive CAMRTC_HSP_PANIC message!
[  38.453412] [RCE] ERROR: camera-ip/nvcsi/nvcsi.c:4448 [nvcsi_pm_handler] "ERROR: Failed to turn nvcsi power off"
[  38.474243] 818930000.rtcpu:hsp-vm1: camrtc_hsp_rx_full_notify: receive CAMRTC_HSP_PANIC message!
[  40.495050] [RCE] ERROR: camera-ip/vi5/vi5.c:8197 [vi5_pm_handler] "ERROR: Failed to turn vi power off"
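
To compare failed boots with good ones, it may help to reduce each transcript to just the camera RTCPU events. The following is a minimal sketch, not an NVIDIA tool: it assumes the log was saved as plain text in the `[ seconds] message` format shown above, and the names `summarize`, `LINE_RE` and `RCE_ERR_RE` are made up for this example.

```python
import re
from collections import Counter

# "[  32.438249] message" -> (timestamp, message)
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# '[RCE] ERROR: <file:line> [<handler>] "ERROR: <text>"'
RCE_ERR_RE = re.compile(r'\[RCE\] ERROR: (\S+) \[(\w+)\] "ERROR: (.+)"')


def summarize(lines):
    """Return (panic_timestamps, Counter of RCE error texts) for a kernel log."""
    panics = []
    rce_errors = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip blank lines and anything without a timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if "CAMRTC_HSP_PANIC" in msg:
            panics.append(ts)
        err = RCE_ERR_RE.search(msg)
        if err:
            rce_errors[err.group(3)] += 1
    return panics, rce_errors


if __name__ == "__main__":
    import sys

    # Usage: python summarize_boot.py saved_boot_log.txt
    with open(sys.argv[1], errors="replace") as fh:
        panics, errors = summarize(fh)
    print("CAMRTC_HSP_PANIC at:", panics)
    for text, count in errors.most_common():
        print(f"{count} x {text}")
```

Running it on several transcripts shows quickly whether the panic messages always appear at a similar point in the boot and whether the same power-off failures (isp, nvcsi, vi) recur each time.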

Could you reflash the board with SDK Manager again and see whether that gets past the error?

If not, please remove all peripherals, leaving only the power cable and the UART cable, and see whether the issue still occurs. This is just to figure out which peripheral might be causing the problem.

I only use the USB power cable, an HDMI dongle, and the Ethernet cable to connect from the laptop.

I’ll try reflashing as soon as I can!


I reflashed using the BSP method with JetPack 7.1. It still freezes on boot from time to time. Here are the last few lines from the most recent occurrence (I haven’t found a way to save the logs, as it seems to happen before journald is active):

[    7.084366] host1x-context 8181200000.host1x.host1x-ctx.5: Adding to iommu group 22
[    7.106298] host1x-context 8181200000.host1x.host1x-ctx.6: Adding to iommu group 23
[    7.129320] host1x-context 8181200000.host1x.host1x-ctx.7: Adding to iommu group 24
[    7.233135] EXT4-fs (nvme0n1): re-mounted da26ec59-aa06-4ba0-aeba-89d81a4b9 r/w. Quota mode: none.
[    7.327015] loop0: detected capacity change from 0 to 8
[    7.328361] loop1: detected capacity change from 0 to 141200
[    7.326513] loop2: detected capacity change from 0 to 85200
[    7.365512] loop3: detected capacity change from 0 to 450520
[    7.367422] loop4: detected capacity change from 0 to 482776
[    7.382389] loop5: detected capacity change from 0 to 187776
[    7.398873] loop6: detected capacity change from 0 to 1039064
[    7.456073] tegra-dce 80080000.dce: Adding to iommu group 25
[    7.458195] pstore: Using crash dump compression: deflate
[    7.459068] tegra-dce 80080000.dce: Setting DCE HSP functions for tegra234-dce
[    7.459138] printk: legacy console [ramoops-1] enabled
[    7.494685] dce: dce_ipc_channel_init_unlocked:248  Invalid Channel State [0x0] for ch_type [2]
[    7.495482] pstore: Registered ramoops as persistent store backend
[    7.520175] dce: dce_admin_send_cmd_ver:425  version (dcfw:[0x4] eckmd:[0x4] err: [0x0]
[    7.538197] ramoops 80080000.ram: ecc 0
[    7.563690] dce: dce_admin_setup_clients_ipc:1004  Channel Reset Complete for Type [1] ...
[    7.605787] arm_sme_pmu sme-pmu: probing SPEv1.2 for CPUs 0-13 [max_record_sz 64, align 64, features 0x57]
[    7.605812] irq: IRQ267: trimming hierarchy from :bus@0:pmc@c800000
[    7.605909] dce_system_cfg: Coresight Configuration manager initialised
[    7.606200] dce: dce_admin_setup_clients_ipc:980  Get queue info failed for [2]
[    7.606423] dce: dce_admin_setup_clients_ipc:1004  Channel Reset Complete for Type [3] ...
[    7.609121] uvcvideo: Hypervisor not present
[    7.642966] dce: dce_start_boot_flow:188  DCE_BOOT_DONE
[    7.671294] lm90 5-004c: supply vcc not found, using dummy regulator
[    7.728926] nvsciipc nvscipc: creating nvscipc_uid sysfs group
[    7.765600] nvsciipc nvscipc: nvscipc_uid sysfs group: done
[    7.770232] nvsciipc: loaded module
[    7.774827] nvmap_heap_init: nvmap_heap_init: created heap block cache
[    7.775020] vmap_co_device_init: vpr dma coherent mem declare 0xe0000000@4600000,914358272
[    7.790516] tegra-carveouts tegra-carveouts: assigned reserved memory node vpr-carveout
[    7.790523] nvmap_page_pool_init: Total RAM pages: 32197537
[    7.790611] nvmap_page_pool_init: nvmap page pool size: 4024692 pages (3433 MB)
[    7.790937] nvmap_background_zero_thread: PP zeroing thread starting
[    7.791011] nvmap_heap_create: created heap vpr base 0xe0000000 size (892928KiB)
[    7.791041] nvmap_heap_create: fsi dma coherent mem declare 0x00000001fc400000,16777216
[    7.791047] nvmap_heap_create: created heap fsi base 0x00000001fc400000 size (16384KiB)
[   14.938873] 818930000.rtcpu:hsp-vm1: camrtc_hsp_rx_full_notify: receive CAMRTC_HSP_PANIC message!
[   14.948893] 818930000.rtcpu:hsp-vm1: rce: ERROR: RTCPU suspend failed, resetting it
[   14.949258] 818930000.rtcpu:hsp-vm1: rce: ERROR: rce_isp_sus_failed: -71 [isp_pm_handler] "ERROR: Failed to turn isp power off"
[   15.356206] [RCE] ERROR: camera-ip/isp5/isp5.c:2710 [isp_pm_handler] "ERROR: Failed to turn isp power off"
[   16.384657] 818930000.rtcpu:hsp-vm1: camrtc_hsp_rx_full_notify: receive CAMRTC_HSP_PANIC message!
[   16.980602] [RCE] ERROR: camera-ip/isp5/isp5.c:2710 [isp_pm_handler] "ERROR: Failed to turn isp power off"
[   17.679757] tegra-mc 81080000.memory-controller: sync_state() pending due to 818120000.host1x
[   17.803393] tegra-mc 81080000.memory-controller: sync_state() pending due to bus@0:aconnect@900000
[   17.806007] tegra-mc 81080000.memory-controller: external-memory-controller@81080000: sync_state() pending due to 818120000.host1x
[   17.813707] tegra-mc 81080000.memory-controller: external-memory-controller@81080000: sync_state() pending due to bus@0:aconnect@900000
[   17.823942] tegra186-emc 81080000.memory-controller: external-memory-controller@81080000: sync_state() pending due to 800900000.hda
[   17.847624] tegra186-emc 81080000.memory-controller: sync_state() pending due to 800800000.display
[   18.869351] 818930000.rtcpu:hsp-vm1: camrtc_hsp_rx_full_notify: receive CAMRTC_HSP_PANIC message!
[   18.870174] [RCE] ERROR: camera-ip/vi5/camrtc.c:4446 [nvcsii_pm_handler] "ERROR: Failed to turn nvcsi power off"
[   19.567771] 818930000.rtcpu:hsp-vm1: camrtc_hsp_rx_full_notify: receive CAMRTC_HSP_PANIC message!
[   20.936860] [RCE] ERROR: camera-ip/vi5/vi5.c:8197 [vi5_pm_handler] "ERROR: Failed to turn vi power off"
[   22.972767] [RCE] ERROR: camera-ip/vi5/vi5.c:8197 [vi5_pm_handler] "ERROR: Failed to turn vi power off"

Is there any more info I can provide? Anything I can try to fix it?
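
One thing that might be worth checking: the log above contains `pstore: Registered ramoops as persistent store backend` and `printk: legacy console [ramoops-1] enabled`, so part of the console output may survive in RAM across a reset even when journald never starts. Below is a minimal sketch for dumping whatever is there after the next successful boot. It assumes pstore is mounted at the usual `/sys/fs/pstore` location and that the records survive the reset button on this board, neither of which is confirmed in this thread; `read_pstore` is a made-up helper name.

```python
from pathlib import Path


def read_pstore(pstore_dir="/sys/fs/pstore", tail_lines=40):
    """Return {record name: last `tail_lines` lines} for each file in pstore_dir."""
    records = {}
    root = Path(pstore_dir)
    if not root.is_dir():
        return records  # pstore not mounted, or the path is different here
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue
        try:
            text = entry.read_text(errors="replace")
        except OSError:
            continue  # usually a permissions problem; retry as root
        records[entry.name] = text.splitlines()[-tail_lines:]
    return records


if __name__ == "__main__":
    found = read_pstore()
    if not found:
        print("No pstore records found.")
    for name, lines in found.items():
        print(f"===== {name} =====")
        print("\n".join(lines))
```

Run it with `sudo` right after a boot that followed a hang. If the directory is empty, the previous boot left nothing behind and the UART log is the better source.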

Please remove the HDMI dongle and the Ethernet cable as well, and use the UART serial log to check the board status.
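
Since the hang is intermittent, it can help to leave a capture running on the host so that every boot is recorded with a host-side timestamp. Here is a rough sketch of one way to do that. It relies on the third-party `pyserial` package, and the device name `/dev/ttyACM0` and the 115200 baud rate are assumptions; check how the debug UART actually enumerates on your host. `capture` and `thor_uart.log` are names made up for this example.

```python
import sys
import time


def capture(stream, out, stop_on_eof=True, now=time.time):
    """Copy lines from a binary stream to a text file, prefixing host time.

    Returns the number of lines written. With stop_on_eof=False an empty
    read is treated as a serial timeout and the loop keeps waiting.
    """
    count = 0
    while True:
        raw = stream.readline()
        if not raw:
            if stop_on_eof:
                return count
            continue  # read timed out; wait for the next message
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now()))
        out.write(f"{stamp} {text}\n")
        out.flush()  # keep the file current even if the board hangs
        count += 1


if __name__ == "__main__":
    import serial  # third-party: pip install pyserial

    # Assumed defaults; pass the real device as the first argument.
    port = sys.argv[1] if len(sys.argv) > 1 else "/dev/ttyACM0"
    with serial.Serial(port, 115200, timeout=1) as uart, \
            open("thor_uart.log", "a") as log:
        capture(uart, log, stop_on_eof=False)  # stop with Ctrl+C
```

The host timestamps make it easy to tell where one boot ends and the next begins, and the last lines before a hang show which stage the board reached.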
