Xavier AGX no longer boots into GUI/Desktop after crash/reboot

Hello,

Yesterday my AGX rebooted by itself. Since then it no longer boots into the GUI/desktop, and applications (e.g. Python venvs) do not work properly.

kern.log shows the following entries after the reboot:

Nov 25 14:30:47 dominik-desktop kernel: [  455.769692] ------------[ cut here ]------------
Nov 25 14:30:47 dominik-desktop kernel: [  455.771149] WARNING: CPU: 6 PID: 7762 at /dvs/git/dirty/git-master_linux/kernel/nvgpu/drivers/gpu/nvgpu/common/mm/nvgpu_mem.c:258 nvgpu_mem_wr_n+0xd0/0xe0 [nvgpu]
Nov 25 14:30:47 dominik-desktop kernel: [  455.772499] Modules linked in: zram overlay spidev binfmt_misc nvgpu bluedroid_pm ip_tables x_tables
Nov 25 14:30:47 dominik-desktop kernel: [  455.772523] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.772529] CPU: 6 PID: 7762 Comm: vulkaninfo Tainted: G        W       4.9.140-tegra #1
Nov 25 14:30:47 dominik-desktop kernel: [  455.772532] Hardware name: Jetson-AGX (DT)
Nov 25 14:30:47 dominik-desktop kernel: [  455.772536] task: ffffffc7a8c12a00 task.stack: ffffffc7d83b0000
Nov 25 14:30:47 dominik-desktop kernel: [  455.772765] PC is at nvgpu_mem_wr_n+0xd0/0xe0 [nvgpu]
Nov 25 14:30:47 dominik-desktop kernel: [  455.772970] LR is at gr_gk20a_load_golden_ctx_image+0x8c/0x2a0 [nvgpu]
Nov 25 14:30:47 dominik-desktop kernel: [  455.772974] pc : [<ffffff8000fcf2e8>] lr : [<ffffff8000ffdc4c>] pstate: 00400045
Nov 25 14:30:47 dominik-desktop kernel: [  455.772976] sp : ffffffc7d83b3be0
Nov 25 14:30:47 dominik-desktop kernel: [  455.772979] x29: ffffffc7d83b3be0 x28: ffffff8012343018 
Nov 25 14:30:47 dominik-desktop kernel: [  455.772986] x27: ffffff8001090c90 x26: ffffff8012343000 
Nov 25 14:30:47 dominik-desktop kernel: [  455.772992] x25: ffffff8001090c28 x24: ffffffc7c4748000 
Nov 25 14:30:47 dominik-desktop kernel: [  455.772998] x23: ffffffc7c4740000 x22: ffffff800bc5a000 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773005] x21: 0000000000000001 x20: ffffff8012343018 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773011] x19: 0000000000000000 x18: 0000000000000000 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773017] x17: 0000007f9244c530 x16: ffffff8008272980 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773023] x15: 0000000000000000 x14: 0000000001c378bd 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773030] x13: 000000000000004c x12: 071c71c71c71c71c 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773036] x11: 000000000000000b x10: 0101010101010101 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773043] x9 : fffffffffffffffa x8 : 7f7f7f7f7f7f7f7f 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773049] x7 : fefefeff646c606d x6 : 0000000002209001 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773055] x5 : 0000000000100c80 x4 : 0000000000000001 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773062] x3 : ffffff800bc5a000 x2 : 0000000000000000 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773071] x1 : ffffff8012343018 x0 : ffffff8000ffdc4c 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773076] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.773079] ---[ end trace 5cf0a372f4d1d0d7 ]---
Nov 25 14:30:47 dominik-desktop kernel: [  455.774242] Call trace:
Nov 25 14:30:47 dominik-desktop kernel: [  455.774484] [<ffffff8000fcf2e8>] nvgpu_mem_wr_n+0xd0/0xe0 [nvgpu]
Nov 25 14:30:47 dominik-desktop kernel: [  455.774677] [<ffffff8000ffdc4c>] gr_gk20a_load_golden_ctx_image+0x8c/0x2a0 [nvgpu]
Nov 25 14:30:47 dominik-desktop kernel: [  455.774868] [<ffffff8000ffff3c>] gk20a_alloc_obj_ctx+0x6b4/0xac0 [nvgpu]
Nov 25 14:30:47 dominik-desktop kernel: [  455.775083] [<ffffff8000fa1178>] gk20a_channel_ioctl+0xaf8/0x1320 [nvgpu]
Nov 25 14:30:47 dominik-desktop kernel: [  455.775091] [<ffffff8008272158>] do_vfs_ioctl+0xb0/0x8d8
Nov 25 14:30:47 dominik-desktop kernel: [  455.775095] [<ffffff8008272a0c>] SyS_ioctl+0x8c/0xa8
Nov 25 14:30:47 dominik-desktop kernel: [  455.775101] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
Nov 25 14:30:47 dominik-desktop kernel: [  455.776455] nvgpu: 17000000.gv11b        gk20a_gr_handle_fecs_error:5294 [ERR]  ctxsw intr0 set by ucode, error_code: 0x00000015
Nov 25 14:30:47 dominik-desktop kernel: [  455.777846] ---- mlocks ----
Nov 25 14:30:47 dominik-desktop kernel: [  455.777894] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.777897] ---- syncpts ----
Nov 25 14:30:47 dominik-desktop kernel: [  455.777907] id 2 (disp_a) min 1 max 1 refs 1 (previous client : )
Nov 25 14:30:47 dominik-desktop kernel: [  455.777911] id 3 (disp_b) min 1 max 1 refs 1 (previous client : )
Nov 25 14:30:47 dominik-desktop kernel: [  455.777920] id 8 (vblank0) min 27225 max -2 refs 1 (previous client : )
Nov 25 14:30:47 dominik-desktop kernel: [  455.777934] id 20 (gv11b_511) min 5 max 6 refs 1 (previous client : gv11b_511)
Nov 25 14:30:47 dominik-desktop kernel: [  455.778524] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778527] ---- channels ----
Nov 25 14:30:47 dominik-desktop kernel: [  455.778547] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778547] channel 2 - 15820000.se
Nov 25 14:30:47 dominik-desktop kernel: [  455.778547] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778550] NvHost basic channel registers:
Nov 25 14:30:47 dominik-desktop kernel: [  455.778555] CMDFIFO_STAT_0:  00002040
Nov 25 14:30:47 dominik-desktop kernel: [  455.778559] CMDFIFO_RDATA_0: 8e4408b8
Nov 25 14:30:47 dominik-desktop kernel: [  455.778565] CMDP_OFFSET_0:   00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778569] CMDP_CLASS_0:    00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778572] CHANNELSTAT_0:   00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778575] The CDMA sync queue is empty.
Nov 25 14:30:47 dominik-desktop kernel: [  455.778577] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778581] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778581] channel 3 - 15830000.se
Nov 25 14:30:47 dominik-desktop kernel: [  455.778581] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778584] NvHost basic channel registers:
Nov 25 14:30:47 dominik-desktop kernel: [  455.778587] CMDFIFO_STAT_0:  00002040
Nov 25 14:30:47 dominik-desktop kernel: [  455.778591] CMDFIFO_RDATA_0: 0000a400
Nov 25 14:30:47 dominik-desktop kernel: [  455.778595] CMDP_OFFSET_0:   00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778598] CMDP_CLASS_0:    00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778601] CHANNELSTAT_0:   00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778604] The CDMA sync queue is empty.
Nov 25 14:30:47 dominik-desktop kernel: [  455.778606] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778610] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778610] channel 4 - 15840000.se
Nov 25 14:30:47 dominik-desktop kernel: [  455.778610] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778613] NvHost basic channel registers:
Nov 25 14:30:47 dominik-desktop kernel: [  455.778616] CMDFIFO_STAT_0:  00002040
Nov 25 14:30:47 dominik-desktop kernel: [  455.778619] CMDFIFO_RDATA_0: 04040028
Nov 25 14:30:47 dominik-desktop kernel: [  455.778623] CMDP_OFFSET_0:   00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778626] CMDP_CLASS_0:    00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778629] CHANNELSTAT_0:   00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778632] The CDMA sync queue is empty.
Nov 25 14:30:47 dominik-desktop kernel: [  455.778634] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778639] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778639] ---- host general irq ----
Nov 25 14:30:47 dominik-desktop kernel: [  455.778639] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778643] sync_intc0mask = 0x00000001
Nov 25 14:30:47 dominik-desktop kernel: [  455.778646] sync_intmask = 0x50000003
Nov 25 14:30:47 dominik-desktop kernel: [  455.778648] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778648] ---- host syncpt irq mask ----
Nov 25 14:30:47 dominik-desktop kernel: [  455.778648] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778651] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778651] ---- host syncpt irq status ----
Nov 25 14:30:47 dominik-desktop kernel: [  455.778651] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778655] syncpt_thresh_cpu0_int_status(0) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778659] syncpt_thresh_cpu0_int_status(1) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778663] syncpt_thresh_cpu0_int_status(2) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778666] syncpt_thresh_cpu0_int_status(3) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778670] syncpt_thresh_cpu0_int_status(4) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778674] syncpt_thresh_cpu0_int_status(5) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778677] syncpt_thresh_cpu0_int_status(6) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778681] syncpt_thresh_cpu0_int_status(7) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778685] syncpt_thresh_cpu0_int_status(8) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778688] syncpt_thresh_cpu0_int_status(9) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778691] syncpt_thresh_cpu0_int_status(10) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778695] syncpt_thresh_cpu0_int_status(11) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778698] syncpt_thresh_cpu0_int_status(12) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778701] syncpt_thresh_cpu0_int_status(13) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778705] syncpt_thresh_cpu0_int_status(14) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778708] syncpt_thresh_cpu0_int_status(15) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778711] syncpt_thresh_cpu0_int_status(16) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778715] syncpt_thresh_cpu0_int_status(17) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778718] syncpt_thresh_cpu0_int_status(18) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778721] syncpt_thresh_cpu0_int_status(19) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778725] syncpt_thresh_cpu0_int_status(20) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778728] syncpt_thresh_cpu0_int_status(21) = 0x00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778734] gv11b pbdma 0: 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778738] id: 0 (tsg), next_id: 0 (tsg) chan status: valid
Nov 25 14:30:47 dominik-desktop kernel: [  455.778754] PBDMA_PUT: 0000001efc020934 PBDMA_GET: 0000001efc020560 GP_PUT: 00000002 GP_GET: 00000001 FETCH: 00000002 HEADER: 800015d0
Nov 25 14:30:47 dominik-desktop kernel: [  455.778754] HDR: 80000574 SHADOW0: fc020000 SHADOW1: 0009341e
Nov 25 14:30:47 dominik-desktop kernel: [  455.778758] gv11b pbdma 1: 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778761] id: 144 (tsg), next_id: 1 (tsg) chan status: invalid
Nov 25 14:30:47 dominik-desktop kernel: [  455.778774] PBDMA_PUT: 0000000821880220 PBDMA_GET: 00000065002a4d80 GP_PUT: 00000000 GP_GET: 10080080 FETCH: 00000000 HEADER: 00002010
Nov 25 14:30:47 dominik-desktop kernel: [  455.778774] HDR: 49104900 SHADOW0: 41881820 SHADOW1: 40001148
Nov 25 14:30:47 dominik-desktop kernel: [  455.778778] gv11b pbdma 2: 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778781] id: 0 (tsg), next_id: 8 (tsg) chan status: invalid
Nov 25 14:30:47 dominik-desktop kernel: [  455.778794] PBDMA_PUT: 0000001202000950 PBDMA_GET: 0000000422204000 GP_PUT: 00000000 GP_GET: d0c02a40 FETCH: 00000000 HEADER: a0510044
Nov 25 14:30:47 dominik-desktop kernel: [  455.778794] HDR: 01808500 SHADOW0: 3001a804 SHADOW1: 00c0cb10
Nov 25 14:30:47 dominik-desktop kernel: [  455.778796] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778804] gv11b eng 0: 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778807] id: 0 (tsg), next_id: 0 (tsg), ctx status: valid 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778809] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778812] gv11b eng 1: 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778815] id: 417 (tsg), next_id: 130 (tsg), ctx status: invalid 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778817] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778820] gv11b eng 2: 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778823] id: 8 (tsg), next_id: 4 (tsg), ctx status: invalid 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778825] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778828] gv11b eng 3: 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778831] id: 16 (tsg), next_id: 1 (tsg), ctx status: invalid 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778833] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778835] 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778883] 511-gv11b, pid 7762, refs: 5: 
Nov 25 14:30:47 dominik-desktop kernel: [  455.778887] channel status:  in use on_pbdma busy
Nov 25 14:30:47 dominik-desktop kernel: [  455.778894] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778894] HEADER: 60400000 COUNT: 00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778894] SEMAPHORE: addr hi: 00000000 addr lo: 00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778894] payload 00000000 execute 00000000
Nov 25 14:30:47 dominik-desktop kernel: [  455.778896] 
Nov 25 14:30:54 dominik-desktop kernel: [  462.675204] tegradc 15200000.nvdisplay: read_edid_into_buffer: extension_blocks = 1, max_ext_blocks = 3
Nov 25 14:30:54 dominik-desktop kernel: [  462.690726] tegradc 15200000.nvdisplay: hdmi_recheck_edid: read_edid_into_buffer() returned 256
Nov 25 14:30:54 dominik-desktop kernel: [  462.690737] tegradc 15200000.nvdisplay: old edid len = 256
Nov 25 14:30:54 dominik-desktop kernel: [  462.690759] tegradc 15200000.nvdisplay: hdmi: No EDID change after HPD bounce, taking no action

On the monitor screen some error messages are shown:

Any idea what the issue might be and where to look next?

Hello,

I am not sure what causes this, but I saw similar error messages when I upgraded my AGX to JetPack 4.4.1 while the RedHawk kernel was still the one modified from JetPack 4.4.

So I am wondering whether you are still booting a custom kernel (perhaps built to add hardware support) while the AGX got upgraded overnight to the latest JetPack. If that is not the case, I will let someone from NVIDIA comment.

Thanks for your response. I have not run do-release-upgrade yet, so it is probably another issue…

I re-read your statement and remembered that I recently ran an apt update and upgrade. My system is on an NVMe SSD, and I forgot to update the /boot folder on the internal eMMC, which the first boot stage reads from; see https://github.com/jetsonhacks/rootOnNVMe

After syncing /boot and /lib/modules/kernel, my Xavier is now running happily…
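
For anyone hitting the same thing, here is a rough sketch of how one could check whether the eMMC copies have drifted from the NVMe root before syncing. It is only an illustration, not what I actually ran: the mount point /mnt/emmc and the helper name find_stale_files are made up for this example, and it assumes the eMMC root partition is already mounted there and that the script is run with enough permissions to read both trees.

```python
import filecmp
import os


def find_stale_files(src_dir, dst_dir):
    """Return sorted relative paths of files under src_dir that are
    missing from dst_dir or whose contents differ there."""
    stale = []
    for dirpath, _dirnames, filenames in os.walk(src_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if not os.path.isfile(src):
                continue  # skip broken symlinks and other non-regular entries
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            # shallow=False forces a byte-by-byte comparison
            if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
                stale.append(rel)
    return sorted(stale)


if __name__ == "__main__":
    EMMC_MOUNT = "/mnt/emmc"  # hypothetical mount point of the eMMC rootfs
    for sub in ("boot", "lib/modules"):
        running = os.path.join("/", sub)       # tree on the NVMe root
        on_emmc = os.path.join(EMMC_MOUNT, sub)  # tree the first boot stage uses
        for rel in find_stale_files(running, on_emmc):
            print(f"{sub}/{rel}: missing or different on eMMC")
```

If this prints anything after an apt upgrade, the eMMC side is probably out of date and needs to be synced again.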
