I tried this on a Xavier NX Devkit (SD card module), but I’m fairly sure it also happens with the production module and with the AGX.
If I specify the dtb, I get this kernel panic after kexec:
sudo kexec -l /boot/Image --dtb=/boot/tegra194-p3668-all-p3509-0000.dtb --reuse-cmdline --force
WARNING: at platform/drivers/pg/pg-gpu-t194.c:185
[ 284.412661] kexec_core: Starting new kernel
[ 284.451055] CPU1: shutdown
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.9.140-tegra (buildbrain@mobile-u64-3357) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05) ) #1 SMP PREEMPT Thu Jun 25 21:22:12 PDT 2020
[ 0.000000] Boot CPU: AArch64 Processor [4e0f0040]
[ 0.000000] earlycon: tegra_comb_uart0 at MMIO32 0x000000000c168000 (options '')
[ 0.000000] bootconsole [tegra_comb_uart0] enabled
[ 0.000000] cma: Failed to reserve 64 MiB
[ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x1000 bytes below 0x0.
[ 0.000000]
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.140-tegra #1
[ 0.000000] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[ 0.000000] Call trace:
[ 0.000000] [<ffffff800808bdb8>] dump_backtrace+0x0/0x198
[ 0.000000] [<ffffff800808c37c>] show_stack+0x24/0x30
[ 0.000000] [<ffffff800845c7a0>] dump_stack+0x98/0xc0
[ 0.000000] [<ffffff80081c1438>] panic+0x11c/0x298
[ 0.000000] [<ffffff8009618268>] memblock_alloc_base+0x30/0x3c
[ 0.000000] [<ffffff8009618284>] memblock_alloc+0x10/0x18
[ 0.000000] [<ffffff8009606fc4>] early_pgtable_alloc+0x18/0x70
[ 0.000000] [<ffffff8009607198>] paging_init+0x2c/0x7a4
[ 0.000000] [<ffffff8009603f40>] setup_arch+0x204/0x604
[ 0.000000] [<ffffff8009600858>] start_kernel+0x64/0x384
[ 0.000000] [<ffffff8009600204>] __primary_switched+0x80/0x94
If I instead reuse the dtb provided by cboot for the current boot, the kernel loads fine but panics when nvgpu gets loaded:
sudo kexec -l /boot/Image --reuse-cmdline --force
[ 7.991162] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[ 8.076322] nvgpu: 17000000.gv11b nvgpu_nvhost_syncpt_init:291 [INFO] syncpt_unit_base 60000000 syncpt_unit_size 400000 size 1000
[ 8.076322]
[ 8.083720] CPU0: SError detected, daif=140, spsr=0x80000000, mpidr=80000000, esr=be000000
[ 8.083725] CPU1: SError detected, daif=1c0, spsr=0x60c000c5, mpidr=80000001, esr=be000000
[ 8.083730] CPU5: SError detected, daif=140, spsr=0x80400045, mpidr=80000201, esr=be000000
[ 8.083735] CPU4: SError detected, daif=140, spsr=0x60400045, mpidr=80000200, esr=be000000
[ 8.083742] CPU2: SError detected, daif=140, spsr=0x80c00045, mpidr=80000100, esr=be000000
[ 8.083746] CPU3: SError detected, daif=140, spsr=0x20000000, mpidr=80000101, esr=be000000
[ 8.083807] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 8.083828] **************************************
[ 8.083830] RAS Error in SCF:SNOC, ERRSELR_EL1=1026:
[ 8.083832] Status = 0xfc00a20d
[ 8.083834] IERR = Uncorrectable Carveout Error: 0xa2
[ 8.083836] SERR = Illegal address (software fault): 0xd
[ 8.083837] Overflow (there may be more errors) - Uncorrectable
[ 8.083838] Uncorrectable (this is fatal)
[ 8.083845] MISC0 = 0x804
[ 8.083847] MISC1 = 0xa10900000000
[ 8.083852] ADDR = 0x80000000c6000000
[ 8.083858] **************************************
[ 8.083865] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 8.083905] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 8.083985] Bad mode in Error handler detected on CPU1, code 0xbe000000 -- SError
[ 8.083989] Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
[ 8.084003] Modules linked in: nvgpu bluedroid_pm ip_tables x_tables
[ 8.084012] CPU: 1 PID: 347 Comm: kworker/u12:5 Not tainted 4.9.140-tegra #1
[ 8.084014] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[ 8.084029] Workqueue: events_unbound call_usermodehelper_exec_work
[ 8.084032] task: ffffffc1f4c4aa00 task.stack: ffffffc1f4d58000
[ 8.084040] PC is at bad_range+0x28/0x70
[ 8.084043] LR is at bad_range+0x28/0x70
[ 8.084046] pc : [<ffffff80081c9d48>] lr : [<ffffff80081c9d48>] pstate: 60c000c5
[ 8.084048] sp : ffffffc1f4d5b6b0
[ 8.084053] x29: ffffffc1f4d5b6b0 x28: 0000000000000008
[ 8.084055] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 8.084061] x27: 00000000ffffff80 x26: 0000000000000003
[ 8.084065] x25: ffffffbf07b83e00 x24: ffffff800a08f1b8
[ 8.084072] x23: 0000000000000000 x22: ffffffbf07b83c20
[ 8.084076] x21: ffffffbf07b83c00 x20: ffffffbf07b83e00
[ 8.084081] x19: ffffff800a08efc0 x18: 0000000000000000
[ 8.084082] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 8.084088] x17: 000000000000000e x16: 0000000000000000
[ 8.084092] x15: 0000000000000000 x14: 00000000000c8000
[ 8.084097] x13: 0000000000006db7 x12: 0000000000006db7
[ 8.084101] x11: ffffffffffffffff x10: ffffffffffffffff
[ 8.084107] x9 : 0000000000000000 x8 : ffffff800a08f1e8
[ 8.084112] x7 : 0000000000000000 x6 : 0000000000000000
[ 8.084116] x5 : 0000000000180000 x4 : 0000000000100000
[ 8.084121] x3 : 0000000000000001 x2 : 0000000000000000
[ 8.084124] x1 : 000000000026e0f8
[ 8.084125] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 8.084128] x0 : 0000000000000000
[ 8.084133] Process kworker/u12:5 (pid: 347, stack limit = 0xffffffc1f4d58000)
[ 8.084136] Call trace:
[ 8.084139] [<ffffff80081c9d48>] bad_range+0x28/0x70
[ 8.084145] [<ffffff80081cc608>] __rmqueue+0x118/0x718
[ 8.084148] [<ffffff80081cdd88>] get_page_from_freelist+0x770/0xa58
[ 8.084152] [<ffffff80081ce8e4>] __alloc_pages_nodemask+0xfc/0xd38
[ 8.084158] [<ffffff800822f260>] allocate_slab+0xa8/0x4e8
[ 8.084161] [<ffffff800822f6e8>] new_slab+0x48/0x88
[ 8.084165] [<ffffff8008231a8c>] ___slab_alloc.constprop.34+0x2bc/0x4a0
[ 8.084169] [<ffffff8008231cb8>] __slab_alloc.isra.27.constprop.33+0x48/0x60
[ 8.084175] [<ffffff8008231f58>] kmem_cache_alloc+0x288/0x2c0
[ 8.084181] [<ffffff80080b0d74>] copy_process.isra.5.part.6+0x3e4/0x1530
[ 8.084184] [<ffffff80080b205c>] _do_fork+0xd4/0x460
[ 8.084188] [<ffffff80080b2490>] kernel_thread+0x48/0x58
[ 8.084191] [<ffffff80080d1344>] call_usermodehelper_exec_work+0x34/0xd0
[ 8.084196] [<ffffff80080d4ebc>] process_one_work+0x1e4/0x4b0
[ 8.084200] [<ffffff80080d51d8>] worker_thread+0x50/0x4c8
[ 8.084204] [<ffffff80080dbe64>] kthread+0xec/0xf0
[ 8.084209] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[ 8.084212] CPU5: SError detected, daif=140, spsr=0x80400045, mpidr=80000201, esr=be000000
[ 8.084215] ---[ end trace b72d14ba5a5ce893 ]---
[ 8.085562] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 8.085584] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 8.085625] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 8.085766] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 8.085790] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 8.085832] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 8.085970] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 8.085992] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 8.086032] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 8.086115] CPU4: SError detected, daif=1c0, spsr=0xa0c000c5, mpidr=80000200, esr=be000000
[ 8.086120] CPU2: SError detected, daif=140, spsr=0x80c00045, mpidr=80000100, esr=be000000
[ 8.086179] CPU3: SError detected, daif=140, spsr=0x40400045, mpidr=80000101, esr=be000000
[ 8.086208] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 8.086226] **************************************
[ 8.086228] RAS Error in SCF:SNOC, ERRSELR_EL1=1026:
[ 8.086230] Status = 0xfc00a20d
[ 8.086232] IERR = Uncorrectable Carveout Error: 0xa2
[ 8.086234] SERR = Illegal address (software fault): 0xd
[ 8.086236] Overflow (there may be more errors) - Uncorrectable
[ 8.086237] Uncorrectable (this is fatal)
[ 8.086243] MISC0 = 0x804
[ 8.086245] MISC1 = 0x3a10900000000
[ 8.086249] ADDR = 0x80000000c6000080
[ 8.086254] **************************************
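In case it helps with triage, here is a minimal sketch for pulling the RAS error records out of a saved console log so the reported unit, error codes, and address can be compared between runs. The function name `parse_ras_records` and the `SAMPLE` string are my own illustration and not part of any NVIDIA tool; the parsing assumes the field layout shown in the log above.

```python
import re

# Strips the dmesg timestamp prefix, e.g. "[ 8.083832] ".
_TS = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Fields printed inside a RAS error block in the log above.
_FIELD = re.compile(r"^(Status|IERR|SERR|MISC0|MISC1|ADDR)\s*=\s*(.+)$")


def parse_ras_records(log_text):
    """Return one dict per 'RAS Error in ...' block found in log_text."""
    records = []
    current = None
    for raw in log_text.splitlines():
        line = _TS.sub("", raw.strip())
        if line.startswith("RAS Error in"):
            # "RAS Error in SCF:SNOC, ERRSELR_EL1=1026:" -> unit "SCF:SNOC"
            unit = line[len("RAS Error in"):].split(",")[0].strip()
            current = {"unit": unit}
            records.append(current)
            continue
        if current is None:
            continue
        match = _FIELD.match(line)
        if match:
            current[match.group(1)] = match.group(2).strip()
        elif line.startswith("*"):
            # A row of asterisks closes the current block.
            current = None
    return records


# Excerpt copied from the first RAS block in the log above.
SAMPLE = """\
[ 8.083828] **************************************
[ 8.083830] RAS Error in SCF:SNOC, ERRSELR_EL1=1026:
[ 8.083832] Status = 0xfc00a20d
[ 8.083834] IERR = Uncorrectable Carveout Error: 0xa2
[ 8.083836] SERR = Illegal address (software fault): 0xd
[ 8.083852] ADDR = 0x80000000c6000000
[ 8.083858] **************************************
"""

if __name__ == "__main__":
    for record in parse_ras_records(SAMPLE):
        print(record["unit"], record.get("IERR"), record.get("ADDR"))
```

Unrelated lines that get interleaved into a block (such as the register dump in the second trace) are simply skipped, since only the named fields are collected.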
I was expecting a dtb generated by jetson-io, for instance tegra194-p3668-all-p3509-0000-adafruit-sph0645lm4h.dtb, to work when loaded with kexec, but it looks like it doesn’t.
It would be extremely helpful to have these issues fixed so we can have kexec working. Thank you.