Because the device “reboots”, we thought we could get more information by enabling kernel debugging, following these instructions: Kernel Debugging Tools (NVIDIA Jetson Linux Developer Guide documentation)
It didn’t help. Last night the device rebooted again and no crash log was generated.
This is the log:
Apr 16 00:04:19 orin-nx-2 kernel: [25924.573599] cpufreq: cpu0,cur:246000,set:1984000,delta:1738000,set ndiv:155
Apr 16 00:04:52 orin-nx-2 kernel: [25957.605512] cpufreq: cpu0,cur:278000,set:1984000,delta:1706000,set ndiv:155
Apr 16 00:04:53 orin-nx-2 kernel: [25958.607464] cpufreq: cpu4,cur:990000,set:1984000,delta:994000,set ndiv:155
Apr 16 00:04:55 orin-nx-2 kernel: [25960.610693] cpufreq: cpu4,cur:749000,set:1984000,delta:1235000,set ndiv:155
Apr 16 00:05:06 orin-nx-2 kernel: [25971.620831] cpufreq: cpu4,cur:246000,set:1984000,delta:1738000,set ndiv:155
Apr 16 00:05:23 orin-nx-2 kernel: [25988.635968] cpufreq: cpu4,cur:1110000,set:1984000,delta:874000,set ndiv:155
Apr 16 00:05:29 orin-nx-2 kernel: [25994.641330] cpufreq: cpu4,cur:248000,set:1984000,delta:1736000,set ndiv:155
Apr 16 00:06:31 orin-nx-2 kernel: [26056.692966] cpufreq: cpu0,cur:1776000,set:1984000,delta:208000,set ndiv:155
Apr 16 00:06:59 orin-nx-2 kernel: [26084.719688] cpufreq: cpu0,cur:1728000,set:1984000,delta:256000,set ndiv:155
Apr 16 00:07:19 orin-nx-2 kernel: [26104.737757] cpufreq: cpu4,cur:1060000,set:1984000,delta:924000,set ndiv:155
Apr 16 00:16:54 orin-nx-2 systemd-modules-load[272]: Inserted module 'nvmap'
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd421]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Linux version 5.15.148-tegra (root@SWENG5) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2022.08) 11.3.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Fri Dec 13 12:15:46 EST 2024 ()
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Machine model: CTI Hadron + Orin NX
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] efi: EFI v2.70 by EDK II
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] efi: RTPROP=0x46d7df198 TPMFinalLog=0x45e3a0000 SMBIOS=0xffff0000 SMBIOS 3.0=0x46d1d0000 MEMATTR=0x467112018 ESRT=0x467a7a198 TPMEventLog=0x45e3b8018 RNG=0x45a7e0018 MEMRESERVE=0x45e3bac18
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] random: crng init done
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] secureboot: Secure boot disabled
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] esrt: Reserving ESRT space from 0x0000000467a7a198 to 0x0000000467a7a1d0.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Reserved memory: created CMA memory pool at 0x000000044a000000, size 256 MiB
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] NUMA: No NUMA configuration found
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x0000000477ffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] NUMA: NODE_DATA [mem 0x4702fc800-0x4702fefff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Zone ranges:
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] DMA [mem 0x0000000080000000-0x00000000ffffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] DMA32 empty
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Normal [mem 0x0000000100000000-0x0000000477ffffff]
Apr 16 00:16:54 orin-nx-2 systemd-modules-load[272]: Inserted module 'nvgpu'
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Movable zone start for each node
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Early memory node ranges
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000080000000-0x00000000bdffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x00000000c2000000-0x00000000fffdffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x00000000fffe0000-0x00000000ffffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000100000000-0x000000045e1e6fff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000045e1e7000-0x000000045e3affff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000045e3b0000-0x000000045e3bafff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000045e3bb000-0x000000045e3bbfff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000045e3bc000-0x000000046b89ffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000046b8a0000-0x000000046d7dffff]
Apr 16 00:16:54 orin-nx-2 systemd-modules-load[272]: Inserted module 'ina3221'
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000046d7e0000-0x0000000471dfffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000471e00000-0x0000000471ffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000472000000-0x000000047259ffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000472f00000-0x0000000472ffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000476000000-0x0000000477ffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_widths Section 0 Node 4 Zone 2 Lastcpupid 16 Kasantag 0 Flags 24
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_shifts Section 21 Node 4 Zone 2 Lastcpupid 16 Kasantag 0
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_pgshifts Section 0 Node 60 Zone 58 Lastcpupid 42 Kasantag 0
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_nodezoneid Node/Zone ID: 64 -> 58
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_usage location: 64 -> 42 layout 42 -> 24 unused 24 -> 0 page-flags
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x0000000477ffffff]
Apr 16 00:16:54 orin-nx-2 systemd-modules-load[272]: Inserted module 'nvidia_p2p'
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::memmap_init Initialising map node 0 zone 0 pfns 524288 -> 1048576
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::memmap_init Initialising map node 0 zone 2 pfns 1048576 -> 4685824
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] On node 0, zone DMA: 16384 pages in unavailable ranges
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] On node 0, zone Normal: 2400 pages in unavailable ranges
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] On node 0, zone Normal: 12288 pages in unavailable ranges
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] crashkernel low memory reserved: 0xf7e00000 - 0xffe00000 (128 MB)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] crashkernel reserved: 0x00000003ba200000 - 0x000000043a200000 (2048 MB)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: probing for conduit method from DT.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: PSCIv1.1 detected in firmware.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: Using standard PSCI v0.2 function IDs
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: Trusted OS migration not required
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: SMC Calling Convention v1.2
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] percpu: Embedded 29 pages/cpu s80408 r8192 d30184 u118784
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] pcpu-alloc: s80408 r8192 d30184 u118784 alloc=29*4096
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Detected PIPT I-cache on CPU0
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Address authentication (architected algorithm)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: GIC system register CPU interface
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Virtualization Host Extensions
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Hardware dirty bit management
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Spectre-v4
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Spectre-BHB
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: kernel page table isolation forced ON by KASLR
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] alternatives: patching kernel code
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::zonelist general 0:DMA = 0:DMA
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::zonelist general 0:Normal = 0:Normal 0:DMA
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::zonelist thisnode 0:DMA = 0:DMA
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::zonelist thisnode 0:Normal = 0:Normal 0:DMA
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 4065440
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Policy zone: Normal
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Kernel command line: root=PARTUUID=fdfa49c2-fcac-47d2-9f4f-2e3644bf23e7 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 firmware_class.path=/etc/firmware fbcon=map:0 nospectre_bhb video=efifb:off console=tty0 crashkernel=2G bl_prof_dataptr=2031616@0x471E10000 bl_prof_ro_ptr=65536@0x471E00000
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Unknown kernel command line parameters "bl_prof_dataptr=2031616@0x471E10000 bl_prof_ro_ptr=65536@0x471E00000", will be passed to user space.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] software IO TLB: mapped [mem 0x00000000f3e00000-0x00000000f7e00000] (64MB)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Memory: 13516768K/16521856K available (19712K kernel code, 4088K rwdata, 10120K rodata, 7744K init, 697K bss, 2742944K reserved, 262144K cma-reserved)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] trace event string verifier disabled
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: Preemptible hierarchical RCU implementation.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: RCU event tracing is enabled.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Trampoline variant of Tasks RCU enabled.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Rude variant of Tasks RCU enabled.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Tracing variant of Tasks RCU enabled.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
In the log we can see:
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] crashkernel reserved: 0x00000003ba200000 - 0x000000043a200000 (2048 MB)
which shows that the crash log is enabled. What could cause the system to reboot without a kernel crash?
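One thing we have not ruled out yet: as far as we understand, the "crashkernel reserved" line only shows that memory was set aside, not that a capture kernel was actually loaded into it. Here is a minimal sketch for checking that from user space. It assumes the standard sysfs files `kexec_crash_loaded` and `kexec_crash_size` are exposed under `/sys/kernel` on this kernel; the function name `kdump_status` is ours, purely for illustration.

```python
from pathlib import Path


def kdump_status(sysfs_dir="/sys/kernel"):
    """Read crash-kernel state from sysfs.

    Values are None when a file is missing or unreadable, which would
    itself hint that kexec/kdump support is not available.
    """
    def read_int(name):
        try:
            return int((Path(sysfs_dir) / name).read_text().strip())
        except (OSError, ValueError):
            return None

    return {
        # 1 if a crash (capture) kernel is loaded, 0 if not
        "loaded": read_int("kexec_crash_loaded"),
        # size in bytes of the memory reserved for the crash kernel
        "reserved_bytes": read_int("kexec_crash_size"),
    }


if __name__ == "__main__":
    print(kdump_status())
```

If `loaded` comes back as 0, a kernel panic would have nothing to boot into, so no dump would be written even with the memory reserved.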
This might be completely unrelated, but it seems worth mentioning: before the reboot there are thousands of cpufreq messages for cpu0 and cpu4, setting them to 1984000. After the reboot, there are none.
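For reference, this is roughly how the messages can be counted. It is a small sketch that assumes the line format shown in the log above (`cpufreq: cpuN,cur:...,set:...`); the function name `count_cpufreq` and the log path passed on the command line are placeholders, not anything specific to our setup.

```python
import re
import sys
from collections import Counter

# Matches e.g. "cpufreq: cpu0,cur:246000,set:1984000,delta:1738000,set ndiv:155"
CPUFREQ_RE = re.compile(r"cpufreq: (cpu\d+),cur:(\d+),set:(\d+),")


def count_cpufreq(lines):
    """Count cpufreq set-frequency messages per (cpu, target frequency)."""
    counts = Counter()
    for line in lines:
        match = CPUFREQ_RE.search(line)
        if match:
            cpu, target = match.group(1), int(match.group(3))
            counts[(cpu, target)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_cpufreq.py <path-to-syslog>
    with open(sys.argv[1], errors="replace") as log:
        for (cpu, target), n in sorted(count_cpufreq(log).items()):
            print(f"{cpu} -> {target}: {n} messages")
```

Splitting the log at the "Booting Linux" line and running this on each half makes the before/after difference easy to see.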
There is this message further up in the log:
Apr 15 16:57:25 orin-nx-2 kernel: [ 311.852908] cpufreq transition table exceeds PAGE_SIZE. Disabling
Thanks for your help,
Alex