Since we can see that the device “reboots”, we thought we might get more information by enabling kernel debugging, following the instructions on the Kernel Debugging Tools page of the NVIDIA Jetson Linux Developer Guide documentation.
It didn’t help. Last night the device rebooted again and no crash log was generated.
This is the log:
Apr 16 00:04:19 orin-nx-2 kernel: [25924.573599] cpufreq: cpu0,cur:246000,set:1984000,delta:1738000,set ndiv:155
Apr 16 00:04:52 orin-nx-2 kernel: [25957.605512] cpufreq: cpu0,cur:278000,set:1984000,delta:1706000,set ndiv:155
Apr 16 00:04:53 orin-nx-2 kernel: [25958.607464] cpufreq: cpu4,cur:990000,set:1984000,delta:994000,set ndiv:155
Apr 16 00:04:55 orin-nx-2 kernel: [25960.610693] cpufreq: cpu4,cur:749000,set:1984000,delta:1235000,set ndiv:155
Apr 16 00:05:06 orin-nx-2 kernel: [25971.620831] cpufreq: cpu4,cur:246000,set:1984000,delta:1738000,set ndiv:155
Apr 16 00:05:23 orin-nx-2 kernel: [25988.635968] cpufreq: cpu4,cur:1110000,set:1984000,delta:874000,set ndiv:155
Apr 16 00:05:29 orin-nx-2 kernel: [25994.641330] cpufreq: cpu4,cur:248000,set:1984000,delta:1736000,set ndiv:155
Apr 16 00:06:31 orin-nx-2 kernel: [26056.692966] cpufreq: cpu0,cur:1776000,set:1984000,delta:208000,set ndiv:155
Apr 16 00:06:59 orin-nx-2 kernel: [26084.719688] cpufreq: cpu0,cur:1728000,set:1984000,delta:256000,set ndiv:155
Apr 16 00:07:19 orin-nx-2 kernel: [26104.737757] cpufreq: cpu4,cur:1060000,set:1984000,delta:924000,set ndiv:155
Apr 16 00:16:54 orin-nx-2 systemd-modules-load[272]: Inserted module 'nvmap'
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd421]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Linux version 5.15.148-tegra (root@SWENG5) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2022.08) 11.3.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Fri Dec 13 12:15:46 EST 2024 ()
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Machine model: CTI Hadron + Orin NX
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] efi: EFI v2.70 by EDK II
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] efi: RTPROP=0x46d7df198 TPMFinalLog=0x45e3a0000 SMBIOS=0xffff0000 SMBIOS 3.0=0x46d1d0000 MEMATTR=0x467112018 ESRT=0x467a7a198 TPMEventLog=0x45e3b8018 RNG=0x45a7e0018 MEMRESERVE=0x45e3bac18
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] random: crng init done
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] secureboot: Secure boot disabled
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] esrt: Reserving ESRT space from 0x0000000467a7a198 to 0x0000000467a7a1d0.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Reserved memory: created CMA memory pool at 0x000000044a000000, size 256 MiB
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] NUMA: No NUMA configuration found
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x0000000477ffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] NUMA: NODE_DATA [mem 0x4702fc800-0x4702fefff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Zone ranges:
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] DMA [mem 0x0000000080000000-0x00000000ffffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] DMA32 empty
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Normal [mem 0x0000000100000000-0x0000000477ffffff]
Apr 16 00:16:54 orin-nx-2 systemd-modules-load[272]: Inserted module 'nvgpu'
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Movable zone start for each node
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Early memory node ranges
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000080000000-0x00000000bdffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x00000000c2000000-0x00000000fffdffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x00000000fffe0000-0x00000000ffffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000100000000-0x000000045e1e6fff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000045e1e7000-0x000000045e3affff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000045e3b0000-0x000000045e3bafff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000045e3bb000-0x000000045e3bbfff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000045e3bc000-0x000000046b89ffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000046b8a0000-0x000000046d7dffff]
Apr 16 00:16:54 orin-nx-2 systemd-modules-load[272]: Inserted module 'ina3221'
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x000000046d7e0000-0x0000000471dfffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000471e00000-0x0000000471ffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000472000000-0x000000047259ffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000472f00000-0x0000000472ffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] node 0: [mem 0x0000000476000000-0x0000000477ffffff]
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_widths Section 0 Node 4 Zone 2 Lastcpupid 16 Kasantag 0 Flags 24
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_shifts Section 21 Node 4 Zone 2 Lastcpupid 16 Kasantag 0
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_pgshifts Section 0 Node 60 Zone 58 Lastcpupid 42 Kasantag 0
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_nodezoneid Node/Zone ID: 64 -> 58
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::pageflags_layout_usage location: 64 -> 42 layout 42 -> 24 unused 24 -> 0 page-flags
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x0000000477ffffff]
Apr 16 00:16:54 orin-nx-2 systemd-modules-load[272]: Inserted module 'nvidia_p2p'
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::memmap_init Initialising map node 0 zone 0 pfns 524288 -> 1048576
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::memmap_init Initialising map node 0 zone 2 pfns 1048576 -> 4685824
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] On node 0, zone DMA: 16384 pages in unavailable ranges
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] On node 0, zone Normal: 2400 pages in unavailable ranges
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] On node 0, zone Normal: 12288 pages in unavailable ranges
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] crashkernel low memory reserved: 0xf7e00000 - 0xffe00000 (128 MB)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] crashkernel reserved: 0x00000003ba200000 - 0x000000043a200000 (2048 MB)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: probing for conduit method from DT.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: PSCIv1.1 detected in firmware.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: Using standard PSCI v0.2 function IDs
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: Trusted OS migration not required
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] psci: SMC Calling Convention v1.2
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] percpu: Embedded 29 pages/cpu s80408 r8192 d30184 u118784
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] pcpu-alloc: s80408 r8192 d30184 u118784 alloc=29*4096
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Detected PIPT I-cache on CPU0
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Address authentication (architected algorithm)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: GIC system register CPU interface
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Virtualization Host Extensions
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Hardware dirty bit management
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Spectre-v4
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Spectre-BHB
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: kernel page table isolation forced ON by KASLR
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] alternatives: patching kernel code
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::zonelist general 0:DMA = 0:DMA
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::zonelist general 0:Normal = 0:Normal 0:DMA
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::zonelist thisnode 0:DMA = 0:DMA
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mminit::zonelist thisnode 0:Normal = 0:Normal 0:DMA
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 4065440
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Policy zone: Normal
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Kernel command line: root=PARTUUID=fdfa49c2-fcac-47d2-9f4f-2e3644bf23e7 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 firmware_class.path=/etc/firmware fbcon=map:0 nospectre_bhb video=efifb:off console=tty0 crashkernel=2G bl_prof_dataptr=2031616@0x471E10000 bl_prof_ro_ptr=65536@0x471E00000
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Unknown kernel command line parameters "bl_prof_dataptr=2031616@0x471E10000 bl_prof_ro_ptr=65536@0x471E00000", will be passed to user space.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] software IO TLB: mapped [mem 0x00000000f3e00000-0x00000000f7e00000] (64MB)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Memory: 13516768K/16521856K available (19712K kernel code, 4088K rwdata, 10120K rodata, 7744K init, 697K bss, 2742944K reserved, 262144K cma-reserved)
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] trace event string verifier disabled
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: Preemptible hierarchical RCU implementation.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: RCU event tracing is enabled.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Trampoline variant of Tasks RCU enabled.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Rude variant of Tasks RCU enabled.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] Tracing variant of Tasks RCU enabled.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
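To locate the reboot point in the full log, one option is to look for the place where the kernel uptime stamp in brackets drops back towards zero. In the excerpt above, that is between the 00:07:19 line (uptime 26104.737757) and the first "Booting Linux" line. The following is only a minimal sketch, assuming a syslog-style file with lines formatted like the ones above; the function name find_reboots is made up for illustration.

```python
import re
import sys

# Kernel lines carry an uptime stamp in brackets, e.g.
#   "... kernel: [25924.573599] ..." or "... kernel: [ 0.000000] ..." right after boot.
UPTIME_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def find_reboots(lines):
    """Return (last_line_before, first_line_after) pairs where the uptime resets."""
    boundaries = []
    prev_uptime = None
    prev_line = None
    for line in lines:
        match = UPTIME_RE.search(line)
        if not match:
            # Non-kernel lines (e.g. systemd-modules-load) carry no uptime stamp.
            continue
        uptime = float(match.group(1))
        if prev_uptime is not None and uptime < prev_uptime:
            boundaries.append((prev_line.rstrip("\n"), line.rstrip("\n")))
        prev_uptime, prev_line = uptime, line
    return boundaries


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_reboots.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        for before, after in find_reboots(log):
            print("last line before reboot:", before)
            print("first line after reboot:", after)
            print()
```

Running it over the whole file prints the last kernel message before each reset, which is where a panic or watchdog message would be expected if one had been logged.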
In the log we can see:
Apr 16 00:16:54 orin-nx-2 kernel: [ 0.000000] crashkernel reserved: 0x00000003ba200000 - 0x000000043a200000 (2048 MB)
which shows that memory for the crash kernel is reserved, so crash dumping appears to be enabled. What could cause the system to reboot without producing a kernel crash log?
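As far as we understand, reserving the memory is only part of the setup: a capture kernel also has to be loaded (normally through kexec by a kdump service) before a panic can produce a dump. Below is a minimal sketch of a check for that, assuming the standard sysfs files /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size are exposed by this kernel. The function name crash_kernel_status is only an illustration.

```python
from pathlib import Path


def crash_kernel_status(sysfs_root="/sys/kernel"):
    """Report whether a kdump capture kernel is loaded and how much memory is reserved.

    Values are None when the corresponding sysfs file is missing or unreadable.
    """
    root = Path(sysfs_root)

    def read_int(name):
        try:
            return int((root / name).read_text().strip())
        except (OSError, ValueError):
            return None

    loaded = read_int("kexec_crash_loaded")   # 1 if a capture kernel is loaded
    reserved = read_int("kexec_crash_size")   # reserved crashkernel memory in bytes
    return {
        "loaded": None if loaded is None else loaded == 1,
        "reserved_bytes": reserved,
    }


if __name__ == "__main__":
    print(crash_kernel_status())
```

If "loaded" comes back False while "reserved_bytes" is non-zero, the reservation seen in the log would not by itself lead to a crash dump.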
This might be completely unrelated, but in case it is not, it is worth mentioning: before the reboot there are thousands of cpufreq messages for cpu0 and cpu4, setting them to 1984000. After the reboot, there are none.
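To put a number on "thousands", the messages can be tallied per CPU and requested frequency. This is a rough sketch, assuming the line format shown in the log above; the name count_cpufreq_sets is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   ... kernel: [25924.573599] cpufreq: cpu0,cur:246000,set:1984000,delta:1738000,set ndiv:155
CPUFREQ_RE = re.compile(r"cpufreq: cpu(\d+),cur:(\d+),set:(\d+),")


def count_cpufreq_sets(lines):
    """Count cpufreq messages per (cpu, requested frequency) pair."""
    counts = Counter()
    for line in lines:
        match = CPUFREQ_RE.search(line)
        if match:
            cpu, _current, target = match.groups()
            counts[(int(cpu), int(target))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_cpufreq.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        for (cpu, target), n in sorted(count_cpufreq_sets(log).items()):
            print(f"cpu{cpu} -> {target}: {n} messages")
```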
There is this message further up in the log:
Apr 15 16:57:25 orin-nx-2 kernel: [ 311.852908] cpufreq transition table exceeds PAGE_SIZE. Disabling
Thanks for your help,
Alex