Segfault in /usr/sbin/nvpmodel

Hi,

I am running the sample root filesystem on an Orin NX 16GB module installed in the Orin Nano devkit carrier board.

When I boot the Jetson, I get the following segfault in the nvpmodel service.

jetson@jetson:~$ sudo systemctl status nvpmodel.service 
[sudo] password for jetson: 
× nvpmodel.service - nvpmodel service
     Loaded: loaded (/etc/systemd/system/nvpmodel.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2024-06-03 16:27:13 CEST; 2min 44s ago
    Process: 709 ExecStart=/etc/systemd/nvpmodel.sh (code=exited, status=139)
   Main PID: 709 (code=exited, status=139)
        CPU: 62ms

Jun 03 16:26:44 jetson systemd[1]: Starting nvpmodel service...
Jun 03 16:26:44 jetson nvpmodel.sh[709]: /etc/systemd/nvpmodel.sh: line 13:   710 Segmentation fault      (core dumped) /usr/sbin/nvpmodel -f /etc/nvpmodel.conf
Jun 03 16:27:13 jetson systemd[1]: nvpmodel.service: Main process exited, code=exited, status=139/n/a
Jun 03 16:27:13 jetson systemd[1]: nvpmodel.service: Failed with result 'exit-code'.
Jun 03 16:27:13 jetson systemd[1]: Failed to start nvpmodel service.
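
For reference, the status=139 reported above follows the usual shell convention for a child killed by a signal: 128 plus the signal number, so 139 - 128 = 11, which is SIGSEGV. A minimal sketch for decoding such a status (the function name decode_exit_status is only illustrative, not part of any NVIDIA tool):

```shell
# Decode a shell exit status: values above 128 mean "killed by signal (status - 128)".
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "signal $sig ($(kill -l "$sig"))"
    else
        echo "exit code $status"
    fi
}

# 139 is the status systemd reported for nvpmodel.sh
decode_exit_status 139
```

This only confirms that the wrapper script reports a SIGSEGV from /usr/sbin/nvpmodel; it says nothing about why the binary crashed.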

The nvpmodel is set correctly anyway:

$ sudo nvpmodel -q --verbose
NVPM VERB: Config file: /etc/nvpmodel.conf
NVPM VERB: parsing done for /etc/nvpmodel.conf
NVPM VERB: Current mode: NV Power Mode: MAXN
0

If I restart the service manually, it works and there is no segfault.

Any ideas? Maybe a dependency is missing in /etc/systemd/system/nvpmodel.service?
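
One way to check the dependency question is to look at which ordering directives the unit actually declares. The sketch below assumes nothing about the contents of nvpmodel.service (they are not shown in this thread); unit_ordering is a hypothetical helper that filters a unit file down to its ordering and dependency lines:

```shell
# Print only the ordering/dependency directives of a unit file read from stdin.
unit_ordering() {
    grep -E '^(After|Before|Requires|Wants|BindsTo|PartOf)='
}

# On the board (unit name taken from the status output above):
#   systemctl cat nvpmodel.service | unit_ordering
#   systemd-analyze critical-chain nvpmodel.service
```

If the first command prints nothing, the unit declares no explicit ordering, and its start time relative to driver probing would depend only on systemd's default dependencies.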

Hi tvai,

Which JetPack version are you using?

Did you modify /etc/nvpmodel.conf?
Please share this configuration file so we can check it further.

Hi @KevinFFF,

Sorry, I forgot to specify the JetPack version.
I am using the latest one, JetPack 6 (L4T 36.3).
I didn’t modify the conf file, but here it is:

#
# Copyright (c) 2021-2023, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.
#
# FORMAT:
# < PARAM TYPE=PARAM_TYPE NAME=PARAM_NAME >
# ARG1_NAME ARG1_PATH_VAL
# ARG2_NAME ARG2_PATH_VAL
# ...
# This starts a section of PARAM definitions, in which each line
# has the syntax below:
# ARG_NAME ARG_PATH_VAL
# ARG_NAME is a macro name for argument value ARG_PATH_VAL.
# PARAM_TYPE can be FILE, or CLOCK.
#
# < POWER_MODEL ID=id_num NAME=mode_name >
# PARAM1_NAME ARG11_NAME ARG11_VAL
# PARAM1_NAME ARG12_NAME ARG12_VAL
# PARAM2_NAME ARG21_NAME ARG21_VAL
# ...
# This starts a section of POWER_MODEL configurations, followed by
# lines with parameter settings as the format below:
# PARAM_NAME ARG_NAME ARG_VAL
# PARAM_NAME and ARG_NAME are defined in PARAM definition sections.
# ARG_VAL is an integer for PARAM_TYPE of CLOCK, and -1 is taken
# as INT_MAX. ARG_VAL is a string for PARAM_TYPE of FILE.
# This file must contain at least one POWER_MODEL section.
#
# < PM_CONFIG DEFAULT=default_mode >
# This is a mandatory section to specify one of the defined power
# model as the default.

###########################
#                         #
# PARAM DEFINITIONS       #
#                         #
###########################

< PARAM TYPE=FILE NAME=CPU_ONLINE >
CORE_0 /sys/devices/system/cpu/cpu0/online
CORE_1 /sys/devices/system/cpu/cpu1/online
CORE_2 /sys/devices/system/cpu/cpu2/online
CORE_3 /sys/devices/system/cpu/cpu3/online
CORE_4 /sys/devices/system/cpu/cpu4/online
CORE_5 /sys/devices/system/cpu/cpu5/online
CORE_6 /sys/devices/system/cpu/cpu6/online
CORE_7 /sys/devices/system/cpu/cpu7/online

< PARAM TYPE=FILE NAME=FBP_POWER_GATING >
FBP_PG_MASK /sys/devices/gpu.0/fbp_pg_mask
FBP_PG_MASK_KNEXT /sys/devices/platform/gpu.0/fbp_pg_mask

< PARAM TYPE=FILE NAME=TPC_POWER_GATING >
TPC_PG_MASK /sys/devices/gpu.0/tpc_pg_mask
TPC_PG_MASK_KNEXT /sys/devices/platform/gpu.0/tpc_pg_mask

< PARAM TYPE=FILE NAME=GPU_POWER_CONTROL_ENABLE >
GPU_PWR_CNTL_EN /sys/devices/gpu.0/power/control
GPU_PWR_CNTL_EN_KNEXT /sys/devices/platform/gpu.0/power/control

< PARAM TYPE=FILE NAME=GPU_POWER_CONTROL_DISABLE >
GPU_PWR_CNTL_DIS /sys/devices/gpu.0/power/control
GPU_PWR_CNTL_DIS_KNEXT /sys/devices/platform/gpu.0/power/control

< PARAM TYPE=CLOCK NAME=CPU_A78_0 >
FREQ_TABLE /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
MAX_FREQ /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
MIN_FREQ /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
FREQ_TABLE_KNEXT /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
MAX_FREQ_KNEXT /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
MIN_FREQ_KNEXT /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq

< PARAM TYPE=CLOCK NAME=CPU_A78_1 >
FREQ_TABLE /sys/devices/system/cpu/cpu4/cpufreq/scaling_available_frequencies
MAX_FREQ /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq
MIN_FREQ /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq
FREQ_TABLE_KNEXT /sys/devices/system/cpu/cpu4/cpufreq/scaling_available_frequencies
MAX_FREQ_KNEXT /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq
MIN_FREQ_KNEXT /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq

< PARAM TYPE=CLOCK NAME=GPU >
FREQ_TABLE /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/available_frequencies
MAX_FREQ /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/max_freq
MIN_FREQ /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/min_freq
FREQ_TABLE_KNEXT /sys/devices/platform/17000000.gpu/devfreq_dev/available_frequencies
MAX_FREQ_KNEXT /sys/devices/platform/17000000.gpu/devfreq_dev/max_freq
MIN_FREQ_KNEXT /sys/devices/platform/17000000.gpu/devfreq_dev/min_freq

<PARAM TYPE=CLOCK NAME=EMC >
MAX_FREQ /sys/kernel/nvpmodel_emc_cap/emc_iso_cap
MAX_FREQ_KNEXT /sys/kernel/nvpmodel_clk_cap/emc

< PARAM TYPE=CLOCK NAME=DLA0_CORE >
MAX_FREQ /sys/devices/platform/13e40000.host1x/15880000.nvdla0/acm/clk_cap/dla0_core
MAX_FREQ_KNEXT /sys/devices/platform/bus@0/13e00000.host1x/15880000.nvdla0/clk_cap/dla0_core

< PARAM TYPE=CLOCK NAME=DLA0_FALCON >
MAX_FREQ /sys/devices/platform/13e40000.host1x/15880000.nvdla0/acm/clk_cap/dla0_falcon
MAX_FREQ_KNEXT /sys/devices/platform/bus@0/13e00000.host1x/15880000.nvdla0/clk_cap/dla0_falcon

< PARAM TYPE=CLOCK NAME=DLA1_CORE >
MAX_FREQ /sys/devices/platform/13e40000.host1x/158c0000.nvdla1/acm/clk_cap/dla1_core
MAX_FREQ_KNEXT /sys/devices/platform/bus@0/13e00000.host1x/158c0000.nvdla1/clk_cap/dla1_core

< PARAM TYPE=CLOCK NAME=DLA1_FALCON >
MAX_FREQ /sys/devices/platform/13e40000.host1x/158c0000.nvdla1/acm/clk_cap/dla1_falcon
MAX_FREQ_KNEXT /sys/devices/platform/bus@0/13e00000.host1x/158c0000.nvdla1/clk_cap/dla1_falcon

< PARAM TYPE=CLOCK NAME=PVA0_VPS >
MAX_FREQ /sys/devices/platform/13e40000.host1x/16000000.pva0/acm/clk_cap/pva0_vps
MAX_FREQ_KNEXT /sys/devices/platform/bus@0/13e00000.host1x/16000000.pva0/clk_cap/pva0_vps

< PARAM TYPE=CLOCK NAME=PVA0_AXI >
MAX_FREQ /sys/devices/platform/13e40000.host1x/16000000.pva0/acm/clk_cap/pva0_cpu_axi
MAX_FREQ_KNEXT /sys/devices/platform/bus@0/13e00000.host1x/16000000.pva0/clk_cap/pva0_cpu_axi

###########################
#                         #
# POWER_MODEL DEFINITIONS #
#                         #
###########################

# MAXN is the NONE power model to release all constraints
< POWER_MODEL ID=0 NAME=MAXN >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_ONLINE CORE_6 1
CPU_ONLINE CORE_7 1
FBP_POWER_GATING FBP_PG_MASK 2
TPC_POWER_GATING TPC_PG_MASK 240
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
CPU_A78_0 MIN_FREQ 729600
CPU_A78_0 MAX_FREQ -1
CPU_A78_1 MIN_FREQ 729600
CPU_A78_1 MAX_FREQ -1
GPU MIN_FREQ 0
GPU MAX_FREQ -1
GPU_POWER_CONTROL_DISABLE GPU_PWR_CNTL_DIS auto
EMC MAX_FREQ -1
DLA0_CORE MAX_FREQ -1
DLA1_CORE MAX_FREQ -1
DLA0_FALCON MAX_FREQ -1
DLA1_FALCON MAX_FREQ -1
PVA0_VPS MAX_FREQ -1
PVA0_AXI MAX_FREQ -1


< POWER_MODEL ID=1 NAME=10W >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 0
CPU_ONLINE CORE_5 0
CPU_ONLINE CORE_6 0
CPU_ONLINE CORE_7 0
FBP_POWER_GATING FBP_PG_MASK 2
TPC_POWER_GATING TPC_PG_MASK 252
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
CPU_A78_0 MIN_FREQ 729600
CPU_A78_0 MAX_FREQ 1190400
GPU MIN_FREQ 0
GPU MAX_FREQ 612000000
GPU_POWER_CONTROL_DISABLE GPU_PWR_CNTL_DIS auto
EMC MAX_FREQ 2133000000
DLA0_CORE MAX_FREQ 153600000
DLA1_CORE MAX_FREQ 115000000
DLA0_FALCON MAX_FREQ 115000000
DLA1_FALCON MAX_FREQ 115000000
PVA0_VPS MAX_FREQ 115000000
PVA0_AXI MAX_FREQ 115000000

< POWER_MODEL ID=2 NAME=15W >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 0
CPU_ONLINE CORE_5 0
CPU_ONLINE CORE_6 0
CPU_ONLINE CORE_7 0
FBP_POWER_GATING FBP_PG_MASK 2
TPC_POWER_GATING TPC_PG_MASK 252
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
CPU_A78_0 MIN_FREQ 729600
CPU_A78_0 MAX_FREQ 1420800
GPU MIN_FREQ 0
GPU MAX_FREQ 612000000
GPU_POWER_CONTROL_DISABLE GPU_PWR_CNTL_DIS auto
EMC MAX_FREQ -1
DLA0_CORE MAX_FREQ 614400000
DLA1_CORE MAX_FREQ 115000000
DLA0_FALCON MAX_FREQ 294400000
DLA1_FALCON MAX_FREQ 115000000
PVA0_VPS MAX_FREQ 115000000
PVA0_AXI MAX_FREQ 115000000

< POWER_MODEL ID=3 NAME=25W >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_ONLINE CORE_6 1
CPU_ONLINE CORE_7 1
FBP_POWER_GATING FBP_PG_MASK 2
TPC_POWER_GATING TPC_PG_MASK 240
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
CPU_A78_0 MIN_FREQ 729600
CPU_A78_0 MAX_FREQ 1497600
CPU_A78_1 MIN_FREQ 729600
CPU_A78_1 MAX_FREQ 1497600
GPU MIN_FREQ 0
GPU MAX_FREQ 408000000
GPU_POWER_CONTROL_DISABLE GPU_PWR_CNTL_DIS auto
EMC MAX_FREQ -1
DLA0_CORE MAX_FREQ 614400000
DLA1_CORE MAX_FREQ 614400000
DLA0_FALCON MAX_FREQ 294400000
DLA1_FALCON MAX_FREQ 294400000
PVA0_VPS MAX_FREQ 512000000
PVA0_AXI MAX_FREQ 358400000

# mandatory section to configure the default power mode
< PM_CONFIG DEFAULT=2 >
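
As a quick cross-check of the file above, the power modes and the default can be listed with a few lines of awk. This is a sketch that assumes the "< POWER_MODEL ID=n NAME=name >" and "< PM_CONFIG DEFAULT=n >" layout shown in this file; list_power_modes is an illustrative name:

```shell
# List the POWER_MODEL sections and the default mode of an nvpmodel config file.
# Commented-out template lines (starting with "#") are skipped by the patterns.
list_power_modes() {
    awk '
        /^< *POWER_MODEL / {
            id = ""; name = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ID=/)   id = substr($i, 4)
                if ($i ~ /^NAME=/) name = substr($i, 6)
            }
            print "mode " id ": " name
        }
        /^< *PM_CONFIG / {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^DEFAULT=/) print "default: " substr($i, 9)
        }
    ' "$1"
}

# On the board:
#   list_power_modes /etc/nvpmodel.conf
```

For this file the default is mode 2 (15W), whereas the nvpmodel -q output earlier in the thread reports MAXN, so the active mode was evidently changed from the default.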

I should also mention that I installed many packages in the rootfs for my development environment.
I also built the kernel and OOT modules, just to add a few config options:

        --module CONFIG_WIREGUARD \
        --enable CONFIG_IP_ADVANCED_ROUTER \
        --enable CONFIG_IP_MULTIPLE_TABLES \
        --module CONFIG_IP_NF_RAW \
        --module CONFIG_IP6_NF_RAW \
        --module CONFIG_NETFILTER_XT_MATCH_CONNMARK

I modified /etc/environment to add /usr/local/cuda/bin: at the beginning of PATH.

I ran sudo nvpmodel -m 0 and also have a systemd service that runs jetson_clocks (after nvpmodel.service), as described at https://forums.developer.nvidia.com/t/max-performance-for-nx-is-nvpmodel-m2/146847/5:

$ cat /etc/systemd/system/jetson-clocks.service
[Unit]
Description=Jetson Clocks
After=nvpmodel.service

[Service]
Type=oneshot
ExecStart=/bin/bash -c /usr/bin/jetson_clocks

[Install]
WantedBy=multi-user.target

Today I can try to flash the Jetson without modifying anything in L4T or the sample root filesystem; that will make the analysis easier.

Hi again,

  1. I just ran the following commands from scratch:
wget https://developer.nvidia.com/downloads/embedded/l4t/r36_release_v2.0/release/Jetson_Linux_R36.3.0_aarch64.tbz2
wget https://developer.nvidia.com/downloads/embedded/l4t/r36_release_v2.0/release/Tegra_Linux_Sample-Root-Filesystem_R36.3.0_aarch64.tbz2

tar xvf Jetson_Linux_R36.3.0_aarch64.tbz2
sudo tar xvpf Tegra_Linux_Sample-Root-Filesystem_R36.3.0_aarch64.tbz2 -C Linux_for_Tegra/rootfs/
cd Linux_for_Tegra/

sudo ./apply_binaries.sh
#sudo ./tools/l4t_flash_prerequisites.sh # Already did it
sudo ./tools/l4t_create_default_user.sh -u *** -p *** -n *** -a --accept-license

sudo ./tools/kernel_flash/l4t_initrd_flash.sh \
    --external-device nvme0n1p1 \
    -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml \
    -p "-c ./bootloader/generic/cfg/flash_t234_qspi.xml" \
    --showlogs \
    --network usb0 \
    jetson-orin-nano-devkit external

QSPI and NVMe are flashed successfully and the system boots.

  2. Checking at first boot, nvpmodel.service runs successfully.
  3. Run sudo nvpmodel -m 0, answer YES, and the system reboots.
  4. nvpmodel.service is successful.
  5. Create /etc/systemd/system/jetson-clocks.service:
cat << EOF | sudo tee /etc/systemd/system/jetson-clocks.service > /dev/null
[Unit]
Description=Jetson Clocks
After=nvpmodel.service

[Service]
Type=oneshot
ExecStart=/bin/bash -c /usr/bin/jetson_clocks

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable jetson-clocks
sudo reboot
  6. nvpmodel.service is still successful…
  7. Install all the dependencies I need (they are all from apt, without modifying anything in /etc/apt).
  8. nvpmodel.service is still successful…

Do you have an idea of what I should check/try to isolate the problem?
I suspect the kernel I built… I will try to build it and flash again.

I built the kernel with the same config, and nvpmodel is successful again…
I cannot reproduce it from scratch…

If I plug the other NVMe (the one where it happened) back in, the service fails… I don’t know why.
Maybe I can rsync parts of the rootfs over until I get the error.

OK, I finally discovered that it’s random!
It fails most of the time, but not always.

Here are screenshots from 2 videos I made, one with no error and the other with the failure. Nothing on the system changed between them; I only ran sudo reboot until I got a failure.

I suspect the start order of the systemd services, or the load order of the kernel modules (I’m not a kernel expert).

Success:


Failure:


@KevinFFF I hope this helps.

First, thank you for the several trials.

It seems your links are wrong. They should use r36_release_v3.0 rather than r36_release_v2.0.

What’s the result when you run “systemctl status nvpmodel.service” at this moment?
Is there any error in /var/log/syslog?

You’re right, that was just a bad copy/paste; I did download the files with the correct links.

× nvpmodel.service - nvpmodel service
     Loaded: loaded (/etc/systemd/system/nvpmodel.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2024-06-05 09:21:22 CEST; 20s ago
    Process: 701 ExecStart=/etc/systemd/nvpmodel.sh (code=exited, status=139)
   Main PID: 701 (code=exited, status=139)
        CPU: 56ms

Jun 05 09:20:53 jetson systemd[1]: Starting nvpmodel service...
Jun 05 09:20:53 jetson nvpmodel.sh[701]: /etc/systemd/nvpmodel.sh: line 13:   705 Segmentation fault      (core dumped) /usr/sbin/nvpmodel -f /etc/nvpmodel.conf
Jun 05 09:21:22 jetson systemd[1]: nvpmodel.service: Main process exited, code=exited, status=139/n/a
Jun 05 09:21:22 jetson systemd[1]: nvpmodel.service: Failed with result 'exit-code'.
Jun 05 09:21:22 jetson systemd[1]: Failed to start nvpmodel service.

Here is part of the log from just before the segfault. I don’t see anything relevant, but I guess you’ll understand the output better than I do and may spot something interesting.
The second-to-last line shows a failure, but I don’t know if it’s related.

Jun  5 09:20:53 jetson kernel: [    9.424727] tegra-dce d800000.dce: Adding to iommu group 47
Jun  5 09:20:53 jetson kernel: [    9.425156] nvvrs_pseq 4-003c: NVVRS Vendor ID: 0x9
Jun  5 09:20:53 jetson kernel: [    9.426197] nvvrs_pseq 4-003c: NVVRS Model Rev: 0x82
Jun  5 09:20:53 jetson kernel: [    9.426440] dce: dce_ipc_channel_init:311  Invalid Channel State [0x0] for ch_type [2]
Jun  5 09:20:53 jetson kernel: [    9.426651] dce: tegra_dce_probe:245  Found display consumer device
Jun  5 09:20:53 jetson kernel: [    9.427166] dce: dce_mailbox_set_full_interrupt:157  Intr bit set multiple times for MB : [0x5]
Jun  5 09:20:53 jetson kernel: [    9.427171] dce: dce_mailbox_set_full_interrupt:157  Intr bit set multiple times for MB : [0x5]
Jun  5 09:20:53 jetson kernel: [    9.427386] dce: dce_admin_send_cmd_ver:456  version : [0x3] err : [0x0]
Jun  5 09:20:53 jetson kernel: [    9.427554] dce: dce_mailbox_set_full_interrupt:157  Intr bit set multiple times for MB : [0x1]
Jun  5 09:20:53 jetson kernel: [    9.427560] dce: dce_admin_setup_clients_ipc:585  Channel Reset Complete for Type [1] ...
Jun  5 09:20:53 jetson kernel: [    9.427562] dce: dce_admin_setup_clients_ipc:561  Get queue info failed for [2]
Jun  5 09:20:53 jetson kernel: [    9.427725] dce: dce_mailbox_set_full_interrupt:157  Intr bit set multiple times for MB : [0x2]
Jun  5 09:20:53 jetson kernel: [    9.427731] dce: dce_admin_setup_clients_ipc:585  Channel Reset Complete for Type [3] ...
Jun  5 09:20:53 jetson kernel: [    9.428648] dce: dce_start_boot_flow:166  DCE_BOOT_DONE
Jun  5 09:20:53 jetson kernel: [    9.433104] nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jun  5 09:20:53 jetson kernel: [    9.453085] nvvrs_pseq 4-003c: NVVRS PSEQ probe successful
Jun  5 09:20:53 jetson kernel: [    9.460631] spi-tegra114 3210000.spi: Adding to iommu group 1
Jun  5 09:20:53 jetson kernel: [    9.469559] nv_platform 13800000.display: Adding to iommu group 48
Jun  5 09:20:53 jetson kernel: [    9.470695] platform 13800000.display:nvdisplay-niso: Adding to iommu group 49
Jun  5 09:20:53 jetson kernel: [    9.480299] tegra-hda 3510000.hda: Adding to iommu group 50
Jun  5 09:20:53 jetson kernel: [    9.480910] tegra194-pcie 140a0000.pcie: Link up
Jun  5 09:20:53 jetson kernel: [    9.484225] tegra194-pcie 140a0000.pcie: Link up
Jun  5 09:20:53 jetson kernel: [    9.491926] tegra194-pcie 140a0000.pcie: PCI host bridge to bus 0008:00
Jun  5 09:20:53 jetson kernel: [    9.491934] pci_bus 0008:00: root bus resource [io  0x300000-0x3fffff] (bus address [0x2a100000-0x2a1fffff])
Jun  5 09:20:53 jetson kernel: [    9.491937] pci_bus 0008:00: root bus resource [mem 0x3528000000-0x352fffffff] (bus address [0x40000000-0x47ffffff])
Jun  5 09:20:53 jetson kernel: [    9.491941] pci_bus 0008:00: root bus resource [bus 00-ff]
Jun  5 09:20:53 jetson kernel: [    9.491943] pci_bus 0008:00: root bus resource [mem 0x3240000000-0x3527ffffff pref]
Jun  5 09:20:53 jetson kernel: [    9.491997] pci 0008:00:00.0: [10de:229c] type 01 class 0x060400
Jun  5 09:20:53 jetson kernel: [    9.492150] pci 0008:00:00.0: PME# supported from D0 D3hot
Jun  5 09:20:53 jetson kernel: [    9.496732] pci 0008:01:00.0: [10ec:8168] type 00 class 0x020000
Jun  5 09:20:53 jetson kernel: [    9.496929] pci 0008:01:00.0: reg 0x10: [io  0x0000-0x00ff]
Jun  5 09:20:53 jetson kernel: [    9.497107] pci 0008:01:00.0: reg 0x18: [mem 0x00000000-0x00000fff 64bit]
Jun  5 09:20:53 jetson kernel: [    9.497222] pci 0008:01:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit]
Jun  5 09:20:53 jetson kernel: [    9.498182] pci 0008:01:00.0: supports D1 D2
Jun  5 09:20:53 jetson kernel: [    9.498183] pci 0008:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
Jun  5 09:20:53 jetson kernel: [    9.522662] spi-tegra114 3230000.spi: Adding to iommu group 1
Jun  5 09:20:53 jetson kernel: [    9.543386] nvgpu: 17000000.gpu                  gk20a_scale_init:541  [INFO]  enabled scaling for GPU
Jun  5 09:20:53 jetson kernel: [    9.543386] 
Jun  5 09:20:53 jetson kernel: [    9.548858] pci 0008:00:00.0: BAR 14: assigned [mem 0x3528000000-0x35280fffff]
Jun  5 09:20:53 jetson kernel: [    9.548868] pci 0008:00:00.0: BAR 13: assigned [io  0x300000-0x300fff]
Jun  5 09:20:53 jetson kernel: [    9.548872] pci 0008:01:00.0: BAR 4: assigned [mem 0x3528000000-0x3528003fff 64bit]
Jun  5 09:20:53 jetson kernel: [    9.549021] pci 0008:01:00.0: BAR 2: assigned [mem 0x3528004000-0x3528004fff 64bit]
Jun  5 09:20:53 jetson kernel: [    9.549112] pci 0008:01:00.0: BAR 0: assigned [io  0x300000-0x3000ff]
Jun  5 09:20:53 jetson kernel: [    9.549140] pci 0008:00:00.0: PCI bridge to [bus 01-ff]
Jun  5 09:20:53 jetson kernel: [    9.549143] pci 0008:00:00.0:   bridge window [io  0x300000-0x300fff]
Jun  5 09:20:53 jetson kernel: [    9.549147] pci 0008:00:00.0:   bridge window [mem 0x3528000000-0x35280fffff]
Jun  5 09:20:53 jetson kernel: [    9.554022] pcieport 0008:00:00.0: Adding to iommu group 46
Jun  5 09:20:53 jetson kernel: [    9.563861] pcieport 0008:00:00.0: PME: Signaling with IRQ 189
Jun  5 09:20:53 jetson kernel: [    9.569526] pcieport 0008:00:00.0: AER: enabled with IRQ 189
Jun  5 09:20:53 jetson kernel: [    9.599392] cfg80211: Loading compiled-in X.509 certificates for regulatory database
Jun  5 09:20:53 jetson kernel: [    9.603780] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Jun  5 09:20:53 jetson kernel: [    9.608891] at24 0-0050: 256 byte 24c02 EEPROM, read-only
Jun  5 09:20:53 jetson kernel: [    9.628326] imx219 9-0010: tegracam sensor driver:imx219_v2.0.6
Jun  5 09:20:53 jetson kernel: [    9.636059] at24 0-0057: 256 byte 24c02 EEPROM, read-only
Jun  5 09:20:53 jetson kernel: [    9.638911] imx219 9-0010: imx219_board_setup: error during i2c read probe (-121)
Jun  5 09:20:53 jetson kernel: [    9.644004] imx219 9-0010: board setup failed
Jun  5 09:20:53 jetson kernel: [    9.644053] imx219: probe of 9-0010 failed with error -121
Jun  5 09:20:53 jetson kernel: [    9.644680] imx219 10-0010: tegracam sensor driver:imx219_v2.0.6
Jun  5 09:20:53 jetson kernel: [    9.653288] input: NVIDIA Jetson Orin NX HDA HDMI/DP,pcm=3 as /devices/platform/bus@0/3510000.hda/sound/card1/input1
Jun  5 09:20:53 jetson kernel: [    9.655206] imx219 10-0010: imx219_board_setup: error during i2c read probe (-121)
Jun  5 09:20:53 jetson kernel: [    9.660294] imx219 10-0010: board setup failed
Jun  5 09:20:53 jetson kernel: [    9.660341] imx219: probe of 10-0010 failed with error -121
Jun  5 09:20:53 jetson kernel: [    9.662023] input: NVIDIA Jetson Orin NX HDA HDMI/DP,pcm=7 as /devices/platform/bus@0/3510000.hda/sound/card1/input2
Jun  5 09:20:53 jetson kernel: [    9.662656] input: NVIDIA Jetson Orin NX HDA HDMI/DP,pcm=8 as /devices/platform/bus@0/3510000.hda/sound/card1/input3
Jun  5 09:20:53 jetson kernel: [    9.663138] input: NVIDIA Jetson Orin NX HDA HDMI/DP,pcm=9 as /devices/platform/bus@0/3510000.hda/sound/card1/input4
Jun  5 09:20:53 jetson avahi-daemon[354]: Server startup complete. Host name is jetson.local. Local service cookie is 4101803772.
Jun  5 09:20:53 jetson kernel: [    9.687394] rtl88x2ce 0001:01:00.0: Adding to iommu group 3
Jun  5 09:20:53 jetson kernel: [    9.687532] rtl88x2ce 0001:01:00.0: enabling device (0000 -> 0003)
Jun  5 09:20:53 jetson kernel: [    9.702839] tegra234-aon c000000.aon: Adding to iommu group 51
Jun  5 09:20:53 jetson kernel: [    9.703284]  c000000.aon:hsp: probed
Jun  5 09:20:53 jetson kernel: [    9.703392] tegra234-aon c000000.aon: init done
Jun  5 09:20:53 jetson kernel: [    9.718524] irq: IRQ247: trimming hierarchy from :bus@0:pmc@c360000
Jun  5 09:20:53 jetson kernel: [    9.720455] fusb301 1-0025: device id: 0x12
Jun  5 09:20:53 jetson kernel: [    9.726476] CAN device driver interface
Jun  5 09:20:53 jetson kernel: [    9.726613] fusb301 1-0025: fusb301_work_handler: int_sts[0x05]
Jun  5 09:20:53 jetson kernel: [    9.727212] fusb301 1-0025: sts[0x1f], type[0x08]
Jun  5 09:20:53 jetson kernel: [    9.727217] fusb301 1-0025: fusb_update_state: 6
Jun  5 09:20:53 jetson kernel: [    9.742198] gic 2a41000.interrupt-controller: GIC IRQ controller registered
Jun  5 09:20:53 jetson kernel: [    9.742449] tegra-aconnect bus@0:aconnect@2900000: Tegra ACONNECT bus registered
Jun  5 09:20:53 jetson kernel: [    9.753205] Bluetooth: Core ver 2.22
Jun  5 09:20:53 jetson kernel: [    9.753262] NET: Registered PF_BLUETOOTH protocol family
Jun  5 09:20:53 jetson kernel: [    9.753264] Bluetooth: HCI device and connection manager initialized
Jun  5 09:20:53 jetson kernel: [    9.753274] Bluetooth: HCI socket layer initialized
Jun  5 09:20:53 jetson kernel: [    9.753278] Bluetooth: L2CAP socket layer initialized
Jun  5 09:20:53 jetson kernel: [    9.753284] Bluetooth: SCO socket layer initialized
Jun  5 09:20:53 jetson kernel: [    9.758590] fusb301 1-0025: toggle_time(0) is not updated
Jun  5 09:20:53 jetson kernel: [    9.758933] fusb301 1-0025: fusb301_set_mode: mode (32)(32)
Jun  5 09:20:53 jetson kernel: [    9.759352] fusb301 1-0025: fusb301_detach: type[0x08] chipstate[0x06]
Jun  5 09:20:53 jetson kernel: [    9.759406] VDD_5V0_SYS: Underflow of regulator enable count
Jun  5 09:20:53 jetson kernel: [    9.759444] fusb301 1-0025: fusb_update_state: 1
Jun  5 09:20:53 jetson kernel: [    9.760251] fusb301 1-0025: mode[0x20], host_cur[0x02], dttime[0x00]
Jun  5 09:20:53 jetson kernel: [    9.764471] TT CAN feature is not supported
Jun  5 09:20:53 jetson kernel: [    9.765229] 	 Message RAM Configuration
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| base addr   |0x0c312000|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| sidfc_flssa |0x00000000|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| xidfc_flesa |0x00000040|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| rxf0c_f0sa  |0x000000c0|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| rxf1c_f1sa  |0x000009c0|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| rxbc_rbsa   |0x000009c0|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| txefc_efsa  |0x000009c0|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| txbc_tbsa   |0x00000a40|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| tmc_tmsa    |0x00000ec0|
Jun  5 09:20:53 jetson kernel: [    9.765229] 	| mram size   |0x00001000|
Jun  5 09:20:53 jetson kernel: [    9.766597] Release 3.2.3 from 09.06.2018
Jun  5 09:20:53 jetson kernel: [    9.767537] net can0: mttcan device registered (regs=00000000158be7ef, irq=201)
Jun  5 09:20:53 jetson systemd-udevd[328]: nvme0n1: Process '/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/nvme0n1' failed with exit code 1.
Jun  5 09:20:53 jetson nvpmodel.sh[701]: /etc/systemd/nvpmodel.sh: line 13:   705 Segmentation fault      (core dumped) /usr/sbin/nvpmodel -f /etc/nvpmodel.conf

I can reproduce this issue on my local setup. Please let me check internally, and I will get back to you once we have a result.

Nice! I’m glad you can reproduce it so you can debug it internally.

Sorry, we are still debugging this issue.
Could you share the output of the following commands on your board?

$ cat /etc/nv_boot_control.conf
$ cat /etc/nv_tegra_release

Sure, here are the files:

$ cat /etc/nv_boot_control.conf
TNSPEC 3767-300-0000-M.1-1-1-jetson-orin-nano-devkit-
COMPATIBLE_SPEC 3767-000-0000--1--jetson-orin-nano-devkit-
TEGRA_BOOT_STORAGE nvme0n1
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0
$ cat /etc/nv_tegra_release
# R36 (release), REVISION: 3.0, GCID: 36191598, BOARD: generic, EABI: aarch64, DATE: Mon May  6 17:34:21 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
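
The L4T version quoted earlier in the thread (36.3) can be read from the first line of /etc/nv_tegra_release. A small sketch, assuming the "# R36 (release), REVISION: 3.0, ..." header format shown above; l4t_version is an illustrative helper name:

```shell
# Extract "<release>.<revision>" from nv_tegra_release text read from stdin.
# Only the header line of the form "# R<n> (release), REVISION: <x.y>, ..." matches.
l4t_version() {
    sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/\1.\2/p'
}

# On the board:
#   l4t_version < /etc/nv_tegra_release
```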

As a reminder, I’m using an Orin NX on an Orin Nano devkit carrier board, booting from an NVMe SSD.

Have you confirmed that you are using an Orin NX 16GB (SKU 0000) module?

What’s the failure rate in your case?

Yes, I’m using an Orin NX 16GB plugged into the Orin Nano carrier board.

The failure rate is hard to say, because I’m now running it without a screen and I stopped checking the service for errors (yesterday I did check), since it’s working fine and at MAXN. I don’t see any side effects, except the crash of the nvpmodel service at startup.
But it certainly fails more than 50-80% of the time (it could even be 90%).

I just booted it now and it failed; yesterday I booted it twice and both boots failed.
I rule out a hardware problem, because I have 2 Orin NX 16GB modules mounted on a devkit carrier board, and both give me the same crash.

Also, as a reminder, I’m using the sample rootfs and following the steps in the documentation. I’m not doing anything unusual and haven’t rebuilt the kernel; it is just the original rootfs.
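
A rough failure rate could be estimated from the logs instead of watching each boot. The sketch below makes two assumptions: the journal is persistent, so entries from earlier boots are still available, and each failed boot logs exactly one "Segmentation fault" line like the one shown in this thread. The helper names are illustrative:

```shell
# Count nvpmodel segfault lines in log text read from stdin.
count_nvpmodel_segfaults() {
    # grep -c still prints 0 when nothing matches; "|| true" hides its non-zero exit.
    grep -c 'Segmentation fault.*/usr/sbin/nvpmodel' || true
}

# Print failures/total as a percentage with one decimal place.
failure_rate() {
    awk -v f="$1" -v t="$2" 'BEGIN { if (t > 0) printf "%.1f%%\n", 100 * f / t; else print "n/a" }'
}

# On the board:
#   failures=$(journalctl -u nvpmodel.service --no-pager | count_nvpmodel_segfaults)
#   boots=$(journalctl --list-boots --no-pager | wc -l)   # newer systemd may add a header line
#   failure_rate "$failures" "$boots"
```

The same count can be taken from /var/log/syslog, although log rotation limits how many boots that file covers.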

You can just run sudo systemctl status nvpmodel.service in the serial console to check the status without a screen connected.

Currently, we have reproduced the issue on an Orin Nano 8GB module (SKU 0005), with a failure rate of about 3 out of 500 reboots in a reboot test. We haven’t reproduced this issue on an Orin NX module yet. We are still running further tests to find the root cause.

Yes, I know the command. What I meant is that I’m focused on my project, so I don’t check the status of the nvpmodel service every time I boot the Jetson.

If you need me to check anything on my side, it’s no problem.