Xavier running JetPack 5.0.2 fails to boot

I ported JetPack 5.0.2 to Xavier. In about 40% of boots, startup fails. Below is the log from a failed startup; the failure appears to be in UEFI execution. We use a self-developed carrier board and have modified the kernel and device tree.
The log is:
[0000.674] W> RATCHET: MB1 binary ratchet value 4 is larger than ratchet level 2 from HW fuses.
[0000.683] I> MB1 (prd-version: 2.3.0.0-t194-41334769-0a17edc1)
[0000.688] I> Boot-mode: Coldboot
[0000.691] I> Platform: Silicon
[0000.694] I> Chip revision : A02P
[0000.697] I> Bootrom patch version : 15 (correctly patched)
[0000.702] I> ATE fuse revision : 0x200
[0000.705] I> Ram repair fuse : 0x0
[0000.708] I> Ram Code : 0x2
[0000.711] I> rst_source: 0x0, rst_level: 0x0
[0000.716] I> Boot-device: SDMMC (instance: 3)
[0000.732] I> sdmmc DDR50 mode
[0000.736] I> Boot chain mechanism: A/B
[0000.739] I> Current Boot-Chain Slot: 0
[0000.743] I> BR-BCT Boot-Chain: 0, status: 0. update flag: 0
[0000.750] W> PROD_CONFIG: device prod data is empty in MB1 BCT.
[0000.757] I> Temperature = 29500
[0000.760] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.765] W> Skipping boost for clk: BPMP_APB
[0000.769] W> Skipping boost for clk: AXI_CBB
[0000.773] W> Skipping boost for clk: AON_CPU_NIC
[0000.777] W> Skipping boost for clk: CAN1
[0000.781] W> Skipping boost for clk: CAN2
[0000.785] I> Boot-device: SDMMC (instance: 3)
[0000.796] I> Sdmmc: HS400 mode enabled
[0000.800] I> Non-ECC region[0]: Start:0x80000000, End:0x100000000
[0000.807] W> Thermal config not found in BCT
[0000.815] W> MEMIO rail config not found in BCT
[0000.837] I> sdmmc bdev is already initialized
[0000.882] W> Platform config not found in BCT
[0000.916] I> MB1 done

main enter
SPE VERSION #: R01.00.18 Created: Jan 29 2021 @ 14:18:27
HW Function test
Start Scheduler.
in late init
[0000.925] I> Welcome to MB2(TBoot-BPMP) (version: default.t194-mobile-1ca012e4)
[0000.926] I> DMA Heap @ [0x526fa000 - 0x52ffa000]
[0000.926] I> Default Heap @ [0xd486400 - 0xd48a400]
[0000.927] E> DEVICE_PROD: Invalid value data = 70020000, size = 0.
[0000.933] W> device prod register failed
[0000.937] I> gpio framework initialized
[0000.940] I> tegrabl_gpio_driver_register: register ‘nvidia,tegra194-gpio’ driver
[0000.948] I> tegrabl_gpio_driver_register: register ‘nvidia,tegra194-gpio-aon’ driver
[0000.955] I> No valid sdcard_params in mb1_bct
[0000.960] I> Boot_device: SDMMC_BOOT instance: 3
[0000.964] I> sdmmc-3 params source = boot args
[0000.973] I> sdmmc-3 params source = boot args
[0000.974] I> sdmmc bdev is already initialized
[0001.007] I> Found 20 partitions in SDMMC_BOOT (instance 3)
[0001.022] I> Found 41 partitions in SDMMC_USER (instance 3)
[0001.024] I> Active Boot chain : 0
[0001.081] I> Relocating BR-BCT
[0001.083] > DEVICE_PROD: device prod is not initialized.
[0001.108] E> I2C: slave not found in slaves.
[0001.109] E> I2C: Could not write 0 bytes to slave: 0x00ae with repeat start true.
[0001.110] E> I2C_DEV: Failed to send register address 0x00000000.
[0001.111] E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xae at 0x00000000 via instance 0.
[0001.112] E> eeprom: Failed to read I2C slave device
[0001.115] I> Failed to read CVB eeprom data @ AE
[0001.119] I> Retrying CVB eeprom read @ AC …
[0001.193] I> Relocating OP-TEE dtb from: 0x6bfff1d0 to 0x70050000, size: 1008
[0001.194] I> [0] START: 0x80000000, SIZE: 0x2f000000
[0001.195] I> [1] START: 0xaf010000, SIZE: 0x18bf0000
[0001.195] I> [2] START: 0xc7d00000, SIZE: 0xc0000
[0001.196] I> [3] START: 0xca000000, SIZE: 0x800000
[0001.196] I> dram_block larger than 80000000
[0001.198] I> [4] START: 0x100000000, SIZE: 0x780000000
[0001.210] I> Setting NS memory ranges to OP-TEE dtb finished.
[0001.224] I> found decompressor handler: lz4
[0001.719] I> EKB detected (length: 0x410) @ VA:0x526ff400
[0001.720] I> Setting EKB blob info to OPTEE dtb finished.
NOTICE: BL31: v2.6(release):4fa405dbd
NOTICE: BL31: Built : 20:16:55, Aug 10 2022
I/TC:
I/TC: Non-secure external DT found
bpmp: init
bpmp: tag is 128431eec76692047e1ac1ebc0392266
sku_dt_init: not sku 0x00
clk_early initialized
mail_early initialized
fuse initialized
hwwdt initialized
t194_ec_get_ec_list: found 45 ecs
ec initialized
vmon_setup_monitors: found 3 monitors
vmon initialized
adc initialized
fmon_populate_monitors: found 73 monitors
fmon initialized
mc initialized
reset initialized
nvhs initialized
uphy_early initialized
emc_early initialized
392 clocks registered
clk initialized
io_dpd initialized
thermal initialized
thermal_mrq initialized
i2c initialized
vrmon_dt_init: vrmon node not found
vrmon_chk_boot_state: found 0 rail monitors
vrmon initialized
regulator initialized
I/TC: OP-TEE version: 3.16 (gcc version 9.3.0 (Buildroot 2020.08)) #2 Thu Aug 11 03:23:20 UTC 2022 aarch64
avfs_clk_platform initialized
soctherm initialized
aotag initialized
powergate initialized
I/TC: WARNING: This OP-TEE configuration might be insecure!
I/TC: WARNING: Please check https://optee.readthedocs.io/en/latest/architecture/porting_guidelines.html
dvs initialized
pm initialized
suspend initialized
pg_late initialized
pg_mrq_init initialized
strap initialized
nvl initialized
emc initialized
emc_mrq initialized
I/TC: Primary CPU initializing
clk_dt initialized
tj_init initialized
uphy_dt initialized
uphy_mrq initialized
uphy initialized
ec_swd_poll_start: 281 reg polling start w period 47 ms
ec_late initialized
hwwdt_late initialized
reset_mrq initialized
ec_mrq initialized
fmon_mrq initialized
clk_mrq initialized
avfs_mrq initialized
mail_mrq initialized
i2c_mrq initialized
tag_mrq initialized
console_mrq initialized
mrq initialized
clk_sync_fmon_post initialized
clk_dt_late initialized
noc_late initialized
pm_post initialized
dbells initialized
dmce initialized
cvc initialized
avfs_clk_mach_post initialized
avfs_clk_platform_post initialized
cvc_late initialized
regulator_post initialized
rm initialized
console_late initialized
clk_dt_post initialized
mc_reg initialized
pg_post initialized
profile initialized
fuse_late initialized
extras_post initialized
bpmp: init complete
entering main console loop
I/TC: Primary CPU switching to normal world boot
[0002.371] I> Welcome to NVDisp-Init
[0002.371] I> NVDisp-Init version: t194-f9ecfedc
[0002.372] I> CPU-BL Params @ 0xca020000
[0002.372] I> 0) Base:0x00000000 Size:0x00000000
[0002.372] I> 1) Base:0xc8300000 Size:0x00100000
[0002.373] I> 2) Base:0xc9800000 Size:0x00200000
[0002.373] I> 3) Base:0xc8600000 Size:0x00200000
[0002.375] I> 4) Base:0xc8200000 Size:0x00100000
[0002.380] I> 5) Base:0xc8100000 Size:0x00100000
[0002.384] I> 6) Base:0xc9400000 Size:0x00400000
[0002.389] I> 7) Base:0xc9000000 Size:0x00400000
[0002.393] I> 8) Base:0xc8000000 Size:0x00100000
[0002.398] I> 9) Base:0xc7f00000 Size:0x00100000
[0002.402] I> 10) Base:0xca800000 Size:0x00800000
[0002.407] I> 11) Base:0x40000000 Size:0x00040000
[0002.411] I> 12) Base:0xc7e00000 Size:0x00100000
[0002.416] I> 13) Base:0x40046000 Size:0x00002000
[0002.420] I> 14) Base:0x40048000 Size:0x00002000
[0002.424] I> 15) Base:0xaf000000 Size:0x00004000
[0002.429] I> 16) Base:0x4004a000 Size:0x00002000
[0002.433] I> 17) Base:0xc7c00000 Size:0x00100000
[0002.438] I> 18) Base:0x4004c000 Size:0x00002000
[0002.442] I> 19) Base:0xc9a00000 Size:0x00600000
[0002.447] I> 20) Base:0x4004e000 Size:0x00002000
[0002.451] I> 21) Base:0xc7dc0000 Size:0x0000c000
[0002.456] I> 22) Base:0x00000000 Size:0x00000000
[0002.460] I> 23) Base:0xc7de0000 Size:0x00020000
[0002.465] I> 24) Base:0xcc000000 Size:0x02000000
[0002.469] I> 25) Base:0x40050000 Size:0x00002000
[0002.474] I> 26) Base:0x40040000 Size:0x00006000
[0002.478] I> 27) Base:0xc8c00000 Size:0x00400000
[0002.482] I> 28) Base:0xc8400000 Size:0x00200000
[0002.487] I> 29) Base:0xc8800000 Size:0x00400000
[0002.491] I> 30) Base:0xc7dd0000 Size:0x00010000
[0002.496] I> 31) Base:0x00000000 Size:0x00000000
[0002.500] I> 32) Base:0xf8000000 Size:0x08000000
[0002.505] I> 33) Base:0xce000000 Size:0x2a000000
[0002.509] I> 34) Base:0xcb000000 Size:0x01000000
[0002.514] I> 35) Base:0xae000000 Size:0x01000000
[0002.518] I> 36) Base:0xa0000000 Size:0x0e000000
[0002.523] I> 37) Base:0xca000000 Size:0x00800000
[0002.527] I> 38) Base:0x80000000 Size:0x20000000
[0002.532] I> 39) Base:0xb0000000 Size:0x08000000
[0002.536] I> 40) Base:0x00000000 Size:0x00000000
[0002.540] I> 41) Base:0x00000000 Size:0x00000000
[0002.545] I> 42) Base:0x00000000 Size:0x00000000
[0002.549] I> 43) Base:0x00000000 Size:0x00000000
[0002.554] I> 44) Base:0x00000000 Size:0x00000000
[0002.558] I> 45) Base:0x00000000 Size:0x00000000
[0002.563] GIC-SPI Target CPU: 0
[0002.566] Interrupts Init done
[0002.569] calling constructors
[0002.572] initializing heap
[0002.574] I> Heap: [0xa0960000 … 0xadf00000]
[0002.578] initializing threads
[0002.581] initializing timers
[0002.584] creating bootstrap completion thread
[0002.588] top of bootstrap2()
[0002.591] CPU: MIDR: 0x4E0F0040, MPIDR: 0x80000000
[0002.596] initializing platform
[0002.599] E> DEVICE_PROD: Invalid value data = 0, size = 0.
[0002.604] W> device prod register failed
[0002.608] I> Bl_dtb @0xadf00000
[0002.611] I> gpio framework initialized
[0002.624] I> tegrabl_gpio_driver_register: register ‘nvidia,tegra194-gpio’ driver
[0002.634] I> tegrabl_gpio_driver_register: register ‘nvidia,tegra194-gpio-aon’ driver
[0002.644] I> fixed regulator driver initialized
[0002.671] I> register ‘maxim’ power off handle
[0002.675] I> virtual i2c enabled
[0002.675] I> registered ‘maxim,max20024’ pmic
[0002.676] I> tegrabl_gpio_driver_register: register ‘max20024-gpio’ driver
[0002.676] I> Boot-device: eMMC
[0002.677] I> Boot_device: SDMMC_BOOT instance: 3
[0002.679] I> sdmmc-3 params source = boot args
[0002.680] W> No board IDs available
[0002.681] E> Failed to get board id info!
[0002.683] I> sdmmc bdev is already initialized
[0002.687] I> sdmmc-3 params source = boot args
[0002.694] I> Found 20 partitions in SDMMC_BOOT (instance 3)
[0002.699] I> Found 41 partitions in SDMMC_USER (instance 3)
[0002.725] I> enabling ‘vdd-hdmi-5v0’ regulator
[0002.733] I> regulator ‘vdd-hdmi-5v0’ already enabled
[0002.734] I> hdmi cable connected
[0002.740] W> set volts not configured for ‘vdd-1v0’
[0002.750] W> set volts not configured for ‘vdd-1v8-hs’
[0002.751] I> retrieved tmds range from prod_list_hdmi_soc
[0002.759] E> invalid display type
[0002.767] E> invalid display type
[0002.768] E> cannot find any other nvdisp nodes
[0002.779] E> I2C: Timeout while polling for transfer complete. Last value 0x00000002.
[0002.779] E> I2C: Could not write 0 bytes to slave: 0x00a0 with repeat start true.
[0002.781] E> I2C_DEV: Failed to send register address 0x00000000.
[0002.781] E> I2C_DEV: Could not read 128 registers of size 1 from slave 0xa0 at 0x00000000 via instance 6.
[0002.784] E> could not read edid
[0002.792] I> hdmi_enable, starting HDMI initialisation
[0002.797] I> hdmi_enable, HDMI initialisation complete
[0002.806] initializing target
[0002.807] calling apps_init()
[0002.807] starting app kernel_boot_app
[0002.808] I> Kernel type = Normal

Jetson UEFI firmware (version 1.0-d7fb19b built on 2022-08-10T20:18:13-07:00)
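
A log this long is easier to triage once the error-level lines are pulled out. The following is a minimal sketch, assuming the MB1/MB2/NVDisp-Init lines keep the `[ssss.mmm] L> message` format seen above; the function names `parse_boot_log` and `errors_only` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re

# Matches bootloader lines such as "[0002.784] E> could not read edid".
# Lines from BPMP, OP-TEE or UEFI that lack this prefix are simply skipped.
LINE_RE = re.compile(r"^\[(\d+)\.(\d+)\]\s+([IWE])>\s+(.*)$")


def parse_boot_log(text):
    """Return (seconds, level, message) tuples for '[ssss.mmm] L> msg' lines."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            secs, frac, level, message = match.groups()
            stamp = int(secs) + int(frac) / (10 ** len(frac))
            entries.append((stamp, level, message))
    return entries


def errors_only(entries):
    """Keep only the error-level ('E') entries."""
    return [entry for entry in entries if entry[1] == "E"]


# A few lines copied from the log above, used as sample input.
sample = """\
[0002.680] W> No board IDs available
[0002.681] E> Failed to get board id info!
[0002.683] I> sdmmc bdev is already initialized
[0002.784] E> could not read edid
[0002.808] I> Kernel type = Normal
"""

entries = parse_boot_log(sample)
for stamp, level, message in errors_only(entries):
    print(f"{stamp:8.3f}s  {message}")
```

Running the same filter over a good boot and a failed boot and comparing the two lists is one way to check whether any of these errors is actually specific to the failing case.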

Hello,

Welcome to the NVIDIA Developer forums. You posted in the Cumulus category, this should go in the Jetson forums. I will move it over for you.

hello lixing.gao,

according to the logs, the kernel should boot up.

however, there appears to be a display issue.
may I know which display monitor you’re using? is it possible to test with another device?

for the display issue,
please also see the developer guide, Jetson Xavier platform specific configurations.
thanks

Thank you for your reply. The monitor we use is a sculptor mf156ln (resolution 1920 × 1080, refresh rate 60 Hz). It works normally with JetPack 4.4.1. In addition, the boot failure still occurs when we unplug the HDMI monitor.

hello lixing.gao,

since it’s a customized board, did you complete the pinmux configuration by following the Jetson Module Adaptation and Bring-Up section?
may I also know what modifications you’ve made to the kernel and device tree? thanks

We also tested a Dell P2719H monitor, but the boot still failed. I then flashed the unmodified JetPack 5.0.2 image to our self-developed board, and the boot failure still occurred. This shows that the problem is unrelated to our kernel modifications. It seems that UEFI gets stuck and the kernel is never started.

Sometimes, the following log will be printed:
WARNING @ [platform/drivers/mailbox/mail_routing_layer/mail_routing_layer.c]:timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout

hello lixing.gao,

please try skipping display init.
you may refer to $OUT/Linux_for_Tegra/bootloader/nvdisp-init-README.txt for the steps.
thanks

I have skipped the display init stage, and HDMI no longer outputs anything. However, the boot failure still occurs, with no improvement. I think UEFI is getting stuck internally. Please help.

hello lixing.gao,

are there any logs you can share for reference?

Sometimes, the following log will be printed:
WARNING @ [platform/drivers/mailbox/mail_routing_layer/mail_routing_layer.c]:timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout
WARNING @ [platform/drivers/mailbox/tmo_link_provider/mail_tmo.c]: mail tmo TX timeout

When the boot fails, the last log line is:
Jetson UEFI firmware (version 1.0-d7fb19b built on 2022-08-10T20:18:13-07:00)

When startup is normal, the log continues as follows:
Jetson UEFI firmware (version 1.0-d7fb19b built on 2022-08-10T20:18:13-07:00)

** WARNING: Test Key is used. **

L4TLauncher: Attempting GRUB Boot
L4TLauncher: Attempting Direct Boot
EFI stub: Booting Linux Kernel…
EFI stub: Using DTB from configuration table
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services and installing virtual address map…
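
The difference between the two logs above suggests a simple way to label captured serial logs automatically. This is a rough sketch under stated assumptions: the banner text and the `L4TLauncher:` / `EFI stub:` markers are taken from the logs in this thread, while `classify_boot` and its three return labels are hypothetical names chosen for illustration.

```python
UEFI_BANNER = "Jetson UEFI firmware"
# Lines that follow the banner only in the successful boot log quoted above.
PROGRESS_MARKERS = ("L4TLauncher:", "EFI stub:")


def classify_boot(log_text):
    """Label one captured serial log as 'booted', 'hung-in-uefi' or 'no-uefi'."""
    banner_at = log_text.rfind(UEFI_BANNER)
    if banner_at == -1:
        # The boot never reached the UEFI banner at all.
        return "no-uefi"
    after_banner = log_text[banner_at:]
    if any(marker in after_banner for marker in PROGRESS_MARKERS):
        return "booted"
    # Banner printed, but nothing from L4TLauncher or the EFI stub followed.
    return "hung-in-uefi"


failed_log = (
    "[0002.808] I> Kernel type = Normal\n"
    "\n"
    "Jetson UEFI firmware (version 1.0-d7fb19b built on 2022-08-10T20:18:13-07:00)\n"
)
good_log = failed_log + (
    "\n"
    "L4TLauncher: Attempting GRUB Boot\n"
    "L4TLauncher: Attempting Direct Boot\n"
    "EFI stub: Booting Linux Kernel...\n"
)

print(classify_boot(failed_log))
print(classify_boot(good_log))
```

The check only looks at text after the last banner, so a capture file containing several consecutive boots is labelled by its final boot.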

hello lixing.gao,

did you mean it’s a stability issue where it sometimes hangs in UEFI?
what does the display monitor show when it hangs in UEFI? can you press Esc to get into the UEFI menu screen?
are you testing with warm-reboot cycles only? may I also know the failure rate?

About 40% of boots fail. When the boot fails, the display shows nothing. The log above was captured from the serial port; it shows that the boot hangs in the UEFI phase.
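
A figure like "about 40%" is easier to compare across hardware configurations when it comes with an uncertainty range. The sketch below computes a Wilson score interval for a failure rate; the counts used (8 failures in 20 boots) are hypothetical and are not measurements from this thread, and `wilson_interval` is an illustrative helper name.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a failure proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical tally: 8 failed boots out of 20 attempts (40%).
low, high = wilson_interval(8, 20)
print(f"failure rate 40%, plausible range {low:.0%} to {high:.0%}")
```

With only a few dozen boots the range is wide, so a change that appears to fix the hang should be confirmed over many cycles before it is trusted.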

hello lixing.gao,

please check the Sources and Compilation section of the wiki page, Home · NVIDIA/edk2-nvidia Wiki · GitHub
please fetch the sources and build UEFI; the build produces a binary with the debug flavor. you may re-flash that UEFI image to enable debug logs.
thanks

Sorry, I forgot to explain that there are two Xavier modules on our board, connected to each other through PCIe x8. Our tests show that with only one Xavier module on the board, startup is normal. With both Xavier modules on the board at the same time, they get stuck in the UEFI stage. Is there any solution?

If the PCIe connection between the two Xaviers is removed, both Xaviers start normally and there is no startup failure.

Does UEFI scan PCIe devices? Can this function be turned off?

Hi,
The default L4T release targets a single-Jetson platform, so this use case may not work properly. So you have two Xavier modules on the custom board, and one of them runs as a PCIe EP (endpoint) device. Is this correct?