Hi,
We are testing our customized board with both customized R35.5.0 (Jetpack 5.1.3) BSP and R35.4.1 (Jetpack 5.1.2) BSP.
It was observed that certain NVMe drives (Innodisk 4TG2-P) were unable to boot into the Ubuntu OS with R35.5.0, an issue not encountered with R35.4.1.
We tested a total of five NVMe drives (three Innodisk 4TG2-P and two WD SN550) and found that two of the Innodisk 4TG2-P drives exhibited this issue.
From the following experiments, it is evident that the issue is related to differences in the bootloader:
NOTE1: The problematic NVMe will be referred to as "BAD NVMe".
NOTE2: Content within BAD NVMe is fixed as the rootfs of R35.5.0.
[Case 1] R35.5.0 QSPI + BAD NVMe
Result:
Unable to boot into Ubuntu OS, although UEFI recognizes the NVMe boot partition (the NVMe appears in the boot menu).
Test Steps:
Perform a full flash with R35.5.0 using the following command:
sudo ./tools/kernel_flash/l4t_initrd_flash_pw.sh --external-device nvme0n1p1 \
-c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" \
--showlogs --network usb0 pjai-100onox internal
Power on the board
[Case 2] R35.4.1 QSPI + BAD NVMe
Result:
Boots successfully into Ubuntu OS.
Test Steps:
Perform a full flash with R35.5.0 using the following command:
sudo ./tools/kernel_flash/l4t_initrd_flash_pw.sh --external-device nvme0n1p1 \
-c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" \
--showlogs --network usb0 pjai-100onox internal
Flash only the QSPI flash with R35.4.1 using the following command:
sudo ./flash.sh -k A_cpu-bootloader -c bootloader/t186ref/cfg/flash_t234_qspi.xml pjai-100onox nvme0n1p1
Power on the board
Here are the boot logs for both test cases:
[Case 1] fail_boot_l4t_35_5_0_ox8g_inno_verbose.log (393.3 KB)
[Case 2] normal_boot_l4t_35_4_1_ox8g_inno_verbose.log (432.4 KB)
Other boot logs for working NVMe:
[Innodisk 4TG2-P]normal_boot_l4t_35_5_0_ox8g_inno_verbose.log (456.7 KB)
[WD SN550] normal_boot_l4t_35_5_0_ox8g_wd_verbose.log (456.0 KB)
Could you please help to check this issue? Thank you.
Could you explain in Chinese what this part is trying to say?
What does "bootloader difference" mean? I also don't quite follow NOTE2.
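Hi,
Someone has previously reported problems with this Innodisk 4TG2-P on Jetson.
If only one specific brand behaves this way, the SSD firmware is the likely cause.
I suggest checking whether the firmware version on the two failing drives differs from the one on the working drive.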
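The firmware comparison suggested here can be done from any booted Linux system by reading the controller attributes that the NVMe driver exposes under /sys/class/nvme (model, serial, firmware_rev). The following is a minimal sketch, not an NVIDIA tool: the names trim_file and nvme_fw_report are made up for illustration, and the optional directory argument exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Strip leading/trailing whitespace; sysfs pads model and serial with spaces.
trim_file() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1"
}

# Print one line per NVMe controller: name, model, serial, firmware revision.
# $1 (optional, hypothetical): directory to scan, default /sys/class/nvme.
nvme_fw_report() {
    root="${1:-/sys/class/nvme}"
    for ctrl in "${root}"/nvme*; do
        # Skip entries without a readable firmware_rev (or an unmatched glob).
        [ -r "${ctrl}/firmware_rev" ] || continue
        printf '%s model=%s serial=%s fw=%s\n' \
            "$(basename "${ctrl}")" \
            "$(trim_file "${ctrl}/model")" \
            "$(trim_file "${ctrl}/serial")" \
            "$(trim_file "${ctrl}/firmware_rev")"
    done
}
```

Running nvme_fw_report with each drive installed in turn and comparing the fw= field shows whether the failing and working drives carry the same firmware.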
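Hi DaveYYY,
"Bootloader difference" refers to the differences between the bootloader binaries in the R35.4.1 and R35.5.0 QSPI images. We are not yet sure whether MB1, MB2, or UEFI is what affects NVMe boot.
"BAD NVMe" refers to an NVMe that cannot boot into Ubuntu OS with R35.5.0. If the QSPI flash on the module is reverted to R35.4.1 (without reflashing the NVMe), it boots into Ubuntu OS.
What is odd is that the same NVMe boots with R35.4.1 but not with R35.5.0. If this were a compatibility problem, we would normally expect both versions to fail.
I will compare the firmware versions of these Innodisk drives. Thank you.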
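You could try booting 35.4.1 from another device (for example, a USB drive),
then plug in that SSD and run some stress tests.
A previous customer said that doing this on 35.4.1/5.1.2 also produced I/O errors.
The boot may succeed only by chance, or because of the bootloader differences, so it just happens not to hit the I/O operations that trigger the problem.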
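One way to try the stress test described here without risking the data on the drive is a repeated full sequential read, followed by a check of the kernel log. This is only a sketch under assumptions: read_stress and count_io_errors are hypothetical helper names, the test is read-only (a fuller test would also exercise writes), and the log check assumes the kernel reports failures on a line containing "I/O error" followed by the device name.

```shell
#!/bin/sh
# Read the whole device (or file) $1 sequentially, $2 times (default 3).
# Stops at the first failed pass and returns non-zero.
read_stress() {
    dev="$1"
    passes="${2:-3}"
    i=1
    while [ "${i}" -le "${passes}" ]; do
        # dd's transfer statistics are discarded; only its exit status is used.
        if ! dd if="${dev}" of=/dev/null bs=1048576 2>/dev/null; then
            echo "pass ${i}: read FAILED"
            return 1
        fi
        echo "pass ${i}: read ok"
        i=$((i + 1))
    done
}

# Count log lines on stdin that mention an I/O error for device name $1.
# grep exits non-zero when nothing matches, so "|| true" keeps the "0" result.
count_io_errors() {
    grep -c "I/O error.*$1" || true
}
```

On the board this could be run as root with something like read_stress /dev/nvme0n1 3, then dmesg | count_io_errors nvme0n1; a non-zero count or a failed pass on 35.4.1 would point at the drive rather than at the R35.5.0 bootloader.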
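Understood, I will run some stress tests to confirm stability.
So far, we have found that the problem is related to the contents of the /dev/nvme0n1p10 (RECROOTFS) partition.
On some NVMe drives, after flashing, UEFI mounts nvme0n1p10 and executes the BOOTAA64.efi inside it (the correct BOOTAA64.efi is on nvme0n1p11). Manually clearing the contents of nvme0n1p10 lets the board boot into the OS normally.
We are still investigating why nvme0n1p10 on some NVMe drives also contains a BOOTAA64.efi, and why R35.4.1 boots into the OS normally with such drives.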
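To check whether a given drive is affected, one option is to mount the partition read-only from a working system and look for a stray boot loader. The sketch below assumes the partition holds a file system that Linux can mount; find_efi_loaders and the mount point /mnt/p10 are names chosen for illustration only.

```shell
#!/bin/sh
# List every BOOTAA64.efi (case-insensitive) under directory $1.
# Paths are printed relative to $1, one per line, sorted.
find_efi_loaders() {
    ( cd "$1" && find . -type f -iname 'BOOTAA64.efi' | sed 's|^\./||' | sort )
}

# Example on the target (run as root; /mnt/p10 is a hypothetical mount point):
#   mkdir -p /mnt/p10
#   mount -o ro /dev/nvme0n1p10 /mnt/p10
#   find_efi_loaders /mnt/p10
#   umount /mnt/p10
```

Any output for nvme0n1p10 means stale data from an earlier use of the drive left a bootable EFI file there, which matches the behavior described in this post.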
kayccc, June 19, 2024, 2:26am:
Is this still an issue to support? Any result can be shared?
I modified "l4t_flash_from_kernel.sh" to wipe partitions that have no image assigned.
--- l4t_flash_from_kernel.sh 2024-02-20 12:38:14.252590000 +0800
+++ l4t_flash_from_kernel_new.sh 2024-06-05 09:14:04.872801000 +0800
@@ -734,6 +734,7 @@
local partition
local start_sector
local disk
+ local partition_blk_size
device_type=$(echo "${item}" | cut -d, -f 2 | sed 's/^ //g' - | cut -d: -f 1)
part_name=$(echo "${item}" | cut -d, -f 2 | sed 's/^ //g' - | cut -d: -f 3)
file_name=$(echo "${item}" | cut -d, -f 5 | sed 's/^ //g' -)
@@ -744,8 +745,30 @@
local res=0
if [ -z "${file_name}" ];then
- print_log "Warning: skip writing ${part_name} partition as no image \
-is specified"
+ if [ "${device_type}" = "${EXTERNAL_STORAGE_DEVICE}" ]; then
+ print_log "Warning: No image is specified for ${part_name} partition"
+ print_log "Warning: Try to clean this partition"
+ if [ -n "${count}" ] && [ "${count}" -ne 0 ]; then
+ partition=$(get_partition "${external_device}" "${count}")
+ echo "Get size of partition through connection."
+ # For host mode, the connection might get reset. Therefore, if it fails,
+ # need to do this to wait until the connection is reestablished
+ wait_for_block_dev "${partition}"
+ pblksz=$(blockdev --getpbsz "/dev/${partition}")
+ chkerr "Get size of partition failed"
+ disk="$(get_disk_name "${external_device}")"
+ start_sector=$(cat "/sys/block/${disk}/${partition}/start")
+ partition_blk_size=$(cat "/sys/block/${disk}/${partition}/size")
+ chkerr "Get start sector of partition failed"
+ fi
+ echo "dd if=/dev/zero of=${part_name} seek=${start_sector} bs=${pblksz} count=${partition_blk_size}"
+ dd if=/dev/zero of=${part_name} seek=${start_sector} bs=${pblksz} count=${partition_blk_size}
+ res="${?}"
+ echo "Clean ${part_name} partition done"
+ return "${res}"
+ else
+ print_log "Warning: skip writing ${part_name} partition as no image is specified"
+ fi
return 0
fi
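For reference, the wiping step can also be tried on its own, outside the flash script. The sketch below is a standalone illustration, not the patched NVIDIA script: zero_sectors is a hypothetical helper, and it assumes the target is the whole-disk node (for example /dev/nvme0n1) with the offset and length read from /sys/block/<disk>/<partition>/start and size. Those two sysfs values are expressed in 512-byte units regardless of the drive's physical block size, so the sketch uses bs=512.

```shell
#!/bin/sh
# Zero $3 sectors of 512 bytes in $1, starting at sector $2.
# $1 may be a whole-disk node or a regular file (useful for a dry run).
zero_sectors() {
    target="$1"
    start="$2"
    count="$3"
    # Reject empty or non-numeric offsets before touching the target.
    for n in "${start}" "${count}"; do
        case "${n}" in
            ''|*[!0-9]*)
                echo "start and count must be non-negative integers" >&2
                return 2
                ;;
        esac
    done
    # conv=notrunc keeps a regular file from being truncated after the range;
    # dd's statistics are discarded and its exit status is returned.
    dd if=/dev/zero of="${target}" bs=512 seek="${start}" count="${count}" \
        conv=notrunc 2>/dev/null
}

# Example on the target (run as root, destroys the contents of nvme0n1p10):
#   start=$(cat /sys/block/nvme0n1/nvme0n1p10/start)
#   size=$(cat /sys/block/nvme0n1/nvme0n1p10/size)
#   zero_sectors /dev/nvme0n1 "${start}" "${size}" && sync
```

Trying the function on a scratch file first is a cheap way to confirm that only the intended range is overwritten.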