Possible UEFI memory leak and full UEFI variable partition

Hi All,

Some of our AGX Xavier devices became unable to apply OTA updates after some time. The first investigation was done in the topic “Redundant A/B rootfs not switching with set-active-boot-slot but working with set-SR-BR”, where the problem was that the system (EDK2) did not switch slots.

After further investigation, it seems that the partition where the UEFI variables are stored fills up after the board has been rebooted around 370 times; simply running the reboot command repeatedly is enough.
Every time the device boots, it writes two variables, MTC and PlatformConfigData. It does not seem to remove or overwrite the old copies; instead, it allocates new space to store the new values.

I started debugging EDK2 and edk2-nvidia by adding a debug message at this line to print CommonVariableSpace, which is always 0x1FF9C, and CommonVariableTotalSize, which is zero right after flashing the board over USB and grows on every boot.
When CommonVariableTotalSize reaches 0x1FF9C, EDK2 sets VarErrorFlag to 0xEF. From then on, the system (EDK2) never switches slots again: nvbootctrl sets the runtime variable, but EDK2 cannot perform any operation on the variables, because saving fails when the region is full.
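As a rough cross-check of these numbers, the sketch below derives 0x1FF9C and the average growth per boot. The sizes are assumptions, not NVIDIA-confirmed values: a 128 KiB variable firmware volume with a 72-byte header (inferred from the reclaim log later in this thread, where a 131000-byte write starts at offset 72), EDK2's 28-byte VARIABLE_STORE_HEADER, and no reserved HwErrRec space. Under those assumptions, running out of space after ~370 boots means roughly 354 bytes of variable records are appended per boot.

```python
# Rough arithmetic for the variable store (a sketch; sizes are ASSUMPTIONS
# inferred from the logs in this thread, not confirmed by NVIDIA).
FV_SIZE = 128 * 1024              # assumed variable FV size (72 + 131000 bytes in the reclaim log)
FV_HEADER_SIZE = 72               # assumed FV header + block map size
VARIABLE_STORE_HEADER_SIZE = 28   # sizeof (VARIABLE_STORE_HEADER) in EDK2
HW_ERR_STORAGE_SIZE = 0           # assumed: no space reserved for HwErrRec variables


def common_variable_space() -> int:
    """Bytes left for normal variables after the FV and store headers."""
    store_size = FV_SIZE - FV_HEADER_SIZE
    return store_size - VARIABLE_STORE_HEADER_SIZE - HW_ERR_STORAGE_SIZE


def average_growth_per_boot(space: int, boots: int) -> float:
    """Average bytes appended per boot if `space` is exhausted after `boots` boots."""
    return space / boots


space = common_variable_space()
print(hex(space))                                   # 0x1ff9c
print(round(average_growth_per_boot(space, 370)))   # 354
```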

Also, during my tests it looked like keeping the old variables might be fine (I need your confirmation on this), since they are marked with State &= VAR_DELETED; in this file and a garbage collection later runs by calling the Reclaim() function.
The problem is that when FvbWrite() runs, it sets LbaBoundaryCrossed = TRUE and returns EFI_BAD_BUFFER_SIZE.
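To make the "mark deleted, reclaim later" idea concrete, here is a toy Python model of an append-only variable store. It is an illustration only, not EDK2 code: the state constants mirror EDK2's VariableFormat.h, the record sizes are made up, and the reclaim_works switch is a hypothetical way to simulate a garbage collection whose write fails. With a working reclaim, usage stays bounded; with a failing one, the store fills up as described above.

```python
# Toy append-only variable store: an illustration of "mark deleted,
# reclaim later", NOT EDK2's implementation.
# State values mirror EDK2's VariableFormat.h. Flash bits can only be
# cleared (1 -> 0), so a state transition is done by AND-ing in a mask.
VAR_IN_DELETED_TRANSITION = 0xFE
VAR_DELETED = 0xFD
VAR_HEADER_VALID_ONLY = 0x7F
VAR_ADDED = 0x3F


class ToyVariableStore:
    def __init__(self, capacity: int, reclaim_works: bool = True):
        self.capacity = capacity
        # reclaim_works=False is a hypothetical switch simulating a
        # garbage collection whose write is rejected.
        self.reclaim_works = reclaim_works
        self.records = []  # each record is [name, size, state]

    def used(self) -> int:
        # Deleted records keep occupying space until reclaim() compacts the store.
        return sum(size for _, size, _ in self.records)

    def reclaim(self) -> None:
        if self.reclaim_works:
            self.records = [r for r in self.records if r[2] == VAR_ADDED]

    def set_variable(self, name: str, size: int) -> bool:
        if self.used() + size > self.capacity:
            self.reclaim()
            if self.used() + size > self.capacity:
                return False  # out of space: the store is left unchanged
        for record in self.records:
            if record[0] == name and record[2] == VAR_ADDED:
                record[2] &= VAR_DELETED  # old copy is only marked, not erased
        self.records.append([name, size, VAR_ADDED])
        return True


# Two variables rewritten on every "boot" (made-up sizes).
store = ToyVariableStore(capacity=1000)
for _ in range(100):
    store.set_variable("MTC", 72)
    store.set_variable("PlatformConfigData", 300)
print(store.used() <= store.capacity)  # True: reclaim keeps usage bounded
```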

I have attached a full log covering three boots: the first two boot fine, and the error starts on the last one. After that, EDK2 cannot perform any OTA update or even change any variable.
error_with_logs.txt (565.1 KB)

Do you have any idea what is causing this? Is it a memory leak? Is it a garbage collector issue?
Thank you in advance for your help.

Hi diogojusten,

Are you using the devkit or custom board for AGX Xavier?
What’s your Jetpack version in use?
What’s the target version for OTA?

Do you mean that some AGX Xavier units work as expected while others do not?

Please share the steps you use to perform the OTA so that we can verify on the devkit.

Hi @KevinFFF,

Answering your questions:

I have a custom board with the NVIDIA module 2888-400-0004-P.0-1-2.

I tested with JP5.1.2 and JP5.1.3. In both cases I had the issue.

What exactly is this “target version”? Would it be the module version 2888-400-0004-P.0-1-2?

The issue isn’t related to OTA. Everything works fine, including OTA updates (before the issue happens). After that, slot switching no longer works, and therefore OTA stops working. There are two ways to reproduce the issue:

  1. Rebooting the target around 370 times.
  2. Running ~130 OTA updates.

After that, EDK2 starts setting 04b37fe8-f6ae-480b-bdd5-37d98c5e89aa-VarErrorFlag to 0xEF and can no longer switch slots.
If you want to add the debug message to monitor CommonVariableTotalSize, I applied that DEBUG message as described in this link.
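For monitoring without a debug build, VarErrorFlag can also be read from Linux through efivarfs. The sketch below is a hypothetical helper, not an NVIDIA tool: it assumes efivarfs is mounted at /sys/firmware/efi/efivars and relies on the usual efivarfs layout (a 4-byte little-endian attribute mask followed by the data). The flag names come from EDK2's VarErrorFlag.h.

```python
import struct
from pathlib import Path

# GUID taken from the variable name quoted in this thread.
VAR_ERROR_FLAG_GUID = "04b37fe8-f6ae-480b-bdd5-37d98c5e89aa"

# Flag values as defined in EDK2's MdeModulePkg/Include/Guid/VarErrorFlag.h.
FLAG_NAMES = {
    0xFF: "VAR_ERROR_FLAG_NO_ERROR",
    0xEF: "VAR_ERROR_FLAG_SYSTEM_ERROR",
    0xFE: "VAR_ERROR_FLAG_USER_ERROR",
}


def parse_efivarfs(raw: bytes):
    """Split an efivarfs file into (attributes, data).

    efivarfs prefixes the variable data with its 32-bit little-endian
    attribute mask.
    """
    if len(raw) < 4:
        raise ValueError("efivarfs payload is shorter than the attribute header")
    (attributes,) = struct.unpack_from("<I", raw, 0)
    return attributes, raw[4:]


def describe_flag(value: int) -> str:
    return FLAG_NAMES.get(value, f"combined/unknown flag 0x{value:02X}")


def read_var_error_flag(efivars: str = "/sys/firmware/efi/efivars") -> int:
    """Read VarErrorFlag on a running Linux target (assumes efivarfs is mounted)."""
    raw = Path(efivars, f"VarErrorFlag-{VAR_ERROR_FLAG_GUID}").read_bytes()
    _, data = parse_efivarfs(raw)
    return data[0]
```

On the target, `print(describe_flag(read_var_error_flag()))` would be expected to report VAR_ERROR_FLAG_NO_ERROR on a healthy board and VAR_ERROR_FLAG_SYSTEM_ERROR once the problem described here has occurred.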

I use Mender for system updates: it writes the rootfs to the standby partition, copies the capsuleUpdate file into the ESP partition and sets bit 2 using oe4t-set-uefi-OSIndications. To avoid any miscommunication: I see the issue just by rebooting the board multiple times, even with ROOTFS_AB disabled.
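For reference, "bit 2" of OsIndications is EFI_OS_INDICATIONS_FILE_CAPSULE_DELIVERY_SUPPORTED in the UEFI specification. The sketch below only builds the efivarfs buffer (UINT32 attributes followed by the UINT64 value) that would set that bit while keeping any other bits; it is not how oe4t-set-uefi-OSIndications is implemented, and it does not write anything to the target.

```python
import struct

# EFI_GLOBAL_VARIABLE GUID; OsIndications lives in this namespace.
EFI_GLOBAL_VARIABLE_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"

# Attribute bits and the OsIndications bit from the UEFI specification.
EFI_VARIABLE_NON_VOLATILE = 0x00000001
EFI_VARIABLE_BOOTSERVICE_ACCESS = 0x00000002
EFI_VARIABLE_RUNTIME_ACCESS = 0x00000004
EFI_OS_INDICATIONS_FILE_CAPSULE_DELIVERY_SUPPORTED = 1 << 2  # "bit 2"


def os_indications_payload(current: int = 0) -> bytes:
    """Build the efivarfs buffer (UINT32 attributes + UINT64 value) that
    would set bit 2 of OsIndications while keeping the other bits."""
    attributes = (EFI_VARIABLE_NON_VOLATILE
                  | EFI_VARIABLE_BOOTSERVICE_ACCESS
                  | EFI_VARIABLE_RUNTIME_ACCESS)
    value = current | EFI_OS_INDICATIONS_FILE_CAPSULE_DELIVERY_SUPPORTED
    return struct.pack("<IQ", attributes, value)


payload = os_indications_payload()
print(payload.hex())  # 070000000400000000000000
```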

It seems you were performing OTA update.
So, what’s the base version and the target version?
5.1.2 (base) to 5.1.4 (target)?

Do you hit the issue in a reboot stress test with 370 iterations?
Have you also compared the full logs in normal case and the fail case?

Is your issue that this EFI variable becomes 0xEF?

Could you also verify with the latest JP5.1.4(R35.6.0)?

Hi @KevinFFF,

I found the problem while running OTA updates, because the target stopped switching slots. However, reproducing the issue requires neither an OTA update nor A/B rootfs; rebooting the target is enough.

I tried 5.1.2 to 5.1.2 and 5.1.3 to 5.1.3, meaning I did not update from one version to another.

Exactly: just rebooting the device around 370 times makes the issue appear. As mentioned in the previous forum topic, I use Yocto to build our image, but I could also reproduce the issue when running Ubuntu:

root@nvidia-desktop:/home/nvidia# cat /etc/nv_tegra_release 
# R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug  1 19:57:35 UTC 2023

Yes. In normal operation, EDK2 is able to set BootChainFwPrevious, BootChainFwStatus and BootChainFwResetCount and reboot the target into the next slot (sorry for my many debug messages):

Jetson UEFI firmware (version v35.6.0 built on 2024-09-17T13:50:43+00:00)
ESC   to enter Setup.
F11   to enter Boot Manager Menu.
Enter to continue boot.
Failed to find memory test protocol
 DIOGO UpdateVariable BootChainFwPrevious attribute 3
DIOGO 10
DIOGO OTHER WRITE 1
DIOGO Offset: 424 , numBytes: 60 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 426 , numBytes: 1 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 484 , numBytes: 28 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 0 , numBytes: 16 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 426 , numBytes: 1 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
 DIOGO UpdateVariable BootChainFwPrevious attribute 3 DONE
 DIOGO UpdateVariable BootChainFwStatus attribute 7
DIOGO 10
DIOGO OTHER WRITE 1
DIOGO Offset: 16 , numBytes: 60 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 18 , numBytes: 1 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 76 , numBytes: 40 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 18 , numBytes: 1 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
 DIOGO UpdateVariable BootChainFwStatus attribute 7 DONE
 DIOGO UpdateVariable BootChainFwResetCount attribute 3
DIOGO 10
DIOGO OTHER WRITE 1
DIOGO Offset: 116 , numBytes: 60 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 118 , numBytes: 1 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 176 , numBytes: 48 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE 1
DIOGO Offset: 118 , numBytes: 1 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
 DIOGO UpdateVariable BootChainFwResetCount attribute 3 DONE
Rebooting to new boot chain
Shutdown state requested 1

At that point the target reboots and boots from the new slot.
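The offsets in the healthy log line up with a four-step append: variable header, State byte, name plus data, State byte again (in EDK2's Variable.c these steps appear to set State to VAR_HEADER_VALID_ONLY and then VAR_ADDED). The sketch reproduces the logged (offset, numBytes) pairs, assuming a 60-byte authenticated variable header with State at offset 2, 512-byte blocks, 4-byte data for the BootChainFw* variables, and writes split at block boundaries. That is why the 44-byte name+data write at offset 484 shows up as 28 bytes followed by 16 bytes at offset 0. The absolute start offsets are chosen to match the log, and only a newly added variable is modelled.

```python
# Reproduce the per-block write pattern from the healthy-boot log.
# ASSUMPTIONS: 60-byte authenticated variable header with State at offset 2,
# 512-byte blocks, 4-byte (UINT32) data for the BootChainFw* variables, and
# writes split at block boundaries. Deleting an older copy of a variable
# would add further State writes that are not modelled here.
HEADER_SIZE = 60
STATE_OFFSET = 2
BLOCK_SIZE = 512


def split_at_blocks(offset: int, length: int, block_size: int = BLOCK_SIZE):
    """Split one write into (offset_within_block, num_bytes) chunks."""
    chunks = []
    while length > 0:
        in_block = offset % block_size
        num_bytes = min(length, block_size - in_block)
        chunks.append((in_block, num_bytes))
        offset += num_bytes
        length -= num_bytes
    return chunks


def add_variable_writes(start: int, name: str, data_size: int):
    """Writes for appending one variable at absolute store offset `start`."""
    name_size = (len(name) + 1) * 2  # UCS-2 name including the NUL terminator
    writes = []
    writes += split_at_blocks(start, HEADER_SIZE)                          # step 1: header
    writes += split_at_blocks(start + STATE_OFFSET, 1)                     # step 2: State byte
    writes += split_at_blocks(start + HEADER_SIZE, name_size + data_size)  # step 3: name + data
    writes += split_at_blocks(start + STATE_OFFSET, 1)                     # step 4: State byte
    return writes


print(add_variable_writes(424, "BootChainFwPrevious", 4))
# [(424, 60), (426, 1), (484, 28), (0, 16), (426, 1)]
```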

Once the problem occurs, EDK2 cannot change the variables, sets VarErrorFlag to 0xEF and complains about Bad Buffer Size:

...
RecordVarErrorFlag (0xEF) PlatformConfigData:ED3374EF-767B-42FA-AF70-DB520A392822 - 0x00000003 - 0xEA
CommonVariableSpace = 0x1FF9C - CommonVariableTotalSize = 0x1FF8C
 DIOGO UpdateVariable PlatformConfigData attribute 3 DONE
PlatformConfigured: Error setting Platform Config data: Bad Buffer Size
 DIOGO UpdateVariable ConIn attribute 7
 DIOGO UpdateVariable ConIn attribute 7 DONE
 DIOGO UpdateVariable ConIn attribute 7
 DIOGO UpdateVariable ConIn attribute 7 DONE
 DIOGO UpdateVariable ConIn attribute 7
 DIOGO UpdateVariable ConIn attribute 7 DONE
 DIOGO UpdateVariable ConIn attribute 7
 DIOGO UpdateVariable ConIn attribute 7 DONE
 DIOGO UpdateVariable ConIn attribute 7
 DIOGO UpdateVariable ConIn attribute 7 DONE
Jetson UEFI firmware (version v35.6.0 built on 2024-09-17T13:50:43+00:00)
ESC   to enter Setup.
F11   to enter Boot Manager Menu.
Enter to continue boot.
Failed to find memory test protocol
**********************************
**  WARNING: Test Key is used.  **
**********************************
**  WARNING: Test Key is used.  **
...... DIOGO UpdateVariable BootCurrent attribute 6
DIOGO 11
 DIOGO UpdateVariable BootCurrent attribute 6 DONE
DIOGO 12
DIOGO RECLAIM 3
Calling FtwVariableSpace
DIOGO Assert 131000 131000
WRITE 11, Length: 131000
DIOGO OTHER WRITE 1
DIOGO Offset: 72 , numBytes: 131000 , blockSize: 512
DIOGO OTHER WRITE STATUS: 0
DIOGO OTHER WRITE RETURN 4. Current Status: 0
DIOGO OTHER WRITE RETORNANDO ERROR 1
FtwVariableSpace 4: 4

ASSERT_EFI_ERROR (Status = Bad Buffer Size)
ASSERT [VariableRuntimeDxe] /home/diogo/work/as-distro/build/tmp/work/jetson_agx_xavier_syslogic-oe4t-linux/edk2-firmware-tegra/35.6.0-r0/edk2-tegra/edk2/MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c(3325): !(((INTN)(RETURN_STATUS)(Status)) < 0)
UpdatePcieControllersWithGpuDevice: failed to enumerate GPU device handles: Not Found
...

Yes. When the variable partition is full and EDK2 can no longer modify the variables, it sets VarErrorFlag to 0xEF, which indicates that the system is in a bad state, and I can no longer switch slots.
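The Bad Buffer Size error is consistent with the block-boundary rule for firmware volume block writes. The sketch below models that rule as the PI specification describes it for EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL.Write() (a write that would cross the end of its LBA stops at the boundary and returns EFI_BAD_BUFFER_SIZE); it is not NVIDIA's FVB driver. Applied to the logs in this thread, the single 131000-byte reclaim write at offset 72 in a 512-byte block fails, while the small per-block writes from the healthy boot succeed. Comparing the 16 free bytes with a 60-byte header assumes EDK2's authenticated variable header format.

```python
# Boundary rule modelled on the PI specification's description of
# EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL.Write(): a write that would cross the
# end of the addressed LBA stops at the boundary and reports
# EFI_BAD_BUFFER_SIZE. Sketch only; NVIDIA's FVB driver is not reproduced.
def fvb_write(offset: int, num_bytes: int, block_size: int):
    """Return (status, bytes_written) for a write into a single LBA."""
    if not 0 <= offset < block_size:
        raise ValueError("offset outside the block (not handled by this model)")
    if offset + num_bytes > block_size:
        return "EFI_BAD_BUFFER_SIZE", block_size - offset
    return "EFI_SUCCESS", num_bytes


# Values taken from the failing and healthy logs in this thread.
print(fvb_write(72, 131000, 512))  # ('EFI_BAD_BUFFER_SIZE', 440)
print(fvb_write(424, 60, 512))     # ('EFI_SUCCESS', 60)

# Free space at the time of the failure, from the CommonVariableSpace line.
free_bytes = 0x1FF9C - 0x1FF8C
print(free_bytes)                  # 16, smaller than one 60-byte variable header
```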

Since I use a Yocto build and R35.6.0 isn’t available there yet, I can’t run this exact test. What I did was take all the EDK2 recipes (edk2-basetools-tegra-native, edk2-firmware-core-tegra, edk2-firmware-tegra, tegra-uefi-capsules) at version 35.6.0. That means the EDK2 in my current test is R35.6.0, but only the EDK2 components are.

Bests,
Diogo

Hi,

Sorry to jump in here. I just want to ask one question.

Did you change anything in UEFI here (either rel-35.4.1 or rel-35.6)? As far as we know, reboot stress tests have already been run by many other users on every previous release.

Rel-35.4.1 has its own reboot stress issue, with its own patch to fix it. But it sounds like you are describing something new that we haven’t hit before.

For example, if you just flash the sdkmanager image to the board, do you also get the error after 370 reboot iterations? And what does the error look like?

Hi @WayneWWW,
Thank you very much for jumping in; any help is very welcome and greatly appreciated.

First of all, I’m using a custom board and I don’t have a “native” devkit to run my tests on. Could it be something related to the custom board that affects only the EDK2 garbage collector?

No. I’m also confused that nobody has faced it before. My reboot script is quite simple. In that post I was doing OTA updates, but the issue can also be reproduced just by rebooting multiple times.
I have two test scenarios:

  1. Building an image with Yocto (demo-image-base from tegra-demo-distro, branch kirkstone) and flashing it.
  2. Using SDKM to download the AGX Xavier files, applying the DTSI and adding some files to the rootfs (files provided by the custom board manufacturer).

In both cases, I tested without AB_ROOTFS, with only a single partition, and rebooting the target ~370 times makes VarErrorFlag turn to 0xEF.

Could you please point me to this fix or error information?

Yes, I tested with Ubuntu from JP5.1.2 too. In that case I didn’t have EDK2 debug enabled, since the build/binary comes ready-made. When the system is healthy, VarErrorFlag=0xFF; after ~370 reboots, VarErrorFlag=0xEF. In this situation (based on my earlier debugging), EDK2 can no longer modify the UEFI variables.
When AB_ROOTFS is enabled, EDK2 can’t switch slots.


A message that caught my eye is the following:

Ftw: Workspace or Spare block does not exist!
Error: Image at 00879BE8000 start failed: Invalid Parameter
remove-symbol-file /home/diogo/work/as-distro/build/tmp/work/jetson_agx_xavier_syslogic-oe4t-linux/edk2-firmware-tegra/35.6.0-r0/build/Build/Jetson/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/FaultTolerantW

This message is displayed on every boot, even when everything is working fine. Could it be related to the UEFI variable garbage collector? While debugging, I saw that both variables are zero here.

One question: what exact error is printed if you don’t enable the UEFI log in rel-35.4.1?

I just want to check whether this matches any error previously reported by other users.

Unfortunately I don’t have logs from a run without EDK2_BUILD_MODE:pn-edk2-firmware-tegra = "DEBUG", but I do have one without my own debug messages and with gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel = 0x8000004F:

RecordVarErrorFlag (0xEF) PlatformConfigData:ED3374EF-767B-42FA-AF70-DB520A392822 - 0x00000003 - 0xEA
CommonVariableSpace = 0x1FF9C - CommonVariableTotalSize = 0x1FF90
PlatformConfigured: Error setting Platform Config data: Bad Buffer Size
PlatformRegisterConsoles: DevicePathProtocol supported on SimpleTextOutProtocol handle 0x8758A0E18
PlatformRegisterConsoles: DevicePathProtocol supported on SimpleTextOutProtocol handle 0x87589E698
PlatformRegisterConsoles: DevicePathProtocol supported on SimpleTextOutProtocol handle 0x87589B998
PlatformRegisterConsoles: DevicePathProtocol supported on SimpleTextOutProtocol handle 0x875899C18
[Bds]RegisterKeyNotify: 0017/0000 80000000/00 Success
[Bds]RegisterKeyNotify: 0015/0000 80000000/00 Success
[Bds]RegisterKeyNotify: 0000/000D 80000000/00 Success
PROGRESS CODE: V02020000 I0 240612B7-A063-11D4-9A3A-0090273FC14D 760C4038
UsbBusRecursivelyConnectWantedUsbIo: TPL before connect is 4
UsbBusRecursivelyConnectWantedUsbIo: TPL after connect is 4
UsbBusRecursivelyConnectWantedUsbIo: TPL before connect is 4
UsbBusRecursivelyConnectWantedUsbIo: TPL after connect is 4
UsbBusRecursivelyConnectWantedUsbIo: TPL before connect is 4
UsbBusRecursivelyConnectWantedUsbIo: TPL after connect is 4
Jetson UEFI firmware (version v35.5.0 built on 2024-02-26T13:44:31+00:00)
ESC   to enter Setup.
F11   to enter Boot Manager Menu.
Enter to continue boot.
Failed to find memory test protocol
HandleCapsules: processing capsules ...
BootChainExecuteUpdate: Active boot chain=0
BCGetVariable: Read BootChainFwNext=255: Not Found
BCGetVariable: Read AutoUpdateBrBct=1: Success
BootChainExecuteUpdate: Booting OS, FW BootChain=0, Status=-1
**********************************
**  WARNING: Test Key is used.  **
**********************************
**  WARNING: Test Key is used.  **
[Bds]OsIndication: 0000000000000000
[Bds]=============Begin Load Options Dumping ...=============
  Driver Options:
  SysPrep Options:
  Boot Options:
    Boot0001: UEFI 1920GB PCIe Drive 122227200005 1              0x0001
    Boot0002: UEFI eMMC Device           0x0001
    Boot0000: Enter Setup                0x0109
    Boot0003: BootManagerMenuApp                 0x0109
    Boot0004: UEFI Shell                 0x0001
  PlatformRecovery Options:
    PlatformRecovery0000: Default PlatformRecovery               0x0001
[Bds]=============End Load Options Dumping=============
[Bds]BdsWait ...Zzzzzzzzzzzz...
[Bds]BdsWait(5)..Zzzz...
.[Bds]BdsWait(4)..Zzzz...
.[Bds]BdsWait(3)..Zzzz...
.[Bds]BdsWait(2)..Zzzz...
.[Bds]BdsWait(1)..Zzzz...
..[Bds]Exit the waiting!
PROGRESS CODE: V03051007 I0 6D33944A-EC75-4855-A54D-809C75241F6C 760C4038

ASSERT_EFI_ERROR (Status = Bad Buffer Size)
ASSERT [VariableRuntimeDxe] /home/diogo/work/as-distro/build/tmp/work/jetson_agx_xavier_syslogic-oe4t-linux/edk2-firmware-tegra/35.5.0-r0/edk2-tegra/edk2/MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c(3264): !(((INTN)(RETURN_STATUS)(Status)) < 0)
[Bds]Stop Hotkey Service!
[Bds]UnregisterKeyNotify: 0017/0000 Success
[Bds]UnregisterKeyNotify: 0015/0000 Success
[Bds]UnregisterKeyNotify: 0000/000D Success
UpdatePcieControllersWithGpuDevice: failed to enumerate GPU device handles: Not Found
InstallFdt: Installing Kernel DTB
Eeprom product Ids: 
1. 699-82888-0004-400 P.0 
Processing "L4T Configuration Settings" DTB overlay
Processing node fragment@0 for overlay...
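For readers of the log above, the sketch below decodes which debug categories PcdDebugPrintErrorLevel = 0x8000004F enables, using a subset of the DEBUG_* masks from EDK2's MdePkg DebugLib.h. Notably, DEBUG_VARIABLE (0x100) is not part of that mask, so any message printed only at that level would not appear in this log.

```python
# A subset of the DEBUG_* print-level masks from EDK2's MdePkg DebugLib.h.
DEBUG_MASKS = {
    0x00000001: "DEBUG_INIT",
    0x00000002: "DEBUG_WARN",
    0x00000004: "DEBUG_LOAD",
    0x00000008: "DEBUG_FS",
    0x00000010: "DEBUG_POOL",
    0x00000020: "DEBUG_PAGE",
    0x00000040: "DEBUG_INFO",
    0x00000080: "DEBUG_DISPATCH",
    0x00000100: "DEBUG_VARIABLE",
    0x00400000: "DEBUG_VERBOSE",
    0x80000000: "DEBUG_ERROR",
}


def decode_print_level(level: int):
    """List the known DEBUG_* names enabled by a PcdDebugPrintErrorLevel value."""
    return [name for mask, name in sorted(DEBUG_MASKS.items()) if level & mask]


print(decode_print_level(0x8000004F))
# ['DEBUG_INIT', 'DEBUG_WARN', 'DEBUG_LOAD', 'DEBUG_FS', 'DEBUG_INFO', 'DEBUG_ERROR']
```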

Hi @WayneWWW and @KevinFFF,
By any chance, do you have any news to share with us?
Were you able to reproduce the issue by rebooting the target (AGX Xavier) ~370 times and seeing VarErrorFlag turn to 0xEF? Could you share any findings?
Once again, thank you for your help and time.

We are trying to reproduce this issue locally, and this will take some time.

Hi,

We rebooted the device 600+ times on rel-35.6 but could not reproduce the issue you are describing.

Hi @WayneWWW,

Thank you very much for running the tests on your side.

By any chance, did you run your test with EDK2_BUILD_MODE:pn-edk2-firmware-tegra = "DEBUG"? Could you please share the UART logs? Unfortunately I only have the custom board and don’t have an AGX Xavier devkit with me.
Do you think a custom board could create this issue?
Also, could you please share the output of efivar -p -n eb704011-1402-11d3-8e77-00a0c969723b-MTC from your device?

From my logs, the only thing that caught my eye is:

Ftw: Workspace or Spare block does not exist!
Error: Image at 00879BE9000 start failed: Invalid Parameter
remove-symbol-file /home/diogo/work/as-distro/build/tmp/work/jetson_agx_xavier_syslogic-oe4t-linux/edk2-firmware-tegra/35.6.0-r0/build/Build/Jetson/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/FaultTolerantW

Could it have any relation with the issue I’m facing?

Should I set all the FTW PCDs to 0x0 in the configuration file edk2-nvidia/Platform/NVIDIA/NVIDIA.common.dsc.inc (NVIDIA/edk2-nvidia on GitHub, main branch)?

Bests,

What is the exact base of the system you are using now?

It sounds like you changed UEFI to the rel-35.6 BSP, but did you also change the rest, such as OP-TEE?

I don’t think the issue here is related to the custom board. What we need you to do is use an unmodified UEFI and bootloader, not Yocto, and see whether you can reproduce the issue.

Hi @WayneWWW ,

I ran tests with Yocto and with SDKM plus the custom DTSI provided by the custom board manufacturer.

Yocto versions:

  • 35.4.1
  • 35.5.0
  • 35.6.0

I tested without applying any changes, just building for agx-xavier-devkit and flashing the custom board.

SDKM versions:

  • 35.4.1

When using SDKM, I can’t flash the board directly; I need to use the DTSIs from the custom board manufacturer and flash via the command line.

Could you report this to the Yocto community and have them report the issue to us?

We are unlikely to be able to debug it this way, because we cannot reproduce the issue locally.

Hi @WayneWWW,
Thank you for your replies; let me share my latest test results.

It isn’t Yocto-related and it isn’t ROOT_AB-related. I ran two tests using SDKM with these versions:

  • 5.1.2
  • 5.1.4

Steps executed:

  1. Used SDKM to download and prepare the target image, skipping the flashing step.
  2. Modified kernel/dtb/tegra194-p2888-0001-p2822-0000.dtb to enable the HDMI display so that Ubuntu could be set up at first boot:

#changed from
...
sor2 {
    nvidia,xbar-ctrl = <0x02 0x01 0x00 0x03 0x04>;
...

#changed to
sor2 {
    nvidia,xbar-ctrl = <0x00 0x01 0x02 0x03 0x04>;
...

  3. Put the board in recovery mode.
  4. Flashed the device with sudo ./flash.sh jetson-xavier mmcblk0p1

After the device was flashed, I configured Ubuntu with the default settings and the user nvidia; Ubuntu restarted and the NVIDIA setup was applied (e.g. the NVIDIA wallpaper and desktop shortcuts).

–

I applied the following changes on the target side to run my script:

  1. SSHed into the target and switched to root with sudo su
  2. Changed the root password to nvidia with passwd
  3. Added PermitRootLogin yes to /etc/ssh/sshd_config and restarted sshd with systemctl restart sshd
  4. Installed efivar with apt update && apt install efivar
  5. Set the power mode to MAX with nvpmodel -m 0

On my PC side:

  1. Configured passwordless SSH login to the target:
  2. ssh root@192.168.100.149 mkdir -p .ssh
  3. cat ~/.ssh/id_ed25519.pub | ssh root@192.168.100.149 'cat >> .ssh/authorized_keys'

Then I started the script:
shutdown_test_ubuntu.txt (888 Bytes)
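The attached script is not reproduced here. As a hypothetical equivalent (not the author's script), the Python sketch below reboots the target over the passwordless SSH set up above, reads VarErrorFlag and MTC directly from efivarfs before each reboot, and stops when VarErrorFlag is no longer 0xFF. The host address is the one from the steps above; the efivarfs paths, timeouts and sleep durations are assumptions.

```python
import struct
import subprocess
import time

# Hypothetical reboot stress loop; NOT the attached shutdown_test_ubuntu.txt.
HOST = "root@192.168.100.149"
EFIVARS = "/sys/firmware/efi/efivars"
VAR_ERROR_FLAG = "VarErrorFlag-04b37fe8-f6ae-480b-bdd5-37d98c5e89aa"
MTC = "MTC-eb704011-1402-11d3-8e77-00a0c969723b"


def ssh(command: str, timeout: float = 30) -> subprocess.CompletedProcess:
    """Run a command on the target over passwordless SSH; output stays bytes."""
    return subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", HOST, command],
        capture_output=True, timeout=timeout)


def strip_attributes(raw: bytes) -> bytes:
    """Drop the 4-byte attribute prefix that efivarfs puts before the data."""
    if len(raw) <= 4:
        raise ValueError("no variable data after the attribute prefix")
    return raw[4:]


def read_var(name: str) -> bytes:
    result = ssh(f"cat {EFIVARS}/{name}")
    if result.returncode != 0:
        raise RuntimeError(f"could not read {name}: {result.stderr!r}")
    return strip_attributes(result.stdout)


def wait_for_target(limit_s: float = 600) -> bool:
    """Poll SSH until the target answers again or the time limit expires."""
    deadline = time.monotonic() + limit_s
    while time.monotonic() < deadline:
        try:
            if ssh("true", timeout=10).returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass
        time.sleep(5)
    return False


def stress(iterations: int) -> None:
    for i in range(iterations):
        flag = read_var(VAR_ERROR_FLAG)[0]
        (mtc,) = struct.unpack("<I", read_var(MTC)[:4])
        print(f"iteration {i}: VarErrorFlag=0x{flag:02X} MTC={mtc}")
        if flag != 0xFF:
            print("VarErrorFlag is no longer 0xFF; stopping")
            return
        try:
            ssh("reboot", timeout=10)  # the connection usually drops here
        except subprocess.TimeoutExpired:
            pass
        time.sleep(30)  # give the target time to go down before polling
        if not wait_for_target():
            raise RuntimeError("target did not come back after reboot")
```

Nothing runs when the file is loaded; call `stress(500)` from the host PC to start the loop.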

In both versions, the EFI variable 04b37fe8-f6ae-480b-bdd5-37d98c5e89aa-VarErrorFlag turned to 0xEF:

  • 5.1.2: after eb704011-1402-11d3-8e77-00a0c969723b-MTC = 37 01, i.e. 0x0137 (311 reboots)
  • 5.1.4: after eb704011-1402-11d3-8e77-00a0c969723b-MTC = 6d 01, i.e. 0x016d (365 reboots)
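The byte order in the MTC dump is little-endian. MTC holds the high 32 bits of the UEFI monotonic counter, a UINT32 that, as far as I understand EDK2's MonotonicCounterRuntimeDxe, is incremented and saved on every boot, so it roughly tracks the reboot count. The small helper below shows the conversion from the bytes printed by efivar -p.

```python
def u32_from_efivar_dump(hex_bytes: str) -> int:
    """Interpret 4 data bytes printed by `efivar -p` as a little-endian UINT32."""
    data = bytes.fromhex(hex_bytes)
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(data, "little")


print(u32_from_efivar_dump("37 01 00 00"))  # 311 (JetPack 5.1.2 run)
print(u32_from_efivar_dump("6d 01 00 00"))  # 365 (JetPack 5.1.4 run)
```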

Just to double-check: are you sure that efivar -p -n eb704011-1402-11d3-8e77-00a0c969723b-MTC shows a value bigger than 00000000 6d 01 00 00? You ran the test on an AGX Xavier, right?
Unfortunately I don’t have a dev kit to run the same test, I use a custom board.

  1. OK, if you only need the HDMI setting, may I ask you to flash JetPack from SDKM directly? You may lose HDMI, but that does not matter if you skip the first-boot configuration by using l4t_create_default_users.sh (you need to run it on your host before flashing).

  2. Based on (1), could you try to reproduce the issue with your reboot test? Do not enable anything in UEFI; just leave the log in release mode.

  3. Yes, we used an AGX Xavier.

Flashing the board directly from SDKM doesn’t work, and I don’t know whether it is related to the custom board. SDKM does detect the board in recovery mode.

When SDKM tries to start flashing, the SDKM terminal shows:

...
09:40:17 INFO: Flash Jetson Linux - flash: [ 10.6367 ] MB2 Applet version 01.00.0000
09:40:17 INFO: Flash Jetson Linux - flash: [ 10.7190 ] 000000003626050d: E> NV3P_SERVER: Could not read eeprom for module cvb.
09:40:17 INFO: Flash Jetson Linux - flash: [ 10.7888 ] tegradevflash_v2 --oem platformdetails eeprom cvb /home/diogo/nvidia/nvidia_sdk/JetPack_5.1.4_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/bootloader/bbd.bin
09:40:18 INFO: Flash Jetson Linux - flash: [ 10.7904 ] CPU Bootloader is not running on device.
09:40:18 INFO: Flash Jetson Linux - flash: Command tegradevflash_v2 --oem platformdetails eeprom cvb /home/diogo/nvidia/nvidia_sdk/JetPack_5.1.4_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/bootloader/bbd.bin
09:40:18 INFO: Flash Jetson Linux - flash: [ NV_L4T_FLASH_JETSON_LINUX_COMP Install took 11s ]
09:40:18 ERROR: Flash Jetson Linux - flash: command terminated with error
09:40:18 SUMMARY: DateTime Target Setup - target: Depends on failed component

When exporting the logs, NV_L4T_FLASH_JETSON_LINUX_COMP.log shows:

...
09:40:17.896 - Info:[  10.6340 ] Retrieving EEPROM data
[  10.6341 ] tegrarcm_v2 --oem platformdetails eeprom cvb /home/diogo/nvidia/nvidia_sdk/JetPack_5.1.4_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/bootloader/bbd.bin
[  10.6367 ] MB2 Applet version 01.00.0000

09:40:17.952 - Info:[  10.7190 ] 000000003626050d: E> NV3P_SERVER: Could not read eeprom for module cvb.
09:40:17.997 - Info:[  10.7756 ] 
[  10.7756 ] 
[  10.7888 ] tegradevflash_v2 --oem platformdetails eeprom cvb /home/diogo/nvidia/nvidia_sdk/JetPack_5.1.4_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/bootloader/bbd.bin
[  10.7904 ] CPU Bootloader is not running on device.
09:40:18.011 - Info:Error: Return value 4
Command tegradevflash_v2 --oem platformdetails eeprom cvb /home/diogo/nvidia/nvidia_sdk/JetPack_5.1.4_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/bootloader/bbd.bin

--- Error: Reading board information failed.
09:40:18.019 - Error:[exec_command]: /bin/bash -c /tmp/tmp_NV_L4T_FLASH_JETSON_LINUX_COMP.diogo.sh; [error]: --- Error: Reading board information failed.

09:40:18.019 - Info:[ Component Install Finished with Error ]
09:40:18.019 - Info:[host] [ Disk Avail on Partition : 0.00 B ]
09:40:18.019 - Info:[ NV_L4T_FLASH_JETSON_LINUX_COMP Install took 11s ]

Regarding your request (1): can I download and prepare the image in SDKM, skip flashing, and then run sudo ./Linux_for_Tegra/tools/l4t_create_default_user.sh -u nvidia -p nvidia -a -n ecu-target and sudo ./flash.sh jetson-xavier mmcblk0p1?

Regarding (2), I’m not making any UEFI changes; the only change in the latest tests was the DTB change to enable HDMI.

Just to clarify: you told me you “Used SDKM to download and prepared the target image. Skipped flashing”.

And it sounds like you only modified the device tree for something unimportant (HDMI).

But then you said sdkmanager cannot flash your board.

Are you sure this is the only change you made to the pure sdkmanager package?
An sdkmanager flash and a manual flash are nearly 100% the same. Your comment suggests there may be something more going on.

If you are sure, I suggest you just download the sdkmanager package and run the manual flash command.