Hi All,
Some AGX Xavier devices were no longer able to run OTA updates after some time. The first investigation was done in the thread "Redundant A/B rootfs not switching with set-active-boot-slot but working with set-SR-BR", where the problem was that the system (EDK2) was not switching slots.
After further investigation, it seems that the partition where the UEFI variables are stored fills up after the board has been rebooted around 370 times, simply by running the `reboot` command repeatedly.
Every time the device boots, it creates two new variables, `MTC` and `PlatformConfigData`. It does not seem to remove or overwrite the old variables; instead it allocates new space to store the new values.
I started to debug EDK2 and EDK2-nvidia by adding a debug message at this line in order to print `CommonVariableSpace`, which is always `0x1FF9C`, and `CommonVariableTotalSize`, which is zero right after flashing the board over USB and increases on every boot.
When `CommonVariableTotalSize` reaches `0x1FF9C`, EDK2 sets `VarErrorFlag` to `0xef`. From then on, the system (EDK2) never switches slots again: nvbootctrl sets the runtime variable, but EDK2 isn't able to perform any operation on the variables, because saving fails once the memory region is full.
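As a rough sanity check on these numbers: `0x1FF9C` is 130972 bytes, so filling it in about 370 reboots corresponds to roughly 354 bytes of growth per boot. The sketch below only does that arithmetic. It assumes constant growth per boot and no space being reclaimed, and the helper names are hypothetical (they are not EDK2 functions).

```c
#include <assert.h>
#include <stdint.h>

/* Values reported in this post. */
#define COMMON_VARIABLE_SPACE 0x1FF9Cu /* 130972 bytes */
#define BOOTS_UNTIL_FULL      370u     /* approximate reboot count */

/* Average growth of CommonVariableTotalSize per boot, rounded down.
 * Hypothetical helper, assumes the growth is the same on every boot. */
static uint32_t bytes_per_boot(uint32_t space, uint32_t boots)
{
    assert(boots != 0u);
    return space / boots;
}

/* Number of boots that still fit, assuming `per_boot` bytes are appended
 * on every boot and nothing is ever reclaimed. */
static uint32_t boots_until_full(uint32_t space, uint32_t used, uint32_t per_boot)
{
    assert(per_boot != 0u);
    if (used >= space) {
        return 0u;
    }
    return (space - used) / per_boot;
}
```

If the per-boot growth measured from `CommonVariableTotalSize` in the log is close to this estimate, that would support the idea that the space is consumed only by the two variables written at boot.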
Also, from my tests, MAYBE (I need your confirmation) it is okay for the old variables to be kept, as EDK2 marks them with `State &= VAR_DELETED;` in this file and later runs garbage collection by calling the `Reclaim()` function.
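To make the expected behaviour concrete, here is a toy model of an append-only variable store: an update marks the old copy as deleted and appends a new one, and a reclaim pass compacts the live records once the store is full. This is only a sketch and not the EDK2 implementation. The `Record` and `Store` types, `MAX_RECORDS`, and both function names are hypothetical; the two state values are the ones I believe `VariableFormat.h` defines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* State values as I understand them from EDK2's VariableFormat.h. */
#define VAR_ADDED   0x3Fu
#define VAR_DELETED 0xFDu

#define MAX_RECORDS 8u /* hypothetical capacity, tiny on purpose */

typedef struct {
    char     name[24];
    uint32_t value;
    uint8_t  state;
} Record;

typedef struct {
    Record   rec[MAX_RECORDS];
    uint32_t used; /* slots consumed, live or deleted */
} Store;

/* Compact live records to the front and drop the deleted ones. */
static void reclaim(Store *s)
{
    uint32_t out = 0;
    for (uint32_t i = 0; i < s->used; i++) {
        if (s->rec[i].state == VAR_ADDED) {
            s->rec[out++] = s->rec[i];
        }
    }
    s->used = out;
}

/* Mark any old copy deleted, reclaim if the store is full, then append.
 * Returns 0 on success, -1 if the store is full even after reclaim.
 * Real firmware sequences these steps more carefully for power-loss safety. */
static int set_variable(Store *s, const char *name, uint32_t value)
{
    for (uint32_t i = 0; i < s->used; i++) {
        if (s->rec[i].state == VAR_ADDED && strcmp(s->rec[i].name, name) == 0) {
            s->rec[i].state &= VAR_DELETED; /* flash-style: only clear bits */
        }
    }
    if (s->used == MAX_RECORDS) {
        reclaim(s);
    }
    if (s->used == MAX_RECORDS) {
        return -1;
    }
    Record *r = &s->rec[s->used++];
    memset(r, 0, sizeof *r);
    strncpy(r->name, name, sizeof r->name - 1);
    r->value = value;
    r->state = VAR_ADDED;
    return 0;
}
```

In this model, writing `MTC` and `PlatformConfigData` on every boot never exhausts the store, because the reclaim pass always succeeds. The behaviour on the board looks like the case where that pass fails, so the deleted copies keep occupying space.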
The issue is that `FvbWrite()` sets `LbaBoundaryCrossed = TRUE` and returns `EFI_BAD_BUFFER_SIZE`.
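For context, as far as I understand the firmware volume block protocol, `EFI_BAD_BUFFER_SIZE` from a write means the request tried to cross an LBA (block) boundary, and only the bytes up to the end of the block were written. The check itself can be sketched as below. The function names and the block size in the example are hypothetical, and this is not the EDK2-nvidia code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when writing `len` bytes at byte `offset` inside a block of
 * `block_size` bytes would run past the end of that block. */
static bool crosses_lba_boundary(size_t block_size, size_t offset, size_t len)
{
    assert(block_size > 0 && offset < block_size);
    return len > block_size - offset;
}

/* Number of bytes that can be written without leaving the block. */
static size_t clamp_to_block(size_t block_size, size_t offset, size_t len)
{
    assert(block_size > 0 && offset < block_size);
    size_t room = block_size - offset;
    return len < room ? len : room;
}
```

With an assumed 64 KiB block, a 0x20-byte write at offset 0xFFF0 crosses the boundary and only 0x10 bytes fit. If the caller (here possibly the reclaim path) does not split such a write into per-block chunks, the write is reported as failed.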
I enclose a full log covering three boots: the first two booted okay, and on the last one the error started to occur. From that point on, EDK2 isn't able to do any OTA update or even change any variable.
error_with_logs.txt (565.1 KB)
Do you have any idea what is causing this? Is it a memory leak? Is it a garbage collector issue?
Thank you in advance for your help.