Questions regarding PSCI CPU_ON and possible RAS Uncorrectable errors

Hello,

I have questions regarding PSCI CPU_ON and possible RAS Uncorrectable errors.

Here is the current system I am running:

  • Jetson Orin Nano Developer Kit 8GB
  • MB1 (version: 1.4.0.4-t234-54845784-e89ea9bc)
  • MB2 (version: 0.0.0.0-t234-54845784-22833a33)
  • BL31: v2.8(release):e12e3fa93
  • OP-TEE version: 4.2 (gcc version 11.3.0 (Buildroot 2022.08)) #2 Wed Jan 8 01:24:03 UTC 2025 aarch64
  • Jetson UEFI firmware (version 36.4.3-gcid-38968081 built on 2025-01-08T01:18:20+00:00)
  • Linux version 5.15.148-tegra (buildbrain@mobile-u64-6336-d8000) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 202) installed on an NVMe drive

The system was installed with NVIDIA SDK Manager.

I am developing a simple hypervisor that is loaded by a UEFI application
before the OS starts. The current loading flow is:

  1. Boot into UEFI shell.
  2. Run the UEFI application located on a USB drive; it jumps into the
    hypervisor initialization code (still in EL2 at this point).
  3. After the hypervisor initialization is done, it sets up the EL1 environment
    and jumps back to the UEFI loader in EL1.
  4. Control is now returned to the UEFI application (in EL1 at this point).
  5. The UEFI application exits back to the UEFI shell.
  6. Start the OS.

The hypervisor mainly sets up:

  • MMU Stage 2 translation (1-to-1 mapping from intermediate physical address to real physical address, except for memory reserved by the hypervisor)
  • Virtual GIC
  • SMC call trapping (needed to handle PSCI CPU_SUSPEND and PSCI CPU_ON)
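
As a minimal sketch of the SMC trapping item above, the trap handler has to recognize which SMCs are the PSCI calls it must rewrite. The function IDs below are the values from the Arm PSCI specification; the enum and the function name `classify_smc` are hypothetical and are not taken from the hypervisor's source.

```c
#include <stdint.h>

/* Function IDs from the Arm PSCI specification (SMC32 and SMC64 forms). */
#define PSCI_CPU_SUSPEND_32 0x84000001u
#define PSCI_CPU_SUSPEND_64 0xC4000001u
#define PSCI_CPU_ON_32      0x84000003u
#define PSCI_CPU_ON_64      0xC4000003u

/* Hypothetical classification of a trapped SMC; names are illustrative. */
enum smc_action {
    SMC_PASS_THROUGH,    /* forward to the firmware unchanged */
    SMC_EMULATE_CPU_ON,  /* substitute the hypervisor's entry point, then forward */
    SMC_EMULATE_SUSPEND  /* substitute the hypervisor's resume entry, then forward */
};

static enum smc_action classify_smc(uint64_t x0)
{
    /* The function ID is carried in W0, the low 32 bits of x0. */
    uint32_t fid = (uint32_t)x0;

    switch (fid) {
    case PSCI_CPU_ON_32:
    case PSCI_CPU_ON_64:
        return SMC_EMULATE_CPU_ON;
    case PSCI_CPU_SUSPEND_32:
    case PSCI_CPU_SUSPEND_64:
        return SMC_EMULATE_SUSPEND;
    default:
        return SMC_PASS_THROUGH;
    }
}
```

Both calls carry an entry point address, which is presumably why they are the two that need special handling: the firmware would otherwise start or resume the core directly at the guest's address.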

The OS boots to the login screen most of the time. However, a RAS Uncorrectable
error sometimes occurs:

...
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services...
***RRORR OR:  e***o* r*a****1*s*ndr*m*=**82*0001*
ERROR:   RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR:          Status = 0xe4000612
ERROR:   SERR = Error response from slave: 0x12
ERROR:          IERR = CBB Interface Error: 0x6
ERROR:          MISC0 = 0xc04a0040
ERROR:          MISC1 = 0x24c1844000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x0
ERROR:          ADDR = 0x8000000000000000
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in ACI, base=0xe01a000:
ERROR:          Status = 0xe8000904
ERROR:   SERR = Assertion failure: 0x4
ERROR:          IERR = FillWrite Error: 0x9
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          ADDR = 0x8000000000000000
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   Excepiio off core
syndrome=0x82000014
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR:          Status = 0xe4000612
ERROR:   SERR = Error response from slave: 0x12
ERROR:          IERR = CBB Interface Error: 0x6
ERROR:          MISC0 = 0xc0424040
ERROR:          MISC1 = 0x741844000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x0
ERROR:          ADDR = 0x8000000000000000
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in ACI, base=0xe01a000:
ERROR:          Status = 0xe8000904
ERROR:   SERR = Assertion failure: 0x4
ERROR:          IERR = FillWrite Error: 0x9
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          ADDR = 0x8000000000000000
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   Powering off core
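
For reference, the Status and ADDR values in the log can be split into fields by hand. The sketch below uses the ERR<n>STATUS and ERR<n>ADDR bit positions defined by the Arm RAS extension; the struct and function names are hypothetical, and the IERR values are implementation defined, so their meaning comes from the firmware's own strings in the log, not from this decoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected ERR<n>STATUS fields (Arm RAS extension bit positions). */
struct ras_status {
    bool addr_valid;  /* AV, bit 31: ERR<n>ADDR holds a valid address */
    bool valid;       /* V,  bit 30: the error record is valid */
    bool uncorrected; /* UE, bit 29: at least one uncorrected error */
    bool overflow;    /* OF, bit 27: more errors occurred than were recorded */
    bool misc_valid;  /* MV, bit 26: ERR<n>MISC<m> hold valid data */
    uint8_t ierr;     /* IERR, bits 15:8, implementation defined */
    uint8_t serr;     /* SERR, bits 7:0, architecturally defined */
};

static struct ras_status ras_decode_status(uint64_t status)
{
    struct ras_status s;

    s.addr_valid  = (status >> 31) & 1u;
    s.valid       = (status >> 30) & 1u;
    s.uncorrected = (status >> 29) & 1u;
    s.overflow    = (status >> 27) & 1u;
    s.misc_valid  = (status >> 26) & 1u;
    s.ierr        = (uint8_t)((status >> 8) & 0xffu);
    s.serr        = (uint8_t)(status & 0xffu);
    return s;
}

/* ERR<n>ADDR: bit 63 is the non-secure flag, bits 55:0 the physical address. */
static bool ras_addr_is_nonsecure(uint64_t addr)
{
    return (addr >> 63) & 1u;
}

static uint64_t ras_addr_paddr(uint64_t addr)
{
    return addr & ((UINT64_C(1) << 56) - 1);
}
```

Applied to the log above, 0xe4000612 decodes to AV, V, UE, and MV set with IERR 0x06 and SERR 0x12, which is consistent with the MISC registers being printed for the IOB record. 0xe8000904 decodes to OF set and MV clear with IERR 0x09 and SERR 0x04, which is consistent with the "Overflow" line and the absence of MISC lines for the ACI record. The ADDR value 0x8000000000000000 has only the non-secure bit set and a physical address field of zero.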

I tried to track down where the problem occurs, and it appears to come from
issuing the PSCI CPU_ON call.

Since the hypervisor traps SMC calls from the guest, it is responsible for
issuing SMC calls on behalf of the guest. The flow of issuing PSCI CPU_ON is
the following:

  1. Trap the SMC call from the guest.
  2. The hypervisor prepares its own PSCI CPU_ON call, keeping a reference to
    the guest’s physical entry address.
  3. The hypervisor issues the PSCI CPU_ON call.
  4. The PSCI CPU_ON call returns, and the hypervisor returns to the guest.
  5. When the secondary CPU core starts, the hypervisor initializes itself on
    that core and jumps to the guest’s physical entry address in EL1.
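
The flow above can be sketched roughly as follows. This is a host-runnable illustration, not the hypervisor's actual code: `forward_cpu_on`, `cpu_on_ctx`, `ctx_table`, `MAX_CPUS`, and `fake_firmware` are all hypothetical names, and the real SMC instruction is replaced by a function pointer so the logic can run without firmware. The PSCI return codes are the values from the Arm PSCI specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Return codes from the Arm PSCI specification. */
#define PSCI_SUCCESS           0
#define PSCI_ALREADY_ON       (-4)
#define PSCI_INTERNAL_FAILURE (-6)

#define MAX_CPUS 8 /* hypothetical table size for this sketch */

/* Saved so the secondary core can find the guest's request when it
 * arrives at the hypervisor's own entry point. */
struct cpu_on_ctx {
    int in_use;
    uint64_t guest_entry;   /* entry_point_address the guest passed in x2 */
    uint64_t guest_context; /* context_id the guest passed in x3 */
};

static struct cpu_on_ctx ctx_table[MAX_CPUS];

/* Stand-in for the SMC instruction:
 * (function ID, target MPIDR, entry point, context ID) -> PSCI return code. */
typedef int64_t (*smc_fn)(uint64_t, uint64_t, uint64_t, uint64_t);

static int64_t forward_cpu_on(smc_fn smc, uint64_t fid, uint64_t target,
                              uint64_t guest_entry, uint64_t guest_context,
                              uint64_t hyp_entry_pa)
{
    struct cpu_on_ctx *ctx = NULL;
    int64_t ret;
    size_t i;

    /* Step 2: remember what the guest asked for. */
    for (i = 0; i < MAX_CPUS; i++) {
        if (!ctx_table[i].in_use) {
            ctx = &ctx_table[i];
            break;
        }
    }
    if (ctx == NULL)
        return PSCI_INTERNAL_FAILURE;

    ctx->in_use = 1;
    ctx->guest_entry = guest_entry;
    ctx->guest_context = guest_context;

    /* Step 3: call the firmware with the hypervisor's entry point instead.
     * On real hardware the new core starts with its MMU and caches off,
     * so ctx would have to be cleaned to memory before this call. */
    ret = smc(fid, target, hyp_entry_pa, (uint64_t)(uintptr_t)ctx);

    /* Step 4: on failure the core will not arrive, so release the slot. */
    if (ret != PSCI_SUCCESS)
        ctx->in_use = 0;
    return ret;
}

/* Host-side fake firmware so the sketch can run without real SMCs. */
static uint64_t fake_on[MAX_CPUS];
static size_t fake_on_count;
static uint64_t fake_last_entry;

static int64_t fake_firmware(uint64_t fid, uint64_t target,
                             uint64_t entry, uint64_t context)
{
    size_t i;

    (void)fid;
    (void)context;
    for (i = 0; i < fake_on_count; i++)
        if (fake_on[i] == target)
            return PSCI_ALREADY_ON;
    if (fake_on_count == MAX_CPUS)
        return PSCI_INTERNAL_FAILURE;
    fake_on[fake_on_count++] = target;
    fake_last_entry = entry;
    return PSCI_SUCCESS;
}
```

Calling `forward_cpu_on(fake_firmware, 0xC4000003u, 0x100, guest_entry, 0, hyp_entry_pa)` twice for the same target returns PSCI_SUCCESS the first time and PSCI_ALREADY_ON the second time, with the second context slot released again. Step 5 (the secondary core reading its context and dropping to EL1) is not modeled here.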

I added the following patch to the hypervisor code for debugging. Note that
printf() writes its output to the serial UART.

diff --git a/core/aarch64/smc.c b/core/aarch64/smc.c
--- a/core/aarch64/smc.c
+++ b/core/aarch64/smc.c
@@ -194,6 +194,8 @@ handle_psci_cpu_on (union exception_save
 	g->pa_base = vmm_mem_start_phys ();
 	g->va_base = vmm_mem_start_virt ();
 
+	printf ("Calling CPU_ON\n");
+
 	/* Check for error from SMC call */
 	error = smc_asm_psci_call (r->reg.x0, r->reg.x1,
 				   sym_to_phys (entry_cpu_on),
@@ -205,6 +207,8 @@ handle_psci_cpu_on (union exception_save
 		free (stack);
 	}
 
+	printf ("Done Calling CPU_ON\n");
+
 	/* Return error to the guest */
 	r->reg.x0 = error;
 }

In successful cases, the serial output looks like the following:

...
Starting a virtual machine...
Processor 0 entering EL1
Shell> fs3:EFI\BOOT\BOOTAA64.efi
L4TLauncher: Attempting Direct Boot
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services...
Calling CPU_ON                      # Output from the hypervisor
Done Calling CPU_ON                 # Output from the hypervisor
Processor 100 entering EL1          # Output from the hypervisor
Calling CPU_ON                      # Output from the hypervisor
Done Calling CPU_ON                 # Output from the hypervisor
Processor 200 entering EL1          # Output from the hypervisor
Calling CPU_ON                      # Output from the hypervisor
Done Calling CPU_ON                 # Output from the hypervisor
Processor 300 entering EL1          # Output from the hypervisor
Calling CPU_ON                      # Output from the hypervisor
Done Calling CPU_ON                 # Output from the hypervisor
Processor 10200 entering EL1        # Output from the hypervisor
Calling CPU_ON                      # Output from the hypervisor
Done Calling CPU_ON                 # Output from the hypervisor
Processor 10300 entering EL1        # Output from the hypervisor
��debugfs initialized
��I/TC: Reserved shared memory is disabled
I/TC: Dynamic shared memory is enabled
I/TC: Normal World virtualization support is disabled
I/TC: Asynchronous notifications are disabled
��[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd421]
[    0.000000] Linux version 5.15.148-tegra (buildbrain@mobile-u64-6336-d8000) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 202)
[    0.000000] Machine model: NVIDIA Jetson Orin Nano Engineering Reference Developer Kit Super
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: RTPROP=0x26d82f198 TPMFinalLog=0x25e3f0000 SMBIOS=0xffff0000 SMBIOS 3.0=0x26d220000 MEMATTR=0x266cc6018 ESRT=0x267 
[    0.000000] random: crng init done
...
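
The processor numbers in the hypervisor's output (100, 200, 300, 10200, 10300) look like MPIDR affinity values printed in hexadecimal. Assuming that reading is correct (the log itself does not say so), they can be split into affinity levels using the MPIDR_EL1 field layout. The struct and function names below are hypothetical.

```c
#include <stdint.h>

/* Affinity fields of MPIDR_EL1. */
struct mpidr_aff {
    uint8_t aff0; /* bits 7:0 */
    uint8_t aff1; /* bits 15:8 */
    uint8_t aff2; /* bits 23:16 */
    uint8_t aff3; /* bits 39:32 */
};

static struct mpidr_aff mpidr_decode(uint64_t mpidr)
{
    struct mpidr_aff a;

    a.aff0 = (uint8_t)(mpidr & 0xffu);
    a.aff1 = (uint8_t)((mpidr >> 8) & 0xffu);
    a.aff2 = (uint8_t)((mpidr >> 16) & 0xffu);
    a.aff3 = (uint8_t)((mpidr >> 32) & 0xffu);
    return a;
}
```

Under that assumption, "Processor 10200" would be Aff2 = 1, Aff1 = 2, Aff0 = 0, and "Processor 300" would be Aff2 = 0, Aff1 = 3, Aff0 = 0.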

When an error occurs, it looks like the following:

Starting a virtual machine...
Processor 0 entering EL1
Shell> fs3:EFI\BOOT\BOOTAA64.efi
L4TLauncher: Attempting Direct Boot
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services...
Calling CPU_ON                      # Output from the hypervisor
Done C��ERRORRRORE ce**i*n**e*****1 *y*****e**x******1*****  # Output from the hypervisor is interrupted
ERROR:   RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR:          Status = 0xe4000612
ERROR:   SERR = Error response from slave: 0x12
ERROR:          IERR = CBB Interface Error: 0x6
ERROR:          MISC0 = 0xc05e0040
ERROR:          MISC1 = 0x2c1844000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x0
ERROR:          ADDR = 0x8000000000000000
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in ACI, base=0xe01a000:
ERROR:          Status = 0xe8000904
ERROR:   SERR = Assertion failure: 0x4
ERROR:          IERR = FillWrite Error: 0x9
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          ADDR = 0x8000000000000000
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   Exwertng  fe sone1 syndrome=0x82000014
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR:          Status = 0xe4000612
ERROR:   SERR = Error response from slave: 0x12
ERROR:          IERR = CBB Interface Error: 0x6
ERROR:          MISC0 = 0xc052c040
ERROR:          MISC1 = 0x2141844000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x0
ERROR:          ADDR = 0x8000000000000000
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in ACI, base=0xe01a000:
ERROR:          Status = 0xe8000904
ERROR:   SERR = Assertion failure: 0x4
ERROR:          IERR = FillWrite Error: 0x9
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          ADDR = 0x8000000000000000
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   Powering off core

As you can see, the error occurs almost immediately after returning from the
PSCI CPU_ON call.
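
The partially garbled lines in the log appear to read "Exception reason=1 syndrome=0x82000014". Assuming the syndrome value uses the ESR_ELx encoding (an assumption, since the log does not say which register it comes from), it can be split into its fields as in the sketch below. The helper names are hypothetical; the bit positions and the exception class value are from the Arm architecture's ESR_ELx definition.

```c
#include <stdint.h>

#define ESR_EC_IABT_LOWER_EL 0x20u /* Instruction Abort from a lower Exception level */

/* EC: exception class, bits 31:26. */
static uint32_t esr_ec(uint64_t esr)
{
    return (uint32_t)((esr >> 26) & 0x3fu);
}

/* IL: instruction length bit, bit 25. */
static uint32_t esr_il(uint64_t esr)
{
    return (uint32_t)((esr >> 25) & 0x1u);
}

/* ISS: instruction specific syndrome, bits 24:0. */
static uint32_t esr_iss(uint64_t esr)
{
    return (uint32_t)(esr & 0x1ffffffu);
}

/* For instruction and data aborts, the fault status code is ISS[5:0]. */
static uint32_t esr_fault_status(uint64_t esr)
{
    return esr_iss(esr) & 0x3fu;
}

static const char *fault_status_name(uint32_t fsc)
{
    if (fsc == 0x10u)
        return "synchronous external abort, not on a translation table walk";
    if (fsc >= 0x14u && fsc <= 0x17u)
        return "synchronous external abort on a translation table walk";
    return "other (see the ESR_ELx description)";
}
```

With this reading, 0x82000014 has EC = 0x20 (an instruction abort taken from a lower Exception level) and a fault status code of 0x14, which the architecture defines as a synchronous external abort on a translation table walk at level 0. If the assumption holds, that would point at an instruction fetch, rather than at the SMC itself, as the access that received the error response.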

The error can usually be reproduced within about 7 to 15 reboots. The
occurrence appears to be random.

I don’t expect this kind of error from a PSCI CPU_ON call, and I cannot find any
documentation explaining the relationship between PSCI CPU_ON and
RAS Uncorrectable errors. My questions are:

  1. Under what conditions can this situation occur?
  2. Is there a good way to determine whether the problem is in the hypervisor implementation or in the firmware?
  3. Can you suggest a possible workaround?

If you need more information, please don’t hesitate to ask me.

Best Regards
Ake Koomsin

Hi Ake,

Do you hit the RAS error before loading the UEFI application for your hypervisor?

Please share your application and the detailed steps/commands you used so that we can verify locally.

Hi Kevin,

Do you hit the RAS error before loading the UEFI application for your hypervisor?

No, I haven’t seen any RAS error before loading my UEFI application for the
hypervisor.

hv_ras_reproduce_materials.zip (4.2 MB)

I have attached a zip file for reproducing the RAS Uncorrectable error. Inside
the zip file, there are:

  1. bitvisor_config
  2. bitvisor_jetson_orin_nano_debug.patch
  3. loadvmm.efi
  4. bitvisor.elf

The first two files are needed if you prefer to build loadvmm.efi and
bitvisor.elf from the source code. loadvmm.efi is the loader UEFI
application. bitvisor.elf is the hypervisor binary. Both loadvmm.efi
and bitvisor.elf are provided for convenience so that you don’t have
to compile from source.

Building the UEFI loader and the hypervisor from source to reproduce the problem (optional)

If you need to build loadvmm.efi and bitvisor.elf from source, follow the
instructions below.

Before following the build instructions, please make sure that you have an
AArch64 cross-compilation toolchain, clang, and the Mercurial (hg) SCM installed.

  1. Run the following command to clone the source code:
hg clone http://hg.code.sf.net/p/bitvisor/code bitvisor
  2. Run cd bitvisor to change to the bitvisor directory.

  3. Assuming that you are at the bitvisor root directory, import
    bitvisor_jetson_orin_nano_debug.patch by running the following command:

hg import /path/to/bitvisor_jetson_orin_nano_debug.patch
  4. Still at the bitvisor root directory, install bitvisor_config as the
    build configuration by running the following command:
cp /path/to/bitvisor_config .config
  5. Compile the hypervisor for the AArch64 target with the following command:
CROSS_COMPILE=aarch64-linux-gnu- ARCH=aarch64 make

You may need to replace the aarch64-linux-gnu- prefix with the one you
have on your system.

You should see output like the following during compilation:

ake@ake_workstation ~/b/bitvisor (default)> CROSS_COMPILE=aarch64-none-linux-gnu- ARCH=aarch64 make
cp defconfig.tmpl defconfig
make -s -f Makefile.config check-old-config V=0
make -s -f Makefile.config check-empty-config V=0
make -s -f Makefile build-all
  CC core/aarch64/acc_emu.o
  CC core/aarch64/acc_emu_asm.o
  CC core/aarch64/acpi.o
...
  LD output.o
  LD bitvisor.elf
/usr/libexec/gcc/aarch64-none-linux-gnu/ld: warning: bitvisor.elf has a LOAD segment with RWX permissions
   text    data     bss     dec     hex filename
1732525 2503232 18681856        22917613        15db1ed bitvisor.elf
  6. The output binary bitvisor.elf is in the bitvisor root directory.
    Confirm that the binary is for AArch64 by running the following command:
file bitvisor.elf
  7. The output from the file command on bitvisor.elf should look like the
    following:
bitvisor.elf: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, with debug_info, not stripped
  8. Assuming that you are at the bitvisor root directory, change directory to
    boot/uefi-loader by running cd boot/uefi-loader

  9. Compile the loader UEFI application for the AArch64 target with the
    following command:

CROSS_COMPILE=aarch64-linux-gnu- ARCH=aarch64 make

You may need to replace the aarch64-linux-gnu- prefix with the one on your
system.

You should see the following text:

ake@ake_workstation ~/b/b/b/uefi-loader (default)> CROSS_COMPILE=aarch64-none-linux-gnu- ARCH=aarch64 make
clang -nostdinc -O -ffreestanding -fno-builtin -fno-stack-protector -fno-strict-aliasing -fno-PIE -MMD -Wno-microsoft-static-assert -target aarch64-unknown-windows -march=armv8-a -mgeneral-regs-only -mstrict-align -I../../edk2/MdePkg/Include/AArch64 -I. -I../../include/share -I../../edk2/MdePkg/Include -c bsdriver.c -o bsdriver.o
clang -nostdlib -fuse-ld=lld -Wl,-dll -Wl,-nodefaultlib -target aarch64-unknown-windows -Wl,-entry:EfiDriverEntryPoint -Wl,-subsystem:efi_boot_service_driver -o bsdriver.efi bsdriver.o
aarch64-none-linux-gnu-strip bsdriver.efi
clang -nostdinc -O -ffreestanding -fno-builtin -fno-stack-protector -fno-strict-aliasing -fno-PIE -MMD -Wno-microsoft-static-assert -target aarch64-unknown-windows -march=armv8-a -mgeneral-regs-only -mstrict-align -I../../edk2/MdePkg/Include/AArch64 -I. -I../../include/share -I../../edk2/MdePkg/Include -c loadvmm.c -o loadvmm.o
clang -nostdlib -fuse-ld=lld -Wl,-dll -Wl,-nodefaultlib -target aarch64-unknown-windows -Wl,-entry:efi_main -Wl,-subsystem:efi_application -o loadvmm.efi loadvmm.o
aarch64-none-linux-gnu-strip loadvmm.efi
  10. The output loadvmm.efi is in boot/uefi-loader. To confirm that the loader
    is for AArch64, run the following command:
file loadvmm.efi
  11. The output from the file command on loadvmm.efi should look like the
    following:
loadvmm.efi: PE32+ executable (DLL) (EFI application) Aarch64 (stripped to external PDB), for MS Windows, 4 sections

Now you have both bitvisor.elf and loadvmm.efi.

This is the end of the build instructions.

Steps for reproducing the RAS Uncorrectable error

  1. Put loadvmm.efi and bitvisor.elf on a FAT32-formatted USB drive. They must
    be in the same directory.

  2. Plug the USB drive into the Jetson Orin Nano Developer Kit.

  3. Optionally, plug in a USB keyboard/mouse and a display.

  4. Connect the debug serial port to the host computer to see the log output.

  5. Plug in the power cable to power on the device.

  6. Press F11 to enter Boot Manager Menu

  7. Choose “UEFI Shell”

  8. You may need to determine where the USB drive and the Linux EFI directory
    are mapped. With a normal installation, the Linux EFI directory should be
    mapped to FS3 and the USB drive to FS4.

  9. Assuming the USB drive is at FS4, type the following command to switch to FS4:

fs4:
  10. Use the cd command to go to the location where you put loadvmm.efi and
    bitvisor.elf on the USB drive. Once you are there, type the following to
    load the hypervisor:
loadvmm.efi

and press enter.

  11. You should see the following output on the debug serial:
FS4:\> loadvmm.efi
Loading ...........................................................
serial_init(): done?
ACPI RSDP not found.
SMCC_VERSION 1.2
psci: version 1.1
psci: CPU_SUSPEND OS-initiated support 0 ext format 1
model: NVIDIA Jetson Orin Nano Engineering Reference Developer Kit Super
Scanning PCI segment 8 resources
dt_extract_pcie_info_with_compat(): res 0x3240000000->0x3240000000 code 0x3
dt_extract_pcie_info_with_compat(): res 0x40000000->0x3528000000 code 0x2
dt_extract_pcie_info_with_compat(): res 0x2A100000->0x2A100000 code 0x1
Scanning PCI segment 1 resources
dt_extract_pcie_info_with_compat(): res 0x2080000000->0x2080000000 code 0x3
dt_extract_pcie_info_with_compat(): res 0x40000000->0x20A8000000 code 0x2
dt_extract_pcie_info_with_compat(): res 0x30100000->0x30100000 code 0x1
Scanning PCI segment 4 resources
dt_extract_pcie_info_with_compat(): res 0x2140000000->0x2140000000 code 0x3
dt_extract_pcie_info_with_compat(): res 0x40000000->0x2428000000 code 0x2
dt_extract_pcie_info_with_compat(): res 0x36100000->0x36100000 code 0x1
Scanning PCI segment 7 resources
dt_extract_pcie_info_with_compat(): res 0x3000000000->0x3000000000 code 0x3
dt_extract_pcie_info_with_compat(): res 0x40000000->0x3228000000 code 0x2
dt_extract_pcie_info_with_compat(): res 0x3E100000->0x3E100000 code 0x1
GICD base 0xF400000
GICD total INTID 65536
GICD n_lpis 0
GICD can_use_group0 0
Loading drivers...
idman (user) init
ready
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project.  All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
VPN for RealTek RTL8169 registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel VGA controller driver registered
Initialization done
Starting a virtual machine...
Processor 0 entering EL1
FS4:\>
  12. Given that the Linux EFI directory is at fs3, run Linux by typing the
    following:
fs3:EFI\BOOT\BOOTAA64.efi

and press enter.

  13. Most of the time, you should see the following on the debug serial:
FS4:\> fs3:EFI\BOOT\BOOTAA64.efi
L4TLauncher: Attempting Direct Boot
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services...
Calling CPU_ON
Done Calling CPU_ON
Processor 100 entering EL1
Calling CPU_ON
Done Calling CPU_ON
Processor 200 entering EL1
Calling CPU_ON
Done Calling CPU_ON
Processor 300 entering EL1
Calling CPU_ON
Done Calling CPU_ON
Processor 10200 entering EL1
Calling CPU_ON
Done Calling CPU_ON
Processor 10300 entering EL1
��debugfs initialized
��I/TC: Reserved shared memory is disabled
I/TC: Dynamic shared memory is enabled
I/TC: Normal World virtualization support is disabled
I/TC: Asynchronous notifications are disabled
��[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd421]
[    0.000000] Linux version 5.15.148-tegra (buildbrain@mobile-u64-6336-d8000) (aarch64-buildroot-l)
[    0.000000] Machine model: NVIDIA Jetson Orin Nano Engineering Reference Developer Kit Super
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: RTPROP=0x26d82f198 TPMFinalLog=0x25e3f0000 SMBIOS=0xffff0000 SMBIOS 3.0=0x26d22 
[    0.000000] random: crng init done
[    0.000000] secureboot: Secure boot disabled
...
  14. At this point, you should reach the login screen. Reboot and repeat the
    instructions from step 6 until an error occurs at step 13. When an error
    occurs at step 13, it looks like the following:
FS4:\> fs3:EFI\BOOT\BOOTAA64.efi
L4TLauncher: Attempting Direct Boot
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services...
Calling CPU_ON
Done C��ERRERRO :Ex eption*rea*on=1 sy*dro**=0x**0000*0
*
ERROR:   RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR:          Status = 0xe4000612
ERROR:   SERR = Error response from slave: 0x12
ERROR:          IERR = CBB Interface Error: 0x6
ERROR:          MISC0 = 0xc4520040
ERROR:          MISC1 = 0x164c870000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x0
ERROR:          ADDR = 0x8000000000000200
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in ACI, base=0xe01a000:
ERROR:          Status = 0xe8000904
ERROR:   SERR = Assertion failure: 0x4
ERROR:          IERR = FillWrite Error: 0x9
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          ADDR = 0x8000000000000200
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
 syndrome=0x82000010e sore
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR:          Status = 0xe4000612
ERROR:   SERR = Error response from slave: 0x12
ERROR:          IERR = CBB Interface Error: 0x6
ERROR:          MISC0 = 0xc456c040
ERROR:          MISC1 = 0x4c870000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x0
ERROR:          ADDR = 0x8000000000000200
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in ACI, base=0xe01a000:
ERROR:          Status = 0xe8000904
ERROR:   SERR = Assertion failure: 0x4
ERROR:          IERR = FillWrite Error: 0x9
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          ADDR = 0x8000000000000200
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   Powering off core

It can take some time for the error to occur. For example, when I was verifying
these steps, I had to repeat steps 6 through 13 a total of 35 times before I
saw the error.

At step 10, if you see nothing after running loadvmm.efi, just restart.
Chances are that there is a rare bug in either the loader or the hypervisor.
It is not related to this RAS Uncorrectable error.

If you find any of the instructions unclear, need more information, or
have some insights, please let me know.

Best Regards
Ake Koomsin

Thanks for sharing the detailed reproduction steps.
What you are doing is fine, but there seems to be an issue with bringing the secondary cores up.
The errors are coming from ATF.

Do you mean the issue occurs about 3% of the time?
Does a reboot help to skip the issue?

Thank you very much for the update.

Do you mean the issue occurs about 3% of the time?

The error is quite random. I just want to say that it can take time
to reproduce the problem.
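
As a rough illustration of why a single run says little about the true rate: if each boot failed independently with some fixed probability p (an assumption made only for this sketch), the chance of seeing at least one failure in n boots is 1 - (1 - p)^n. The function name is hypothetical.

```c
/* Probability of at least one failure in n independent boots,
 * each failing with probability p. */
static double prob_at_least_one(double p, unsigned n)
{
    double none = 1.0; /* probability that no boot has failed so far */
    unsigned i;

    for (i = 0; i < n; i++)
        none *= 1.0 - p;
    return 1.0 - none;
}
```

With p = 1/35, `prob_at_least_one(1.0 / 35.0, 35)` is roughly 0.64 and `prob_at_least_one(1.0 / 35.0, 10)` is roughly 0.25, so both a 35-boot run and a 10-boot run are plausible outcomes at the same underlying rate.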

Does a reboot help to skip the issue?

A reboot does skip the issue. However, the error can randomly
occur again later.

Best Regards
Ake Koomsin