Description
I created a Windows virtual machine on NVIDIA DGX Spark using KVM/QEMU, but the VM does not start reliably after the DGX Spark host is rebooted.
After a host reboot, the Windows VM service shows non-deterministic startup behavior:
- Sometimes the Windows VM starts successfully.
- Sometimes it fails during the UEFI boot stage with one of the following messages.
Error Case 1
```text
BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0)
```
Error Case 2
```text
BdsDxe: loading Boot0008 "Windows Boot Manager" from HD(1,GPT,75101D2F-2B98-494A-86EE-AB33E5F98446,0x800,0x64000)/\EFI\Microsoft\Boot\bootmgfw.efi
BdsDxe: starting Boot0008 "Windows Boot Manager" from HD(1,GPT,75101D2F-2B98-494A-86EE-AB33E5F98446,0x800,0x64000)/\EFI\Microsoft\Boot\bootmgfw.efi
```
VM Startup Script
```shell
#!/bin/bash
echo "Starting Windows VM at $(date)"
exec /usr/bin/qemu-system-aarch64 \
  -name windows-arm-vm \
  -M virt,gic-version=3 \
  -accel kvm \
  -cpu host \
  -smp 4 \
  -m 8G \
  \
  -drive file=/opt/vm_storage/QEMU_EFI.fd,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=/opt/vm_storage/QEMU_VARS.fd,if=pflash,format=raw,unit=1 \
  \
  -device ramfb \
  -device virtio-keyboard \
  -device virtio-mouse \
  \
  -drive file=/opt/vm_storage/windows11_arm.img,if=virtio,format=qcow2 \
  \
  -nic user,model=virtio-net-pci \
  \
  -vnc :0
```
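Error Case 1 starts Boot0001 ("UEFI Misc Device") while Error Case 2 starts Boot0008 ("Windows Boot Manager"), and both boot entries are UEFI variables kept in the writable varstore `/opt/vm_storage/QEMU_VARS.fd`. One way to see whether that file differs between a good start and a bad one is to log a checksum of it before every launch. This is a minimal diagnostic sketch, not part of the original setup; the function name and log path are hypothetical, and it assumes GNU coreutils (`sha256sum`, `date -Is`) on the host.

```shell
#!/bin/bash
# Hypothetical pre-start diagnostic (not part of the original setup):
# append "TIMESTAMP vars_sha256=HASH STATUS" to a log, where STATUS is
# "first", "unchanged" or "changed" relative to the previous log entry.
log_varstore_hash() {
  local vars="$1" log="$2" out hash prev="" status="first"
  # Capture sha256sum output without a pipe so its exit status is kept.
  out=$(sha256sum -- "$vars") || return 1
  hash=${out%% *}
  if [ -s "$log" ]; then
    # Extract the hash from the last logged line.
    prev=$(tail -n 1 -- "$log")
    prev=${prev##*vars_sha256=}
    prev=${prev%% *}
    if [ "$prev" = "$hash" ]; then
      status="unchanged"
    else
      status="changed"
    fi
  fi
  printf '%s vars_sha256=%s %s\n' "$(date -Is)" "$hash" "$status" >> "$log"
}
```

It could be called as `log_varstore_hash /opt/vm_storage/QEMU_VARS.fd /var/log/windows-vm-vars.log` just before QEMU is launched. The firmware and Windows may legitimately rewrite variables during a normal boot, so a "changed" entry is only informative when lined up with which error case (if any) followed.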
Preflight Script
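```shell
#!/bin/bash
set -e
echo "Windows VM preflight check starting..."

# Method 1: Hard wait for PCIe/NVMe training on Grace CPU (worst case)
echo "Hard wait 18 seconds for Grace CPU PCIe/NVMe training..."
sleep 18

# Method 2: Ensure the VM disk image is readable and writable (up to 30 x 2 s)
IMG=/opt/vm_storage/windows11_arm.img
for i in {1..30}; do
  if [ -r "$IMG" ] && [ -w "$IMG" ]; then
    echo "VM image file is accessible."
    break
  fi
  echo "Image file not accessible yet, waiting 2 s... ($i/30)"
  sleep 2
done
# Fail the unit start instead of falling through and launching QEMU anyway.
if [ ! -r "$IMG" ] || [ ! -w "$IMG" ]; then
  echo "VM image file is still not accessible, aborting." >&2
  exit 1
fi

# Method 3: Refresh EFI boot entries to avoid a stale "UEFI Misc Device" entry.
# Note: efibootmgr edits the host's UEFI variables, not the guest varstore
# /opt/vm_storage/QEMU_VARS.fd, so it cannot change the VM's boot entries.
efibootmgr --refresh >/dev/null 2>&1 || true

echo "Preflight check passed, starting Windows VM in 3 seconds..."
sleep 3
```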
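The 30 x 2 s image loop in the preflight script generalizes to a bounded poll that returns failure on timeout, so that under `set -e` the unit start aborts rather than continuing. A minimal sketch, assuming bash; `wait_until` and `image_ready` are hypothetical names, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: run a command once per second until it succeeds or
# until TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1"
  shift
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Same condition the preflight script checks: image readable and writable.
image_ready() {
  [ -r "$1" ] && [ -w "$1" ]
}
```

In the preflight script this would read `wait_until 60 image_ready /opt/vm_storage/windows11_arm.img`. The same helper could replace a fixed `sleep` wherever a concrete readiness condition is known; the original `sleep 18` does not name one, so it is left as is here.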
systemd Service File
```ini
[Unit]
Description=Windows 11 ARM64 VM
After=network-online.target local-fs.target
Wants=network-online.target
# Start-rate limiting is a [Unit] option; 0 disables it.
StartLimitIntervalSec=0

[Service]
Type=simple
ExecStartPre=/opt/vm_storage/preflight.sh
ExecStart=/opt/vm_storage/start_vm.sh
# Critical parameters for DGX Spark
Restart=always
RestartSec=8
TimeoutStartSec=600
KillMode=process
LimitNOFILE=1048576

[Install]
WantedBy=multi-user.target
```
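The unit has no `ExecStop=`, so when the host reboots, systemd sends SIGTERM to the main process, which is QEMU itself because the start script uses `exec`. QEMU then exits immediately, without giving Windows a chance to shut down. Whether that contributes to the failed starts is not established, but it is straightforward to rule out. The sketch below assumes two additions that are not in the original setup: a QMP socket on the QEMU command line, for example `-qmp unix:/run/windows-vm-qmp.sock,server=on,wait=off`, and `socat` installed on the host. It also assumes Windows is configured to shut down on the ACPI power button. The function names and socket path are hypothetical.

```shell
#!/bin/bash
# Hypothetical ExecStop helpers (not part of the original setup).
# Assumes QEMU was started with an extra QMP socket, for example:
#   -qmp unix:/run/windows-vm-qmp.sock,server=on,wait=off
# and that socat is installed on the host.

# Send an ACPI power-button request to the guest over QMP.
qmp_powerdown() {
  local sock="$1"
  # QMP requires capability negotiation before any other command.
  printf '%s\n' '{"execute":"qmp_capabilities"}' \
                '{"execute":"system_powerdown"}' |
    socat - "UNIX-CONNECT:${sock}" >/dev/null
}

# Wait up to TIMEOUT seconds for process PID to exit. Returns 1 on timeout.
wait_for_exit() {
  local pid="$1" timeout="$2" waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Ask Windows to shut down, then wait up to TIMEOUT seconds for QEMU to exit.
# Anything still running afterwards is killed by systemd per KillMode=.
stop_vm() {
  local sock="$1" pid="$2" timeout="$3"
  qmp_powerdown "$sock" || echo "QMP powerdown request failed" >&2
  wait_for_exit "$pid" "$timeout" || echo "VM did not exit within ${timeout}s" >&2
  return 0
}
```

Saved as, say, `/opt/vm_storage/stop_vm.sh` with a final line `stop_vm /run/windows-vm-qmp.sock "$1" 120`, it could be wired into the unit with `ExecStop=/opt/vm_storage/stop_vm.sh $MAINPID` plus a `TimeoutStopSec=` longer than the 120 s wait, because systemd only signals the remaining processes after `ExecStop=` returns.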
Are there any effective solutions to address this issue?