AMD SEV-SNP + H100 hangs or fails during CVM launch

I am trying to launch a CVM using the launch_vm.sh script from the host_tools sample KVM scripts in nvtrust. The first error I observed was "error: kvm run failed Invalid argument".
To resolve it, I followed the solution suggested here and added the -kernel, -initrd and -append options when launching the CVM.

Using /data/shared/nvtrust/host_tools/sample_kvm_scripts/images/ubuntu22.04.qcow2 as the guest image, I supplied the following additional parameters when launching the VM:

-kernel /boot/vmlinuz-5.19.0-rc6-snp-host-c4daeffce56e \
-initrd /boot/initrd.img-5.19.0-rc6-snp-host-c4daeffce56e \
-append BOOT_IMAGE=/vmlinuz-5.19.0-rc6-snp-host-c4daeffce56e root=/dev/mapper/ubuntu--vg-ubuntu--lv ro net.ifnames=0 biosdevname=0
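One detail worth checking in the parameters above: the -append value is not quoted. Assuming these lines go into a bash command line or script, the shell splits that value at every space, so qemu receives only BOOT_IMAGE=/vmlinuz-5.19.0-rc6-snp-host-c4daeffce56e as the kernel command line and gets root=/dev/mapper/ubuntu--vg-ubuntu--lv, ro and the rest as separate arguments. qemu treats a bare argument as a disk image, which is consistent with the "Could not open 'root=/dev/mapper/ubuntu--vg-ubuntu--lv'" error reported in this thread. A minimal bash sketch, using a hypothetical count_args helper, shows how many arguments each form produces:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many separate arguments it received,
# i.e. how many argv words the shell produced after word splitting.
count_args() { echo "$#"; }

# Unquoted, as in the parameters above: the kernel command line is split
# at each space, so qemu would get -append plus five separate words.
count_args -append BOOT_IMAGE=/vmlinuz-5.19.0-rc6-snp-host-c4daeffce56e root=/dev/mapper/ubuntu--vg-ubuntu--lv ro net.ifnames=0 biosdevname=0
# prints 6

# Quoted: the whole kernel command line stays a single -append value.
count_args -append "BOOT_IMAGE=/vmlinuz-5.19.0-rc6-snp-host-c4daeffce56e root=/dev/mapper/ubuntu--vg-ubuntu--lv ro net.ifnames=0 biosdevname=0"
# prints 2
```

Wrapping the whole kernel command line in double quotes after -append keeps it as one argument. Separately, the kernel, initrd and root= device here are the host's; the guest image may use a different root device.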

I still get the following error:

$:/data/shared/nvtrust/host_tools/sample_kvm_scripts$ sudo ./launch_vm.sh -e 
./launch_vm.sh: 28: echo: echo: I/O error
qemu-system-x86_64: root=/dev/mapper/ubuntu--vg-ubuntu--lv: Could not open 'root=/dev/mapper/ubuntu--vg-ubuntu--lv': No such file or directory
$:/data/shared/nvtrust/host_tools/sample_kvm_scripts$ ls -al /dev/mapper/ubuntu--vg-ubuntu--lv
lrwxrwxrwx 1 root root 7 Jan 15 21:57 /dev/mapper/ubuntu--vg-ubuntu--lv -> ../dm-0
$:/data/shared/nvtrust/host_tools/sample_kvm_scripts$ ls -al ../dm-0
ls: cannot access '../dm-0': No such file or directory
$:/data/shared/nvtrust/host_tools/sample_kvm_scripts$
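The second ls fails for a reason unrelated to the device itself: it is run from the sample_kvm_scripts directory, so ../dm-0 is looked up relative to that directory. A relative symlink target is resolved relative to the directory containing the link, so /dev/mapper/ubuntu--vg-ubuntu--lv -> ../dm-0 refers to /dev/dm-0. A small bash sketch, where resolve_link is a hypothetical wrapper around GNU readlink:

```shell
#!/usr/bin/env bash
# Print a symlink's stored target and the absolute path it resolves to.
# A relative target like ../dm-0 is resolved against the link's own
# directory (here /dev/mapper), not the shell's working directory.
resolve_link() {
    local link=$1
    printf 'target:   %s\n' "$(readlink -- "$link")"
    printf 'resolved: %s\n' "$(readlink -f -- "$link")"
}
```

Usage on the host: resolve_link /dev/mapper/ubuntu--vg-ubuntu--lv, then ls -al /dev/dm-0 to inspect the device node directly. Either way, this is a host block device and is not visible inside the guest unless it is explicitly passed in.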

Any suggestions on how to start the CVM correctly would be helpful.

Can you tell us which CPU model you are using? The boot scripts may differ between the AMD EPYC 700x and 900x series.

It’s an AMD EPYC 9124 16-Core Processor.
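For questions like this, the CPU model can be read from /proc/cpuinfo on the host. The sketch below is a hypothetical helper (epyc_model) that extracts the four-digit EPYC model number from the first "model name" line; its leading digit is what separates 7000-series from 9000-series parts (the EPYC 9124 is a 9004-series, Genoa, part).

```shell
#!/usr/bin/env bash
# Read /proc/cpuinfo-style text on stdin and print the EPYC model number
# (e.g. 9124) from the first "model name" line. Prints nothing for
# non-EPYC CPUs.
epyc_model() {
    awk -F': ' '/^model name/ { print $2; exit }' |
        grep -oE 'EPYC [0-9]{4}[A-Z]*' |
        awk '{ print $2 }'
}
```

Usage: epyc_model < /proc/cpuinfo. lscpu reports the same "Model name" field if a human-readable summary is preferred.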

Can you please provide the output of “uname -a” on the host?

Also, did you modify ./launch_vm.sh to point to /data/shared/AMDSEV/… ?

@rnertney launch_vm.sh points to /data/shared/AMDSEV/… This is what my modified script looks like:

AMD_SEV_DIR=/data/shared/AMDSEV/snp-release-2024-01-14
VDD_IMAGE=/data/shared/nvtrust/host_tools/sample_kvm_scripts/images/ubuntu22.04.qcow2

#Hardware Settings
NVIDIA_GPU=21:00.0
MEM=64 #in GBs
FWDPORT=9899

doecho=false
docc=true

while getopts "exp:" flag
do
        case ${flag} in
                e) doecho=true;;
                x) docc=false;;
                p) FWDPORT=${OPTARG};;
        esac
done

NVIDIA_GPU=$(lspci -d 10de: | awk '/NVIDIA/{print $1}')
NVIDIA_PASSTHROUGH=$(lspci -n -s $NVIDIA_GPU | awk -F: '{print $4}' | awk '{print $1}')

if [ "$doecho" = true ]; then
         echo 10de $NVIDIA_PASSTHROUGH > /sys/bus/pci/drivers/vfio-pci/new_id
fi

if [ "$docc" = true ]; then
        USE_HCC=true
fi

$AMD_SEV_DIR/usr/local/bin/qemu-system-x86_64 \
${USE_HCC:+ -machine confidential-guest-support=sev0,vmport=off} \
${USE_HCC:+ -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1} \
-enable-kvm -nographic -no-reboot \
-cpu EPYC-v4 -machine q35 -smp 12,maxcpus=31 -m ${MEM}G,slots=2,maxmem=512G \
-drive if=pflash,format=raw,unit=0,file=$AMD_SEV_DIR/usr/local/share/qemu/OVMF_CODE.fd,readonly=on \
-drive file=$VDD_IMAGE,if=none,id=disk0,format=qcow2 \
-device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true \
-device scsi-hd,drive=disk0 \
-device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= \
-netdev user,id=vmnic,hostfwd=tcp::$FWDPORT-:22 \
-device pcie-root-port,id=pci.1,bus=pcie.0 \
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144 \
-kernel /boot/vmlinuz-5.19.0-rc6-snp-host-c4daeffce56e \
-initrd /boot/initrd.img-5.19.0-rc6-snp-host-c4daeffce56e \
-append BOOT_IMAGE=/vmlinuz-5.19.0-rc6-snp-host-c4daeffce56e root=/dev/mapper/ubuntu--vg-ubuntu--lv ro net.ifnames=0 biosdevname=0
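For reference, the two lspci lines in the script first pick the GPU's PCI address and then cut the device ID out of the numeric lspci -n output; that ID is what gets written to /sys/bus/pci/drivers/vfio-pci/new_id. The helpers below (pci_device_id and pci_vendor_id are hypothetical names) isolate that parsing step so it can be tried on a captured line; the sample line in the test is illustrative, not output from this host.

```shell
#!/usr/bin/env bash
# Parse one line of `lspci -n -s <addr>` output, whose typical format is
# "<bus:dev.fn> <class>: <vendor>:<device> (rev xx)".

# Device ID: same pipeline as launch_vm.sh. Split on ':' and take the 4th
# field, then keep only its first whitespace-separated word.
pci_device_id() {
    awk -F: '{print $4}' | awk '{print $1}'
}

# Vendor ID (3rd ':' field), for comparison with the "10de" in the script.
pci_vendor_id() {
    awk -F: '{print $3}' | awk '{print $1}'
}
```

This field counting assumes lspci prints addresses without a PCI domain prefix; with lspci -D (or on machines with more than one PCI domain) the address gains an extra "0000:" field and the field numbers shift.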

Details of host OS:

$ uname -a
Linux TRY-27360-gpu01 5.19.0-rc6-snp-host-c4daeffce56e #1 SMP Sun Jan 14 10:36:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
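Beyond the kernel version, it can help to confirm that SNP is actually enabled in KVM on this host. Assuming the AMDSEV host kernel exposes the sev, sev_es and sev_snp parameters of the kvm_amd module under /sys/module/kvm_amd/parameters (upstream kernels with SNP host support do), a minimal bash sketch with a hypothetical kvm_amd_param_enabled helper:

```shell
#!/usr/bin/env bash
# Report whether a kvm_amd module parameter (e.g. sev, sev_es, sev_snp) is
# enabled. Exit status: 0 enabled, 1 disabled, 2 parameter not present.
# The directory is an argument so the check can be tried on a copy; on the
# host it defaults to /sys/module/kvm_amd/parameters (an assumption).
kvm_amd_param_enabled() {
    local name=$1 dir=${2:-/sys/module/kvm_amd/parameters}
    local file="$dir/$name"
    [ -r "$file" ] || { echo "$name: not present"; return 2; }
    case "$(cat "$file")" in
        Y|y|1) echo "$name: enabled"; return 0 ;;   # bool params read Y/N
        *)     echo "$name: disabled"; return 1 ;;
    esac
}
```

Usage on the host: for p in sev sev_es sev_snp; do kvm_amd_param_enabled "$p"; done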

If you run nvidia-smi on the host, does it show the GPU you want to attach to the VM? (It should not.)

@Yifan-Tan Yes, that’s right:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

But even without the I/O error from the driver, I can’t launch the CVM.
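nvidia-smi failing on the host is the expected outcome here: a PCI device is bound to one driver at a time, so once the H100 is claimed by vfio-pci for passthrough, the host NVIDIA driver cannot use it. lspci -nnk shows which driver currently owns each device; the drivers_in_use helper below is a hypothetical wrapper that condenses that output to one "address driver" pair per device.

```shell
#!/usr/bin/env bash
# Read `lspci -nnk` output on stdin and print "<address> <driver>" for each
# device. Device lines start at column 0 with the PCI address; their
# detail lines (Subsystem, Kernel driver in use, ...) are tab-indented.
# Devices without a bound driver are reported as "none".
drivers_in_use() {
    awk '
        /^[0-9a-fA-F]/          { if (addr != "") print addr, drv; addr = $1; drv = "none" }
        /Kernel driver in use:/ { drv = $NF }
        END                     { if (addr != "") print addr, drv }
    '
}
```

Usage on the host: lspci -nnk -d 10de: | drivers_in_use. For the passthrough GPU the driver column should read vfio-pci before launch_vm.sh starts qemu.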

Please run dmesg on the host. Are there any errors?
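When going through dmesg for this kind of failure, it can help to narrow the log to the components involved in an SEV-SNP launch with GPU passthrough (the ccp/PSP firmware interface, KVM, IOMMU and vfio). The snp_log_lines helper below is a hypothetical filter, and its keyword list is an assumption rather than an official set:

```shell
#!/usr/bin/env bash
# Keep only kernel log lines (from stdin) that mention components involved
# in an SEV-SNP launch with GPU passthrough. The keyword list is an
# assumption, not an official filter; adjust it as needed.
snp_log_lines() {
    grep -Ei 'sev|snp|ccp|psp|vfio|iommu|kvm'
}
```

Usage on the host: sudo dmesg | snp_log_lines, or sudo dmesg --level=err,warn | snp_log_lines to see only warnings and errors.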

Thanks for your help!

Offline, this was diagnosed as an issue with the Ubuntu 22.04 HWE kernel being upgraded to 6.5. An incompatibility between the guest and host (outside of NVIDIA code) causes the CVM to fail to boot.

Using the guest OS provided by AMDESE, rather than stock Ubuntu 22.04 with the HWE kernel, solves the issue. We are working with Canonical to root-cause the problem.
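Since the diagnosis hinges on whether a stock Ubuntu 22.04 guest has moved from the GA kernel (5.15) to the 6.5 HWE kernel, one way to check from inside the guest is to compare uname -r against 6.5. kernel_at_least is a hypothetical helper that relies on GNU sort -V for version ordering:

```shell
#!/usr/bin/env bash
# Succeed if kernel release $1 (e.g. the output of `uname -r`, such as
# 6.5.0-14-generic) is at least version $2. The "-<abi>-<flavour>" suffix
# is stripped first; GNU sort -V then orders the two versions.
kernel_at_least() {
    local release=${1%%-*} min=$2
    [ "$(printf '%s\n%s\n' "$min" "$release" | sort -V | head -n1)" = "$min" ]
}
```

Usage inside the guest: kernel_at_least "$(uname -r)" 6.5.0 && echo "guest is on a 6.5+ kernel". dpkg -l 'linux-image-*' lists which kernel packages are actually installed.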
