(AMD EPYC 9224 + H100) Stuck at boot

Motherboard: GIGABYTE G493-ZB2 (rev. AAP1)
OS: Ubuntu 22.04 server
CPU: AMD EPYC 9224
RAM: 64G / SSD: 2T

Hi there,
I am trying to set up confidential computing with an H100.

Following the Confidential Computing Deployment Guide (https://docs.nvidia.com/confidential-computing-deployment-guide.pdf), I got as far as installing the host OS (i.e., the 5.19 SNP-aware kernel).

However, the machine gets stuck during boot after installing the 5.19 kernel.

While trying to figure out the problem, I found that when I change the SMEE mode in the BIOS back to ‘Auto’ (the default value), it boots.

My GIGABYTE BIOS has the following options related to SEV-SNP:

SMEE (default Auto)
SEV-ES ASID Space Limit (default 1)
SNP Memory Coverage (default Auto)

IOMMU (default Enabled)
SEV-SNP support (default Auto)

How can I fix this?
Thanks in advance.

[Screenshot: the boot screen where the machine hangs]

The log shows “SEV-SNP: failed to INIT error 0x3”.
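To check whether the same failure shows up in a saved kernel log, something like the following Python sketch may help. It only matches the exact wording quoted above; other kernel versions may phrase the message differently, and the function name `find_snp_init_errors` is made up for this example.

```python
import re

# Matches the message quoted in this thread. Assumption: the wording is
# the same in your kernel; adjust the pattern if it is not.
SNP_INIT_FAILURE = re.compile(r"SEV-SNP: failed to INIT error (0x[0-9a-fA-F]+)")


def find_snp_init_errors(log_text):
    """Return the error code (as an int) of every SEV-SNP INIT failure found."""
    return [int(m.group(1), 16) for m in SNP_INIT_FAILURE.finditer(log_text)]


if __name__ == "__main__":
    # The line below is the message from this thread, used as sample input.
    sample = "SEV-SNP: failed to INIT error 0x3"
    for code in find_snp_init_errors(sample):
        print(f"SEV-SNP INIT failed with error {code:#x}")
```

In practice you would pass in the text of your own boot log (for example, saved `dmesg` output) instead of the sample string.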

Can you boot the VM without the H100, using the BIOS configuration listed in the guide (with the options turned “on”)?

Hi @Yifan-Tan ,

This is the host machine, not a VM.

In that case, do you mean physically removing the H100 from the machine?

Oh, my mistake.

It is probably an AMD SEV issue. The AMDESE/AMDSEV issue tracker on GitHub has plenty of similar reports for error 0x3.

I don’t think it is an H100 issue. You can rule out the H100 by not installing (or by uninstalling) the NVIDIA driver on the host.
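On a host that does boot (for example with SMEE back on ‘Auto’), one rough way to confirm that no NVIDIA kernel driver is loaded is to look at `/proc/modules`. The sketch below assumes the driver's modules are named `nvidia` or start with `nvidia_`; the helper name `loaded_nvidia_modules` is made up for this example.

```python
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return the names of loaded modules that look like NVIDIA driver modules.

    Each line of /proc/modules starts with the module name.
    Assumption: NVIDIA modules are called "nvidia" or "nvidia_<something>".
    """
    names = [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]
    return [n for n in names if n == "nvidia" or n.startswith("nvidia_")]


if __name__ == "__main__":
    path = Path("/proc/modules")
    # Fall back to an empty listing on systems without /proc/modules.
    text = path.read_text() if path.exists() else ""
    found = loaded_nvidia_modules(text)
    print(found if found else "no NVIDIA kernel modules loaded")
```

An empty result only says that no such module is loaded right now; it does not say whether the driver package is installed.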

Per the Deployment Guide, these options should be set explicitly rather than left on “Auto” (as you currently have them):

  • SEV ASID Count → 509 ASIDs
  • SEV-ES ASID space Limit Control → Manual
  • SEV-ES ASID space limit → 100
  • SNP Memory Coverage → Enabled
  • SMEE → Enabled
  • SEV-SNP Support → Enabled
  • IOMMU → Enabled
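As a quick way to see which options still differ from the list above, here is a small Python sketch that compares a hand-written record of the current BIOS values against the recommended ones. The values cannot be read from the BIOS this way; they have to be typed in by hand. The `reported` dictionary uses the defaults posted earlier in the thread, with option names normalized to the spelling in the list above, and `diff_settings` is a name made up for this example.

```python
# Recommended values, copied from the list above.
RECOMMENDED = {
    "SEV ASID Count": "509 ASIDs",
    "SEV-ES ASID space Limit Control": "Manual",
    "SEV-ES ASID space limit": "100",
    "SNP Memory Coverage": "Enabled",
    "SMEE": "Enabled",
    "SEV-SNP Support": "Enabled",
    "IOMMU": "Enabled",
}


def diff_settings(current, recommended=RECOMMENDED):
    """Return {option: (current_value_or_None, recommended_value)} for each mismatch."""
    diffs = {}
    for option, wanted in recommended.items():
        have = current.get(option)  # None means the option was not recorded
        if have != wanted:
            diffs[option] = (have, wanted)
    return diffs


# Defaults reported earlier in this thread (entered by hand).
reported = {
    "SMEE": "Auto",
    "SEV-ES ASID space limit": "1",
    "SNP Memory Coverage": "Auto",
    "IOMMU": "Enabled",
    "SEV-SNP Support": "Auto",
}

if __name__ == "__main__":
    for option, (have, wanted) in diff_settings(reported).items():
        print(f"{option}: have {have!r}, want {wanted!r}")
```

Options that were never recorded show up with a current value of `None`, which is a reminder to look for them in the BIOS menus.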