Hi,
I have several issues setting up a TDX-H100 environment.
Thank you in advance for your guidance.
TDX module not found
I set up the BIOS and host OS for TDX, and I get the following dmesg output:
sudo dmesg | grep -i tdx
[ 1.013576] virt/tdx: BIOS enabled: private KeyID range [4, 64)
[ 1.013580] virt/tdx: Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3.
This is not the desired output: the TDX module information is missing.
The desired output:
To resolve this, should I update the firmware or set a specific option for TDX?
I followed all the options in the implementation guide; is there anything else I should check?
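For reference, here is a minimal sketch of how the dmesg output above could be checked automatically. `summarize_tdx_dmesg` is a hypothetical helper, and the assumption that a host with a loaded TDX module logs a line containing "module initialized" is mine, not something taken from the guide; please correct me if the expected message differs on your kernel.

```python
def summarize_tdx_dmesg(dmesg_text: str) -> dict:
    """Classify the TDX-related lines of a dmesg dump (hypothetical helper)."""
    # Keep only lines that mention TDX, case-insensitively (like `grep -i tdx`).
    lines = [line for line in dmesg_text.splitlines() if "tdx" in line.lower()]
    return {
        # Taken from the log above: the BIOS side reports TDX as enabled.
        "bios_enabled": any("BIOS enabled" in line for line in lines),
        # ASSUMPTION: a successfully loaded TDX module logs "module initialized".
        "module_initialized": any("module initialized" in line for line in lines),
        "lines": lines,
    }


# The two lines from my host, copied from the output above.
sample = (
    "[ 1.013576] virt/tdx: BIOS enabled: private KeyID range [4, 64)\n"
    "[ 1.013580] virt/tdx: Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3.\n"
)

result = summarize_tdx_dmesg(sample)
print(result["bios_enabled"], result["module_initialized"])
```

On my output this reports the BIOS part as present and the module part as absent, which matches what I described above.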
Changing PPCIe mode
Another issue: I can't switch the PPCIe mode because of the error “A FW update is required.” Where should I look for this, and exactly which firmware needs to be updated?
sudo python3 ./nvidia_gpu_tools.py --set-ppcie-mode=off --reset-after-ppcie-mode-switch --gpu-bdf=19:00.0
NVIDIA GPU Tools version v2025.04.07o
Command line arguments: ['./nvidia_gpu_tools.py', '--set-ppcie-mode=off', '--reset-after-ppcie-mode-switch', '--gpu-bdf=19:00.0']
2025-07-01,13:30:05.139 WARNING GPU 0000:19:00.0 ? 0x2331 BAR0 0x0 was in D3/control:auto, forced power control to on. New state D0
Topo:
Intel root port 0000:14:06.0
GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000
2025-07-01,13:30:05.142 INFO Selected GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000
2025-07-01,13:30:05.143 WARNING GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000 has CC mode on, some functionality may not work
2025-07-01,13:30:05.153 INFO CC is currently active. It will be turned off before switching to PPCIe.
2025-07-01,13:30:05.198 ERROR GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000 does not support PPCIe on current FW. A FW update is required.
2025-07-01,13:30:05.199 WARNING GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000 restoring power control to auto
sudo python3 ./nvidia_gpu_tools.py --set-cc-mode=on --reset-after-cc-mode-switch --gpu-bdf=19:00.0
NVIDIA GPU Tools version v2025.04.07o
Command line arguments: ['./nvidia_gpu_tools.py', '--set-cc-mode=on', '--reset-after-cc-mode-switch', '--gpu-bdf=19:00.0']
2025-07-01,13:34:59.492 WARNING GPU 0000:19:00.0 ? 0x2331 BAR0 0x0 was in D3/control:auto, forced power control to on. New state D0
Topo:
Intel root port 0000:14:06.0
GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000
2025-07-01,13:34:59.494 INFO Selected GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000
2025-07-01,13:34:59.596 INFO GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000 CC mode set to on. It will be active after GPU reset.
2025-07-01,13:35:01.981 INFO GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000 was reset to apply the new CC mode.
2025-07-01,13:35:01.982 WARNING GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000 has CC mode on, some functionality may not work
2025-07-01,13:35:01.984 WARNING GPU 0000:19:00.0 H100 0x2331 BAR0 0x22e042000000 restoring power control to auto
I also got the warning “BAR0 0x22e042000000 has CC mode on, some functionality may not work”. Does the PPCIe mode limitation affect CC?
TDX cannot be executed
Lastly, I cannot launch the TDX guest:
lspci -nn | grep -i nvidia
19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H100 PCIe] [10de:2331] (rev a1)
sudo gpu-cc/h100/setup-gpus.sh
================================
List of NVidia GPUs (PCI BDFs):
0000:19:00.0
================================
sudo ./guest-tools/run_td --image=./guest-tools/image/tdx-guest-ubuntu-24.04-generic.qcow2 --gpus=0000:19.00.0
Clean VM
Run VM
Image: ./guest-tools/image/tdx-guest-ubuntu-24.04-generic.qcow2
Passthrough GPUs: ['0000:19.00.0']
sudo /shared/tdx/guest-tools/../gpu-cc/h100/setup-gpus.sh 0000:19.00.0
======= Prepare 0000:19.00.0
NVIDIA GPU Tools version v2024.08.09o
Command line arguments: ['./nvtrust/host_tools/python/nvidia_gpu_tools.py', '--set-ppcie-mode=off', '--reset-after-ppcie-mode-switch', '--gpu-bdf=0000:19.00.0']
2025-07-01,14:19:42.129 ERROR Matching for 0000:19.00.0 found nothing
NVIDIA GPU Tools version v2024.08.09o
Command line arguments: ['./nvtrust/host_tools/python/nvidia_gpu_tools.py', '--set-cc-mode=on', '--reset-after-cc-mode-switch', '--gpu-bdf=0000:19.00.0']
2025-07-01,14:19:42.207 ERROR Matching for 0000:19.00.0 found nothing
Device pci_0000_19_00_0 re-attached
Device pci_0000_19_00_0 detached
qemu-system-x86_64: -accel kvm: vm-type tdx not supported by KVM
It seems that this QEMU lacks TDX support, and the GPU is not found.
/usr/bin/qemu-system-x86_64 --version
QEMU emulator version 8.2.2 (Debian 2:8.2.2+ds-0ubuntu1.4+tdx1.1)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
However, this is the QEMU that was installed by setup_host_tdx.sh, and the version string includes tdx.
What should I do about this issue?
Also, are all of these issues related to each other, or are they separate problems to fix?
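As a sanity check on the "Matching for 0000:19.00.0 found nothing" error, here is a minimal sketch that tests whether a string has the usual PCI `[domain:]bus:device.function` shape. `is_valid_bdf` is a hypothetical helper and not part of nvidia_gpu_tools.py; it only checks the format and does not confirm that the device exists.

```python
import re

# Optional 4-hex-digit domain, then bus:device.function.
# This checks the textual format only, not whether the device is present.
BDF_RE = re.compile(r"(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")


def is_valid_bdf(bdf: str) -> bool:
    """Return True if `bdf` looks like [domain:]bus:device.function."""
    return BDF_RE.fullmatch(bdf) is not None


# Values taken from the commands and logs above.
for candidate in ("19:00.0", "0000:19:00.0", "0000:19.00.0"):
    print(candidate, is_valid_bdf(candidate))
```

The value I passed to run_td, 0000:19.00.0, has a dot between bus and device, whereas lspci and setup-gpus.sh print 0000:19:00.0, so the "found nothing" error may come from my typo rather than from the GPU itself.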
Thank you
