VFIO Passthrough for HGX B200 system

We are configuring virtualization software for a customer on an HGX B200 8-GPU system. Our stack uses VFIO / KVM / QEMU for direct GPU passthrough. We have run into the following issue on this system:

  1. The VM is created, and the GPU is successfully passed through and visible with lspci.

  2. nvidia-smi shows Driver Version: 570.172.08 for the B200. Persistence mode is “On”.

  3. deviceQuery reports “CUDA Error: initialization error (code 3)”.

  4. We consistently see the following errors in dmesg inside the VM:

    NVRM: kbifCacheVFInfo_GB100: Unable to read NV_PF0_INITIAL_AND_TOTAL_VFS
    NVRM: calculatePCIELinkRateMBps: Unknown PCIe speed
    NVRM: getPCIELinkRateMBps: Generic Error: Invalid state [NV_ERR_INVALID_STATE]
    [drm] [nvidia-drm] [GPU ID 0x00000010] Failed to allocate NvKmsKapiDevice
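
Since the log complains about an unknown PCIe speed, the link state that the guest kernel reports for the GPU could be collected with something like the sketch below. It assumes a Linux guest with sysfs mounted at /sys and reads only the standard PCI attributes (vendor, current_link_speed, current_link_width, max_link_speed, max_link_width). The names nvidia_link_report and read_attr are illustrative and not part of any NVIDIA or QEMU tool. The output is meant as extra data for the report and does not by itself identify the root cause.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA
LINK_ATTRS = (
    "current_link_speed",
    "current_link_width",
    "max_link_speed",
    "max_link_width",
)


def read_attr(dev_dir, name):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return (dev_dir / name).read_text().strip()
    except OSError:
        # The attribute is missing or not readable for this device.
        return None


def nvidia_link_report(sysfs_root="/sys/bus/pci/devices"):
    """Collect PCIe link attributes for every NVIDIA PCI function under sysfs_root."""
    report = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return report
    for dev_dir in sorted(root.iterdir()):
        if read_attr(dev_dir, "vendor") != NVIDIA_VENDOR_ID:
            continue
        report[dev_dir.name] = {attr: read_attr(dev_dir, attr) for attr in LINK_ATTRS}
    return report


if __name__ == "__main__":
    # Print one line per NVIDIA function, keyed by its PCI address in the guest.
    for address, attrs in nvidia_link_report().items():
        print(address, attrs)
```

Running the same script on the host (before the GPU is bound to vfio-pci) and inside the VM would show whether the link attributes differ between the two views.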

Is this a known issue? What would be the fastest way to resolve it, i.e., the fix that requires the fewest changes to our software?