vGPU Manager install stuck

I am installing an NVIDIA L40S in a RHEL 9.2 server and setting up vGPU, but the installation is stuck on the issue described below. I followed the steps at the link below:

16.2. Managing NVIDIA vGPU devices

Step 4
Running lsmod | grep nvidia_vgpu_vfio displays nothing,
and systemctl status nvidia-vgpu-mgr.service reports an error (<vgpu-mgr_err01.txt>); the service cannot be started.

I tried reinstalling the driver, but the situation did not change.
<20250107_142654_10.141.68.33.log>

Please advise how to deal with this issue.

× nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Tue 2025-01-07 11:46:18 JST; 2min 58s ago
Duration: 294ms
Process: 3741 ExecStart=/usr/bin/nvidia-vgpu-mgr (code=exited, status=0/SUCCESS)
Process: 3752 ExecStopPost=/bin/rm -rf /var/run/nvidia-vgpu-mgr (code=exited, status=0/SUCCESS)
Main PID: 3742 (code=exited, status=1/FAILURE)
CPU: 136ms

1月 07 11:46:18 nar-h1002psoe03 systemd[1]: Starting NVIDIA vGPU Manager Daemon...
1月 07 11:46:18 nar-h1002psoe03 systemd[1]: Started NVIDIA vGPU Manager Daemon.
1月 07 11:46:18 nar-h1002psoe03 nvidia-vgpu-mgr[3742]: error: vmiop_env_log: Failed to initialize RM client: 0x26
1月 07 11:46:18 nar-h1002psoe03 systemd[1]: nvidia-vgpu-mgr.service: Main process exited, code=exited, status=1/FAILURE
1月 07 11:46:18 nar-h1002psoe03 systemd[1]: nvidia-vgpu-mgr.service: Failed with result 'exit-code'.
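
To narrow down the Step 4 symptom, it may help to list every loaded module whose name starts with nvidia, not only nvidia_vgpu_vfio. The following is a minimal sketch; the function name loaded_modules_matching is hypothetical and it only filters text in the format that lsmod prints.

```shell
# loaded_modules_matching PREFIX
# Reads lsmod-style output on stdin and prints the names of modules
# whose name begins with PREFIX. The first line (column header) is skipped.
loaded_modules_matching() {
    awk -v prefix="$1" 'NR > 1 && index($1, prefix) == 1 { print $1 }'
}
```

On the host it could be called as lsmod | loaded_modules_matching nvidia. Empty output would mean that no module with that prefix is loaded at all.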

Summary of implementation steps

  1. Enable IOMMU support in the host machine kernel

(1) Execute the following command.

*Since this is an Intel host, enable VT-d.

grubby --args="intel_iommu=on iommu=pt" --update-kernel DEFAULT

(2) Execute the following command and reboot the host.

reboot
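
After the reboot, the running kernel's command line can be checked to confirm that the arguments took effect. This is a minimal sketch; the function name report_kernel_args is hypothetical, and the arguments to look for are passed in by the caller rather than assumed.

```shell
# report_kernel_args CMDLINE ARG...
# Prints "ARG: present" or "ARG: MISSING" for each ARG, depending on
# whether it appears as a whole space-separated word in CMDLINE.
report_kernel_args() {
    local cmdline="$1" arg
    shift
    for arg in "$@"; do
        case " $cmdline " in
            *" $arg "*) echo "$arg: present" ;;
            *)          echo "$arg: MISSING" ;;
        esac
    done
}
```

On the host it could be called as report_kernel_args "$(cat /proc/cmdline)" intel_iommu=on iommu=pt. A mistyped argument shows up as MISSING because only exact matches count.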

  2. Prevent the driver from binding to the GPU

(1) Execute the following command to identify the PCI bus address to which the GPU is connected.

lspci -Dnn | grep NVIDIA

0000:ca:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:26b9] (rev a1)

(2) Execute the following command to prevent the host’s graphics driver from using the GPU.

grubby --args="pci-stub.ids=10de:26b9" --update-kernel DEFAULT

(3) Run the following command to reboot the host.

reboot
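
The vendor:device ID used in the pci-stub.ids argument can be pulled from the lspci line instead of being copied by hand, and after the reboot lspci -nnk shows which kernel driver has claimed the GPU. The helpers below are a sketch with hypothetical names (pci_id_from_lspci, driver_in_use); they only parse text in the lspci output format shown above.

```shell
# pci_id_from_lspci
# Reads `lspci -Dnn` lines on stdin and prints the [vendor:device] ID
# (four hex digits, colon, four hex digits) found on each line.
pci_id_from_lspci() {
    sed -n 's/.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p'
}

# driver_in_use
# Reads `lspci -nnk -s ADDRESS` output on stdin and prints the value of
# the "Kernel driver in use:" line, or nothing if no driver is bound.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}
```

On the host these could be called as lspci -Dnn | grep NVIDIA | pci_id_from_lspci and lspci -nnk -s 0000:ca:00.0 | driver_in_use. The second command reports the driver that currently owns the device.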

  3. Connect the GPU to the virtual machine

(1) Create an XML configuration file for the GPU using the PCI bus address.

(2) Save the file on the host system.

Example: /home/nhkhart/GPU-Assign.xml
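
The contents of GPU-Assign.xml are not shown in this post. As an illustration only, the sketch below builds a libvirt hostdev element from a PCI address such as 0000:ca:00.0. The function name hostdev_xml is hypothetical, and the element layout (managed='yes', the vfio driver line) is an assumption based on the usual libvirt PCI hostdev format, not the file actually used here.

```shell
# hostdev_xml DDDD:BB:SS.F
# Splits a PCI address into domain, bus, slot and function and prints
# a libvirt <hostdev> element for it. Layout is assumed, see above.
hostdev_xml() {
    local addr="$1" rest domain bus slot func
    domain="${addr%%:*}"   # text before the first colon
    rest="${addr#*:}"      # BB:SS.F
    bus="${rest%%:*}"
    rest="${rest#*:}"      # SS.F
    slot="${rest%%.*}"
    func="${rest#*.}"
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x${domain}' bus='0x${bus}' slot='0x${slot}' function='0x${func}'/>
  </source>
</hostdev>
EOF
}
```

For example, hostdev_xml 0000:ca:00.0 > /home/nhkhart/GPU-Assign.xml would write a file for the address reported by lspci above.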

(3) Run the following command to merge the GPU XML file GPU-Assign.xml into the virtual machine’s XML configuration file.

virsh attach-device nar-h1002VSOE32 --file /home/nhkhart/GPU-Assign.xml --persistent

Device attached successfully

/etc/libvirt/qemu/nar-h1002VSOE32.xml
*The following is displayed in the section.