I am installing an NVIDIA L40S in a RHEL 9.2 server and setting up vGPU, but I cannot get past the following issue. I followed the steps in the section below.
- 16.2. Managing NVIDIA vGPU devices
At step 4, lsmod | grep nvidia_vgpu_vfio prints nothing, systemctl status nvidia-vgpu-mgr.service reports an error <vgpu-mgr_err01.txt>, and the service cannot be started.
I tried reinstalling the driver, but the situation did not change.
<20250107_142654_10.141.68.33.log>
Please advise how to deal with this issue.
× nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Tue 2025-01-07 11:46:18 JST; 2min 58s ago
Duration: 294ms
Process: 3741 ExecStart=/usr/bin/nvidia-vgpu-mgr (code=exited, status=0/SUCCESS)
Process: 3752 ExecStopPost=/bin/rm -rf /var/run/nvidia-vgpu-mgr (code=exited, status=0/SUCCESS)
Main PID: 3742 (code=exited, status=1/FAILURE)
CPU: 136ms
1月 07 11:46:18 nar-h1002psoe03 systemd[1]: Starting NVIDIA vGPU Manager Daemon...
1月 07 11:46:18 nar-h1002psoe03 systemd[1]: Started NVIDIA vGPU Manager Daemon.
1月 07 11:46:18 nar-h1002psoe03 nvidia-vgpu-mgr[3742]: error: vmiop_env_log: Failed to initialize RM client: 0x26
1月 07 11:46:18 nar-h1002psoe03 systemd[1]: nvidia-vgpu-mgr.service: Main process exited, code=exited, status=1/FAILURE
1月 07 11:46:18 nar-h1002psoe03 systemd[1]: nvidia-vgpu-mgr.service: Failed with result 'exit-code'.
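To make the failure easier to compare across reboots and driver reinstalls, the status code can be pulled out of the journal with a small filter. This is only a sketch: the function name extract_rm_error is made up for illustration, and it assumes the message format stays exactly as in the output above.

```shell
#!/usr/bin/env bash
# Print the hex status code from every
# "Failed to initialize RM client: 0x.." line read on stdin.
# (extract_rm_error is a hypothetical helper name, not an NVIDIA tool.)
extract_rm_error() {
    sed -n 's/.*Failed to initialize RM client: \(0x[0-9a-fA-F]*\).*/\1/p'
}

# Usage on the host (unit name taken from the status output above):
#   journalctl -u nvidia-vgpu-mgr.service -b --no-pager | extract_rm_error
#
# Related checks that show whether any NVIDIA kernel module is loaded
# and what the kernel logged about it:
#   lsmod | grep -i nvidia
#   dmesg | grep -i -e nvidia -e nvrm
```

If the filter prints nothing, the daemon failed for a different reason than the one shown in this post.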
Summary of implementation steps
- Enable IOMMU support in the host machine kernel
(1) Execute the following command.
*Since this is an Intel host, enable VT-d.
grubby --args="intel_iommu=on iommu_pt" --update-kernel DEFAULT
(2) Execute the following command and reboot the host.
reboot
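After the reboot it may be worth confirming that the parameters actually reached the running kernel. The sketch below checks /proc/cmdline for exact arguments; has_kernel_arg is a hypothetical helper name. Note that the Red Hat procedure spells the second parameter iommu=pt, while the command above shows iommu_pt, so the check assumes the documented spelling.

```shell
#!/usr/bin/env bash
# Return 0 if the exact argument $2 appears as a whole word
# in the kernel command line string $1, otherwise return 1.
# (has_kernel_arg is a hypothetical helper name.)
has_kernel_arg() {
    local cmdline=$1 arg=$2 word
    for word in $cmdline; do
        [ "$word" = "$arg" ] && return 0
    done
    return 1
}

# Usage after the reboot:
#   cmdline=$(cat /proc/cmdline)
#   for arg in intel_iommu=on iommu=pt; do
#       if has_kernel_arg "$cmdline" "$arg"; then
#           echo "present: $arg"
#       else
#           echo "MISSING: $arg"
#       fi
#   done
```

Matching whole words rather than substrings means a misspelled argument such as iommu_pt is reported as missing instead of passing silently.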
- Prevent the driver from binding to the GPU
(1) Execute the following command to identify the PCI bus address to which the GPU is connected.
lspci -Dnn | grep NVIDIA
0000:ca:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:26b9] (rev a1)
(2) Execute the following command to prevent the host’s graphics driver from using the GPU.
grubby --args="pci-stub.ids=10de:26b9" --update-kernel DEFAULT
(3) Run the following command to reboot the host.
reboot
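Which kernel driver ended up owning the GPU after this reboot can be read from lspci -k. The sketch below just extracts that one line; driver_in_use is a hypothetical helper name, and the PCI address is the one from step (1).

```shell
#!/usr/bin/env bash
# Print the driver named on the "Kernel driver in use:" line
# of `lspci -k` output read on stdin. Prints nothing if no
# driver is bound to the device.
# (driver_in_use is a hypothetical helper name.)
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage on the host:
#   lspci -nnk -s 0000:ca:00.0 | driver_in_use
```

If this prints pci-stub, the stub driver has claimed the card and the host NVIDIA driver has not bound to it, which is worth keeping in mind when reading the nvidia-vgpu-mgr error above.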
- Connect the GPU to the virtual machine
(1) Create an XML configuration file for the GPU using the PCI bus address.
(2) Save the file on the host system.
Example: /home/nhkhart/GPU-Assign.xml
(3) Run the following command to merge the GPU XML file GPU-Assign.xml into the virtual machine’s XML configuration file.
virsh attach-device nar-h1002VSOE32 --file /home/nhkhart/GPU-Assign.xml --persistent
The device was successfully attached
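The contents of GPU-Assign.xml are not shown in this post. As an illustration of step (1), a PCI passthrough device definition in the form used by the Red Hat documentation could be generated from the bus address like this. The function name pci_to_hostdev_xml is hypothetical, and the element layout is an assumption about what the file contains.

```shell
#!/usr/bin/env bash
# Convert a PCI address such as 0000:ca:00.0 into a libvirt
# <hostdev> element for PCI passthrough and print it on stdout.
# (pci_to_hostdev_xml is a hypothetical helper name.)
pci_to_hostdev_xml() {
    local addr=$1 domain bus slot func
    # Split "dddd:bb:ss.f" on ':' and '.' into its four fields.
    IFS=':.' read -r domain bus slot func <<< "$addr"
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x${domain}' bus='0x${bus}' slot='0x${slot}' function='0x${func}'/>
  </source>
</hostdev>
EOF
}

# Usage (address from the lspci output above):
#   pci_to_hostdev_xml 0000:ca:00.0 > /home/nhkhart/GPU-Assign.xml
```

Generating the file from the lspci address avoids hand-typing the hexadecimal bus number.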
/etc/libvirt/qemu/nar-h1002VSOE32.xml
*The following is displayed in the section.