Rmk177
November 27, 2024, 7:08am
Good day!
I am trying to live-migrate a VM (Debian 12) that uses an NVIDIA A10 vGPU on a PowerEdge R740xd.
I installed the 550.90.05 vgpu-kvm driver on the host (kernel 6.6.28) and the 550.90.07 GRID driver in the VM.
In my libvirt VM XML description I use VFIO rather than mdev as shown in the docs, because the host runs kernel 6.6.
Acceleration works in the VM, and nvidia-smi reports that everything is OK.
I use virtfn22 and vGPU profile 598.
Then I start the migration:
u@a246:~$ virsh -c qemu+ssh://u@a246/system migrate d12-nfs --live --verbose --parallel --unsafe qemu+ssh://u@a247/system
Migration: [100,00 %]Error: internal error: QEMU unexpectedly closed the monitor (vm='d12-nfs'): Socket connected unix:/var/lib/libvirt/qemu/1-d12-nfs/monitor.sock,server=on
2024-11-27T06:54:38.654160Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:af:03.2","id":"hostdev0","bus":"pci.7","addr":"0x0"}: warning: vfio 0000:af:03.2: Could not enable error recovery for the device
2024-11-27T06:54:39.636449Z qemu-system-x86_64: error while loading state section id 78(0000:00:02.6:00.0/vfio)
2024-11-27T06:54:39.637563Z qemu-system-x86_64: load of migration failed: Input/output error
On the source side, however, the vGPU manager log shows:
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 30303030-613a-3a66-3033-2e3200000000 GPU PCI id 00:af:03.2 config params vgpu_type_id=598
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=598
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_env_log: Successfully updated env symbols!
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): detected a VF at 0:af:3.2
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): gpu-pci-id : 0xaf00
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vgpu_type : NVS
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): Framebuffer: 0x38000000
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): Virtual Device Id: 0x2236:0x14c0
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): FRL Value: 60 FPS
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: ######## vGPU Manager Information: ########
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: Driver Version: 550.90.05
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vGPU BAR1 size 256 MB
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vGPU supported range: (0x70001, 0x140001)
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vGPU migration enabled
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vGPU manager is running in SRIOV mode.
nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: display_init inst: 0 successful
nov 27 09:52:18 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: ######## Guest NVIDIA Driver Information: ########
nov 27 09:52:18 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: Driver Version: 550.90.07
nov 27 09:52:18 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: vGPU version: 0x140001
nov 27 09:52:18 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vGPU license state: Unlicensed (Unrestricted)
nov 27 09:52:23 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vGPU license state: Licensed
nov 27 09:54:37 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_env_log: (0x0): Plugin migration stage change none -> pre_copy. QEMU migration state: PRECOPY_ACTIVE
nov 27 09:54:37 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): Start pre-copy vGPU state ...
nov 27 09:54:37 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_env_log: (0x0): Plugin migration stage change pre_copy -> none. QEMU migration state: RUNNING
nov 27 09:54:37 a246 nvidia-vgpu-mgr[19722]: error: vmiop_log: (0x0): Error saving page in pipelined mode
nov 27 09:54:37 a246 nvidia-vgpu-mgr[19722]: error: vmiop_log: (0x0): Failed to copy dirty fb pages.
nov 27 09:54:37 a246 nvidia-vgpu-mgr[19722]: error: vmiop_log: (0x0): Failed to save FB pages marked dirty. 0x1
nov 27 09:54:37 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): Migration Ended
On the destination side:
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 30303030-613a-3a66-3033-2e3200000000 GPU PCI id 00:af:03.2 config params vgpu_type_id=598
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=598
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_env_log: Successfully updated env symbols!
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): detected a VF at 0:af:3.2
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): gpu-pci-id : 0xaf00
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): vgpu_type : NVS
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): Framebuffer: 0x38000000
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): Virtual Device Id: 0x2236:0x14c0
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): FRL Value: 60 FPS
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: ######## vGPU Manager Information: #######
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: Driver Version: 550.90.05
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): vGPU BAR1 size 256 MB
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): Detected ECC enabled on physical GPU.
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): Guest usable FB size is reduced due to ECC.
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): This vGPU type does not support ECC.
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): vGPU supported range: (0x70001, 0x140001)
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): vGPU migration enabled
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): vGPU manager is running in SRIOV mode.
nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: display_init inst: 0 successful
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_env_log: (0x0): Plugin migration stage change none -> resume. QEMU migration state: RESUME
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): Start restoring vGPU state ...
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: Assertion Failed at 0x85b0fca2:2427
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: 8 frames returned by backtrace
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv009836vgpu+0x35) [0x7e1485ad3195]
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv012511vgpu+0x180) [0x7e1485ae2ad0]
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv012521vgpu+0x3e2) [0x7e1485b0fca2]
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv004801vgpu+0x11f) [0x7e1485ae497f]
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv012449vgpu+0x4b) [0x7e1485a70fab]
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: vgpu(+0x184c1) [0x607a0b6184c1]
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7e1486281134]
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7e14863017dc]
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): Migration Ended
nov 27 09:54:39 a247 nvidia-vgpu-mgr[31454]: error: vmiop_env_log: (0x0): Failed to write device buffer err: 0x1f
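To make the two logs easier to compare, here is a minimal sketch that strips the timestamp, host name and PID from each nvidia-vgpu-mgr line and lists the messages that appear on only one host. The function names and the regular expression are my own illustration, written against the line format pasted above; they are not part of any NVIDIA tool. On the logs in this post it would, for example, flag the ECC notices that show up only on a247. Whether that difference is related to the failure is not something the script can tell.

```python
import re

# Matches the journal line format seen in the logs above, e.g.
# "nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): text"
# Assumption: other hosts may format journal lines differently.
PREFIX = re.compile(
    r"^\w+ \d+ [\d:]+ \S+ nvidia-vgpu-mgr\[\d+\]: "
    r"(?P<level>\w+): (?P<source>\w+): (?:\(0x[0-9a-f]+\): )?(?P<msg>.*)$"
)


def messages(lines):
    """Return the set of (level, message) pairs with host-specific prefixes removed."""
    found = set()
    for line in lines:
        match = PREFIX.match(line.strip())
        if match:
            found.add((match.group("level"), match.group("msg")))
    return found


def only_on_one_side(src_lines, dst_lines):
    """Return (messages only on source, messages only on destination), sorted."""
    src, dst = messages(src_lines), messages(dst_lines)
    return sorted(src - dst), sorted(dst - src)


# Three lines copied from the logs above, used as sample input.
src = [
    "nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vGPU BAR1 size 256 MB",
    "nov 27 09:52:06 a246 nvidia-vgpu-mgr[19722]: notice: vmiop_log: (0x0): vGPU migration enabled",
]
dst = [
    "nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): vGPU BAR1 size 256 MB",
    "nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): Detected ECC enabled on physical GPU.",
    "nov 27 09:54:38 a247 nvidia-vgpu-mgr[31454]: notice: vmiop_log: (0x0): vGPU migration enabled",
]

only_src, only_dst = only_on_one_side(src, dst)
print("only on source:", only_src)
print("only on destination:", only_dst)
```

In practice the two lists would be read from text files holding the saved journal output of each host instead of the inline samples.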
Is there any additional information I can provide?
My main question is: what is pipelined mode?
Thanks in advance.
Bye.