Hello, members. Does anyone know how to fix this issue?
Expected behavior
KVM on Ubuntu: the Ubuntu guest starts with virsh start and the vGPU is recognized by nvidia-smi.
Actual behavior
The kernel reports a NULL pointer dereference (address: 0000000000000010) when the libvirt configuration uses managed='yes', as in the guest config XML below.
nvidia-smi works correctly when the configuration is set to managed='no'.
Call trace from the kernel log (managed='yes'):
[ 216.812092] vfio_device_set_group+0x19/0x40 [vfio]
[ 216.812099] __vfio_register_dev+0x6c/0x140 [vfio]
[ 216.812105] ? __pm_runtime_idle+0x4c/0xd0
[ 216.812114] vfio_register_group_dev+0x10/0x20 [vfio]
[ 216.812119] vfio_pci_core_register_device+0x1b6/0x460 [vfio_pci_core]
[ 216.812130] vfio_pci_probe+0x53/0x140 [vfio_pci]
[ 216.812133] local_pci_probe+0x44/0xb0
[ 216.812143] work_for_cpu_fn+0x17/0x30
[ 216.812147] process_one_work+0x181/0x3a0
[ 216.812150] worker_thread+0x306/0x440
[ 216.812153] ? __pfx_worker_thread+0x10/0x10
[ 216.812156] kthread+0xef/0x120
[ 216.812163] ? __pfx_kthread+0x10/0x10
[ 216.812167] ret_from_fork+0x44/0x70
[ 216.812174] ? __pfx_kthread+0x10/0x10
[ 216.812177] ret_from_fork_asm+0x1b/0x30
Reproduce steps
(1) /usr/lib/nvidia/sriov-manage -e 0000:ca:00.0
(2) echo 588 > /sys/bus/pci/devices/0000:ca:00.4/nvidia/current_vgpu_type
(3) virsh start vgpu1
(4) BUG: kernel NULL pointer dereference, address: 0000000000000010
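One way to narrow this down: the kernel message further below shows a vfio-pci probe of 0000:ca:00.4 at VM start, which suggests (not confirmed here) that with managed='yes' libvirt tries to rebind the VF away from the nvidia driver. Checking which driver owns the VF between steps (2) and (3), and again after the failure, would show whether that is happening. The sketch below reads the standard sysfs driver symlink; the function name bound_driver and the sysfs_root parameter are made up for illustration.

```python
import os


def bound_driver(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI function.

    bdf is a full PCI address such as "0000:ca:00.4".
    Returns None when no driver is bound or the device does not exist.
    """
    link = os.path.join(sysfs_root, bdf, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at .../bus/pci/drivers/<name>; keep only <name>.
    return os.path.basename(os.readlink(link))
```

For example, bound_driver("0000:ca:00.4") could be called on the host right after step (2) and compared with the value after step (3).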
environment:
NVIDIA-GRID-Ubuntu-KVM-580.95.02-580.95.05-581.42
Guest: nvidia-linux-grid-580_580.95.05_amd64.deb
Host: nvidia-vgpu-ubuntu-580_580.95.02_amd64.deb
GPU: A10
OS: Ubuntu 24.04
Kernel: 6.8.0-87-generic
lspci | grep NVIDIA
0000:ca:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:00.4 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:00.5 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:00.6 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:00.7 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:01.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:01.1 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:01.2 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:01.3 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:01.4 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:01.5 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:01.6 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:01.7 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:02.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:02.1 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:02.2 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:02.3 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:02.4 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:02.5 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:02.6 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:02.7 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:03.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:03.1 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:03.2 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:03.3 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:03.4 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:03.5 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:03.6 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:03.7 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:04.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:04.1 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:04.2 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0000:ca:04.3 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
Guest config xml file
<domain type='kvm' id='2'>
<name>vgpu1</name>
<uuid>c0e22088-c99f-4c20-90e8-e67ed37db700</uuid>
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
<vcpu placement='static'>1</vcpu>
<resource>
<partition>/machine</partition>
</resource>
<os firmware='efi'>
<type arch='x86_64' machine='pc-q35-8.2'>hvm</type>
<firmware>
<feature enabled='no' name='enrolled-keys'/>
<feature enabled='no' name='secure-boot'/>
</firmware>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS_4M.fd'>/var/lib/libvirt/qemu/nvram/vgpu1_VARS.fd</nvram>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<cpu mode='host-passthrough' check='none' migratable='on'/>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/path/to/vgpu1.img' index='1'/>
<backingStore/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</disk>
<controller type='pci' index='0' model='pcie-root'>
<alias name='pcie.0'/>
</controller>
<controller type='pci' index='1' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='1' port='0x8'/>
<alias name='pci.1'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</controller>
<controller type='pci' index='2' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='2' port='0x9'/>
<alias name='pci.2'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</controller>
<controller type='pci' index='3' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='3' port='0xa'/>
<alias name='pci.3'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</controller>
<controller type='pci' index='4' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='4' port='0x8'/>
<alias name='pci.4'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
</controller>
<controller type='pci' index='5' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='5' port='0x9'/>
<alias name='pci.5'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<controller type='usb' index='0' model='qemu-xhci'>
<alias name='usb'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</controller>
<controller type='sata' index='0'>
<alias name='ide'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
</controller>
<interface type='bridge'>
<mac address='52:54:00:4e:9c:6c'/>
<source bridge='brbond0'/>
<target dev='vnet1'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/2'/>
<target type='isa-serial' port='0'>
<model name='isa-serial'/>
</target>
<alias name='serial0'/>
</serial>
<console type='pty' tty='/dev/pts/2'>
<source path='/dev/pts/2'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<input type='mouse' bus='ps2'>
<alias name='input0'/>
</input>
<input type='keyboard' bus='ps2'>
<alias name='input1'/>
</input>
<audio id='1' type='none'/>
<video>
<model type='none'/>
<alias name='video0'/>
</video>
<!-- HERE: this hostdev entry is the one whose managed attribute is in question -->
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0xca' slot='0x00' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</hostdev>
<watchdog model='itco' action='reset'>
<alias name='watchdog0'/>
</watchdog>
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</memballoon>
</devices>
<seclabel type='dynamic' model='apparmor' relabel='yes'>
<label>libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700</label>
<imagelabel>libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700</imagelabel>
</seclabel>
<seclabel type='dynamic' model='dac' relabel='yes'>
<label>+64055:+994</label>
<imagelabel>+64055:+994</imagelabel>
</seclabel>
</domain>
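Since the only difference between the working and failing setups is the managed attribute on the hostdev entry, it can help to list every PCI hostdev and its managed value straight from the domain XML. This is a minimal sketch using Python's standard xml.etree.ElementTree; the function name pci_hostdevs is hypothetical, and SAMPLE_XML is a trimmed fragment of the config above, not the full domain.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of the guest config above (only the hostdev part).
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0xca' slot='0x00' function='0x4'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""


def pci_hostdevs(domain_xml):
    """Return a list of (host PCI address, managed value) per PCI <hostdev>.

    The managed value is returned as written in the XML, or None if the
    attribute is absent.
    """
    root = ET.fromstring(domain_xml)
    found = []
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        addr = hostdev.find("./source/address")
        if addr is None:
            continue
        # Attributes are hex strings such as '0xca'; int(..., 16) accepts the prefix.
        bdf = "{:04x}:{:02x}:{:02x}.{:x}".format(
            int(addr.get("domain", "0x0"), 16),
            int(addr.get("bus"), 16),
            int(addr.get("slot"), 16),
            int(addr.get("function"), 16),
        )
        found.append((bdf, hostdev.get("managed")))
    return found
```

The text to feed in could come from virsh dumpxml vgpu1 on the host.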
supported GPU types
cat /sys/bus/pci/devices/0000\:ca\:00.4/nvidia/creatable_vgpu_types
ID : vGPU Name
588 : NVIDIA A10-1B
589 : NVIDIA A10-2B
590 : NVIDIA A10-1Q
591 : NVIDIA A10-2Q
592 : NVIDIA A10-3Q
593 : NVIDIA A10-4Q
594 : NVIDIA A10-6Q
595 : NVIDIA A10-8Q
596 : NVIDIA A10-12Q
597 : NVIDIA A10-24Q
598 : NVIDIA A10-1A
599 : NVIDIA A10-2A
600 : NVIDIA A10-3A
601 : NVIDIA A10-4A
602 : NVIDIA A10-6A
603 : NVIDIA A10-8A
604 : NVIDIA A10-12A
605 : NVIDIA A10-24A
2172 : NVIDIA A10-3B
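Step (2) writes a numeric type ID (588) that has to appear in this list. A small sketch for turning the listing into an ID-to-name mapping, so the ID can be checked before writing it; parse_vgpu_types is a hypothetical helper name and the "ID : name" layout is assumed from the output shown above.

```python
def parse_vgpu_types(text):
    """Parse 'ID : vGPU Name' lines into a dict {id: name}.

    Lines whose left-hand side is not a number (such as the header
    'ID : vGPU Name') are skipped.
    """
    types = {}
    for line in text.splitlines():
        left, sep, right = line.partition(":")
        if not sep:
            continue
        left = left.strip()
        if left.isdigit():
            types[int(left)] = right.strip()
    return types
```

With the listing above, looking up 588 in the resulting mapping gives NVIDIA A10-1B, which matches the vGPU name reported by nvidia-smi in the guest.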
kernel message
[ 216.593438] kauditd_printk_skb: 116 callbacks suppressed
[ 216.593447] audit: type=1400 audit(1762346790.115:128): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700" pid=2629 comm="apparmor_parser"
[ 216.593793] audit: type=1400 audit(1762346790.115:129): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700//passt" pid=2629 comm="apparmor_parser"
[ 216.687415] audit: type=1400 audit(1762346790.209:130): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700" pid=2632 comm="apparmor_parser"
[ 216.693254] audit: type=1400 audit(1762346790.215:131): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700//passt" pid=2632 comm="apparmor_parser"
[ 216.785302] audit: type=1400 audit(1762346790.307:132): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700" pid=2636 comm="apparmor_parser"
[ 216.791246] audit: type=1400 audit(1762346790.313:133): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700//passt" pid=2636 comm="apparmor_parser"
[ 216.811390] nvidia 0000:ca:00.4: Enabling HDA controller
[ 216.811515] nvidia 0000:ca:00.4: Enabling HDA controller
[ 216.811525] nvidia 0000:ca:00.4: Runtime PM usage count underflow!
[ 216.811897] ------------[ cut here ]------------
[ 216.811902] WARNING: CPU: 16 PID: 777 at drivers/vfio/group.c:695 vfio_group_find_or_alloc+0xb9/0x1e0 [vfio]
[ 216.811918] Modules linked in: vfio_pci pci_pf_stub xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr bridge stp llc bonding nvidia_vgpu_vfio(OE) cfg80211 binfmt_misc nvidia(POE) intel_rapl_msr intel_rapl_common nls_iso8859_1 intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp vfio_pci_core mdev kvm_intel vfio_iommu_type1 vfio iommufd cmdlinepart spi_nor mtd kvm dax_hmem ast cxl_acpi ipmi_ssif rapl cxl_port irqbypass intel_cstate cxl_core i2c_algo_bit isst_if_mbox_pci isst_if_mmio intel_th_gth isst_if_common mei_me spi_intel_pci ioatdma intel_th_pci i2c_i801 spi_intel mei i2c_smbus intel_th intel_pch_thermal intel_vsec dca acpi_power_meter ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler acpi_pad joydev input_leds mac_hid sch_fq_codel dm_multipath msr efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs
[ 216.811996] blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 mlx5_ib ib_uverbs macsec ib_core rndis_host cdc_ether usbnet mii hid_generic usbhid hid raid1 nvme nvme_core nvme_auth mlx5_core crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 mlxfw sha1_ssse3 psample tls ahci vmd pci_hyperv_intf xhci_pci libahci xhci_pci_renesas aesni_intel crypto_simd cryptd
[ 216.812041] CPU: 16 PID: 777 Comm: kworker/16:2 Tainted: P OE 6.8.0-87-generic #88-Ubuntu
[ 216.812044] Hardware name: Supermicro SYS-220U-TNR/X12DPU-6, BIOS 2.4 08/21/2025
[ 216.812046] Workqueue: events work_for_cpu_fn
[ 216.812057] RIP: 0010:vfio_group_find_or_alloc+0xb9/0x1e0 [vfio]
[ 216.812063] Code: 8b 80 03 00 00 48 8d 42 d8 48 39 d1 75 0f eb 60 48 8b 50 28 48 8d 42 d8 48 39 d1 74 53 4c 3b 28 75 ee 4c 89 f7 e8 77 89 a5 c7 <0f> 0b 48 c7 c3 ea ff ff ff eb 6a 44 0f b6 25 24 29 03 00 41 80 fc
[ 216.812066] RSP: 0018:ff6ee0c620027d10 EFLAGS: 00010246
[ 216.812069] RAX: 0000000000000000 RBX: ff4a4e2a91809800 RCX: ff4a4e2a91809b80
[ 216.812071] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 216.812072] RBP: ff6ee0c620027d30 R08: 0000000000000000 R09: 0000000000000000
[ 216.812073] R10: 0000000000000000 R11: 0000000000000000 R12: ff4a4e6a075c2900
[ 216.812075] R13: ff4a4e6a36fbe0c8 R14: ff4a4e2a91809b90 R15: ff4a4e6a36fbe000
[ 216.812076] FS: 0000000000000000(0000) GS:ff4a4ea8fee00000(0000) knlGS:0000000000000000
[ 216.812078] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 216.812080] CR2: 00007315cbb0e090 CR3: 0000005b8023c005 CR4: 0000000000771ef0
[ 216.812082] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 216.812084] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 216.812085] PKRU: 55555554
[ 216.812086] Call Trace:
[ 216.812089] <TASK>
[ 216.812092] vfio_device_set_group+0x19/0x40 [vfio]
[ 216.812099] __vfio_register_dev+0x6c/0x140 [vfio]
[ 216.812105] ? __pm_runtime_idle+0x4c/0xd0
[ 216.812114] vfio_register_group_dev+0x10/0x20 [vfio]
[ 216.812119] vfio_pci_core_register_device+0x1b6/0x460 [vfio_pci_core]
[ 216.812130] vfio_pci_probe+0x53/0x140 [vfio_pci]
[ 216.812133] local_pci_probe+0x44/0xb0
[ 216.812143] work_for_cpu_fn+0x17/0x30
[ 216.812147] process_one_work+0x181/0x3a0
[ 216.812150] worker_thread+0x306/0x440
[ 216.812153] ? __pfx_worker_thread+0x10/0x10
[ 216.812156] kthread+0xef/0x120
[ 216.812163] ? __pfx_kthread+0x10/0x10
[ 216.812167] ret_from_fork+0x44/0x70
[ 216.812174] ? __pfx_kthread+0x10/0x10
[ 216.812177] ret_from_fork_asm+0x1b/0x30
[ 216.812184] </TASK>
[ 216.812185] ---[ end trace 0000000000000000 ]---
[ 216.812217] vfio-pci: probe of 0000:ca:00.4 failed with error -22
[ 216.901145] audit: type=1400 audit(1762346790.423:134): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700" pid=2644 comm="apparmor_parser"
[ 216.901451] audit: type=1400 audit(1762346790.423:135): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700//passt" pid=2644 comm="apparmor_parser"
[ 216.919086] brbond0: port 2(vnet0) entered blocking state
[ 216.919100] brbond0: port 2(vnet0) entered disabled state
[ 216.919131] vnet0: entered allmulticast mode
[ 216.919510] vnet0: entered promiscuous mode
[ 216.919975] brbond0: port 2(vnet0) entered blocking state
[ 216.919987] brbond0: port 2(vnet0) entered forwarding state
[ 217.007756] audit: type=1400 audit(1762346790.529:136): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700" pid=2654 comm="apparmor_parser"
[ 217.013243] audit: type=1400 audit(1762346790.535:137): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-c0e22088-c99f-4c20-90e8-e67ed37db700//passt" pid=2654 comm="apparmor_parser"
[ 218.557158] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 218.557182] #PF: supervisor read access in kernel mode
[ 218.557190] #PF: error_code(0x0000) - not-present page
[ 218.557198] PGD 124a4d067 P4D 0
[ 218.557206] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 218.557214] CPU: 32 PID: 2658 Comm: qemu-system-x86 Tainted: P W OE 6.8.0-87-generic #88-Ubuntu
[ 218.557225] Hardware name: Supermicro SYS-220U-TNR/X12DPU-6, BIOS 2.4 08/21/2025
[ 218.557234] RIP: 0010:vfio_df_open+0x3e/0x120 [vfio]
[ 218.557248] Code: 83 ec 08 4c 8b 2f 41 8b 85 e4 03 00 00 85 c0 75 6f 41 c7 85 e4 03 00 00 01 00 00 00 4c 8b 37 4c 8b 67 28 49 8b 06 48 8b 40 68 <48> 8b 78 10 e8 f9 ec 9f c6 84 c0 0f 84 ab 00 00 00 4d 85 e4 0f 84
[ 218.557268] RSP: 0018:ff6ee0c6240d7b18 EFLAGS: 00010246
[ 218.557276] RAX: 0000000000000000 RBX: ff4a4e2a96ed9480 RCX: 0000000000000000
[ 218.557284] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff4a4e2a96ed9480
[ 218.557292] RBP: ff6ee0c6240d7b40 R08: 0000000000000000 R09: 0000000000000000
[ 218.557300] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 218.557308] R13: ff4a4e2a97d16000 R14: ff4a4e2a97d16000 R15: 0000000096ed9480
[ 218.557316] FS: 00007207c6d16f00(0000) GS:ff4a4e68ffa00000(0000) knlGS:0000000000000000
[ 218.557325] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 218.557331] CR2: 0000000000000010 CR3: 0000000135b2a005 CR4: 0000000000773ef0
[ 218.557339] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 218.557346] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 218.557354] PKRU: 55555554
[ 218.557359] Call Trace:
[ 218.557364] <TASK>
[ 218.557369] ? vfio_device_get_kvm_safe+0x57/0xc0 [vfio]
[ 218.557382] vfio_df_group_open+0x9c/0x200 [vfio]
[ 218.557392] vfio_group_ioctl_get_device_fd+0x10f/0x250 [vfio]
[ 218.557403] vfio_group_fops_unl_ioctl+0xff/0x3b0 [vfio]
[ 218.557413] __x64_sys_ioctl+0xa0/0xf0
[ 218.557424] x64_sys_call+0x12a3/0x25a0
[ 218.557432] do_syscall_64+0x7f/0x180
[ 218.557443] ? __symbol_put+0x69/0xa0
[ 218.557454] ? __kmalloc+0x1c0/0x4f0
[ 218.557466] ? task_numa_fault+0x23d/0x3f0
[ 218.557475] ? mpol_misplaced+0x69/0x200
[ 218.557483] ? do_numa_page+0x24d/0x3c0
[ 218.557492] ? handle_pte_fault+0x16e/0x1d0
[ 218.557500] ? __handle_mm_fault+0x654/0x800
[ 218.557508] ? __count_memcg_events+0x6b/0x120
[ 218.557516] ? count_memcg_events.constprop.0+0x2a/0x50
[ 218.557524] ? handle_mm_fault+0xad/0x380
[ 218.557531] ? arch_exit_to_user_mode_prepare.isra.0+0x1a/0xe0
[ 218.557540] ? irqentry_exit_to_user_mode+0x38/0x1e0
[ 218.557548] ? irqentry_exit+0x43/0x50
[ 218.557554] ? clear_bhb_loop+0x15/0x70
[ 218.557767] ? clear_bhb_loop+0x15/0x70
[ 218.557954] ? clear_bhb_loop+0x15/0x70
[ 218.558130] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 218.558304] RIP: 0033:0x7207c7324e1d
[ 218.558504] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 218.558870] RSP: 002b:00007ffd8d48c140 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 218.559056] RAX: ffffffffffffffda RBX: 00005f749a48cab0 RCX: 00007207c7324e1d
[ 218.559240] RDX: 00005f749a487960 RSI: 0000000000003b6a RDI: 000000000000000a
[ 218.559421] RBP: 00007ffd8d48c190 R08: 0000000000000000 R09: 0000000000000007
[ 218.559600] R10: 0000000180000000 R11: 0000000000000246 R12: 00000000000000d7
[ 218.559773] R13: 00007ffd8d48c1e8 R14: 0000000000000000 R15: 00005f749a485cc0
[ 218.559941] </TASK>
[ 218.560102] Modules linked in: vhost_net vhost vhost_iotlb tap vfio_pci pci_pf_stub xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr bridge stp llc bonding nvidia_vgpu_vfio(OE) cfg80211 binfmt_misc nvidia(POE) intel_rapl_msr intel_rapl_common nls_iso8859_1 intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp vfio_pci_core mdev kvm_intel vfio_iommu_type1 vfio iommufd cmdlinepart spi_nor mtd kvm dax_hmem ast cxl_acpi ipmi_ssif rapl cxl_port irqbypass intel_cstate cxl_core i2c_algo_bit isst_if_mbox_pci isst_if_mmio intel_th_gth isst_if_common mei_me spi_intel_pci ioatdma intel_th_pci i2c_i801 spi_intel mei i2c_smbus intel_th intel_pch_thermal intel_vsec dca acpi_power_meter ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler acpi_pad joydev input_leds mac_hid sch_fq_codel dm_multipath msr efi_pstore nfnetlink dmi_sysfs ip_tables
[ 218.560151] x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 mlx5_ib ib_uverbs macsec ib_core rndis_host cdc_ether usbnet mii hid_generic usbhid hid raid1 nvme nvme_core nvme_auth mlx5_core crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 mlxfw sha1_ssse3 psample tls ahci vmd pci_hyperv_intf xhci_pci libahci xhci_pci_renesas aesni_intel crypto_simd cryptd
[ 218.562779] CR2: 0000000000000010
[ 218.563017] ---[ end trace 0000000000000000 ]---
[ 218.631245] RIP: 0010:vfio_df_open+0x3e/0x120 [vfio]
[ 218.631515] Code: 83 ec 08 4c 8b 2f 41 8b 85 e4 03 00 00 85 c0 75 6f 41 c7 85 e4 03 00 00 01 00 00 00 4c 8b 37 4c 8b 67 28 49 8b 06 48 8b 40 68 <48> 8b 78 10 e8 f9 ec 9f c6 84 c0 0f 84 ab 00 00 00 4d 85 e4 0f 84
[ 218.632046] RSP: 0018:ff6ee0c6240d7b18 EFLAGS: 00010246
[ 218.632311] RAX: 0000000000000000 RBX: ff4a4e2a96ed9480 RCX: 0000000000000000
[ 218.632578] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff4a4e2a96ed9480
[ 218.632846] RBP: ff6ee0c6240d7b40 R08: 0000000000000000 R09: 0000000000000000
[ 218.633115] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 218.633385] R13: ff4a4e2a97d16000 R14: ff4a4e2a97d16000 R15: 0000000096ed9480
[ 218.633656] FS: 00007207c6d16f00(0000) GS:ff4a4e68ffa00000(0000) knlGS:0000000000000000
[ 218.633932] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 218.634207] CR2: 0000000000000010 CR3: 0000000135b2a005 CR4: 0000000000773ef0
[ 218.634487] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 218.634766] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 218.635044] PKRU: 55555554
[ 218.635320] note: qemu-system-x86[2658] exited with irqs disabled
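The log above is long, and the order of events matters: a WARNING in vfio_group_find_or_alloc, then the failed vfio-pci probe (error -22), and about 1.7 seconds later the NULL pointer dereference in vfio_df_open. A rough sketch for pulling just those milestone lines out of dmesg-style text; the regular expressions are assumptions based on the lines in this post, not a general kernel log parser, and key_events is a made-up name.

```python
import re

# "[ 216.812217] message" -> timestamp and message
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Messages treated as milestones (assumed from the log in this post).
KEY_RE = re.compile(r"^(BUG:|WARNING:|Oops:|\S+: probe of \S+ failed)")


def key_events(log_text):
    """Return [(timestamp string, message)] for milestone lines only."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and KEY_RE.match(match.group(2)):
            events.append((match.group(1), match.group(2)))
    return events
```

Running this over the output of dmesg should leave a handful of lines that show the probe failure preceding the crash.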
nvidia-smi on a guest (managed='no')
Wed Nov  5 13:02:21 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10-1B                  On  |   00000000:05:00.0 Off |                  N/A |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /   1024MiB |      0%   Prohibited |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+