When the NVIDIA Grid16.0 driver queries the number of regions for a vGPU device, the number returned does not include the console region

1. Problem Description

When the NVIDIA Grid16.0 driver queries the number of regions for a vGPU device, the number returned does not include the console region.

2. Problem Impact

If a caller uses the returned region count to traverse all regions of the device, the console region is skipped, because its index lies beyond the reported count.
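The effect can be sketched in C as follows. This is a minimal illustration, not driver or QEMU code: `count_visits_to`, `REPORTED_NUM_REGIONS`, and `CONSOLE_REGION_INDEX` are hypothetical names, and both values (9 and 9) are the ones derived in the detailed description in section 3.

```c
#include <assert.h>

/* Values taken from the analysis in section 3; the names are hypothetical. */
#define REPORTED_NUM_REGIONS 9u  /* num_regions as returned by the driver */
#define CONSOLE_REGION_INDEX 9u  /* index the driver uses for the console region */

/* The console index is not below the reported count, so a loop misses it. */
static_assert(CONSOLE_REGION_INDEX >= REPORTED_NUM_REGIONS,
              "console region index lies outside the reported range");

/*
 * Walk indices 0 .. num_regions-1, as a caller that trusts num_regions
 * would, and count how many times the target index is visited.
 */
static unsigned count_visits_to(unsigned target, unsigned num_regions)
{
    unsigned visits = 0;
    for (unsigned i = 0; i < num_regions; i++) {
        if (i == target) {
            visits++;
        }
    }
    return visits;
}
```

With the reported count of 9 the loop never reaches index 9; with a count of 10 it reaches it exactly once.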

3. Detailed Description

The tests and code excerpts below are based on the following versions:

QEMU version: Community QEMU-8.1.2

Grid version: grid16.0

3.1 Obtaining the number of vGPU regions

In QEMU, the VFIO_DEVICE_GET_INFO ioctl is used to query vGPU device information. The code that handles this query in the Grid16.0 driver is shown below.

On line 3877, num_regions is set directly to NV_VGPU_VFIO_REGIONS_MAX.

The definition of NV_VGPU_VFIO_REGIONS_MAX is shown in the following figure.

In our configuration, the definition on line 268 is the one that applies (line 266 is a similar case).

According to the definition in qemu, NV_VGPU_VFIO_REGIONS_MAX is actually 9, indicating that the device has nine regions.
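For reference, the standard VFIO PCI region indices from the Linux `<linux/vfio.h>` header that QEMU builds against are reproduced below, copied into the snippet so it compiles on its own. That NV_VGPU_VFIO_REGIONS_MAX resolves to VFIO_PCI_NUM_REGIONS is an assumption here: it is consistent with the value 9 stated above, but the driver definition itself is not reproduced in this report. The `ASSUMED_` macro name is hypothetical.

```c
#include <assert.h>

/* Region indices as defined in the Linux UAPI header <linux/vfio.h>,
 * copied here so the snippet is self-contained. */
enum {
    VFIO_PCI_BAR0_REGION_INDEX,   /* 0 */
    VFIO_PCI_BAR1_REGION_INDEX,   /* 1 */
    VFIO_PCI_BAR2_REGION_INDEX,   /* 2 */
    VFIO_PCI_BAR3_REGION_INDEX,   /* 3 */
    VFIO_PCI_BAR4_REGION_INDEX,   /* 4 */
    VFIO_PCI_BAR5_REGION_INDEX,   /* 5 */
    VFIO_PCI_ROM_REGION_INDEX,    /* 6 */
    VFIO_PCI_CONFIG_REGION_INDEX, /* 7 */
    VFIO_PCI_VGA_REGION_INDEX,    /* 8 */
    VFIO_PCI_NUM_REGIONS = 9      /* count of the standard indices above */
};

/* Assumption: the driver's NV_VGPU_VFIO_REGIONS_MAX has this value. */
#define ASSUMED_NV_VGPU_VFIO_REGIONS_MAX VFIO_PCI_NUM_REGIONS

static_assert(VFIO_PCI_VGA_REGION_INDEX == 8, "last standard index is 8");
static_assert(ASSUMED_NV_VGPU_VFIO_REGIONS_MAX == 9, "nine standard regions");
```

The nine standard regions occupy indices 0 through 8, so a count of 9 covers exactly those and nothing more.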

3.2 Querying the console region

When a Windows VM configured with a vGPU is viewed over VNC and switches to the vGPU display, QEMU calls vfio_display_region_update to refresh the VNC output.

The specific code is as follows:

On line 404, QEMU issues the VFIO_DEVICE_QUERY_GFX_PLANE ioctl to query plane information from the Grid driver on the host. The key field is plane.region_index, which is passed to vfio_region_setup on line 440.

Now look at the Grid driver code that handles VFIO_DEVICE_QUERY_GFX_PLANE:

region_index is set to NV_VGPU_VFIO_CONSOLE_REGION, and NV_VGPU_VFIO_CONSOLE_REGION is defined as NV_VGPU_VFIO_REGIONS_MAX.

According to the analysis in section 3.1 above, NV_VGPU_VFIO_REGIONS_MAX is the maximum number of regions (9), so region_index is 9.

In qemu, region_index starts at 0, so the console region is actually the 10th region with an index of 9.

Therefore, the vGPU device should have 10 regions, not 9.

The first nine regions, as defined in QEMU, are shown in the following figure; the console region is not among them.

When vfio_region_setup then queries the region at plane.region_index (NV_VGPU_VFIO_CONSOLE_REGION), that index falls outside the range implied by the num_regions value reported earlier.
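The range violation can be illustrated with a small check. This is a sketch only: `region_index_in_range` and `min_regions_for_index` are hypothetical helpers, not QEMU or driver functions, and the two macros simply carry the values derived in the analysis above.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGIONS_REPORTED 9u  /* from VFIO_DEVICE_GET_INFO, per section 3.1 */
#define PLANE_REGION_INDEX   9u  /* plane.region_index, per the analysis above */

static_assert(PLANE_REGION_INDEX >= NUM_REGIONS_REPORTED,
              "the console index is not covered by the reported count");

/* A valid zero-based index must be strictly less than the region count. */
static bool region_index_in_range(unsigned index, unsigned num_regions)
{
    return index < num_regions;
}

/* Smallest region count for which the given zero-based index is valid. */
static unsigned min_regions_for_index(unsigned index)
{
    return index + 1u;
}
```

Index 9 fails the check against a count of 9 and passes only once the count is at least 10.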

In the Grid driver, NV_VGPU_VFIO_CONSOLE_REGION is used as a region index in the same way as VFIO_PCI_BAR0_REGION_INDEX … VFIO_PCI_BAR5_REGION_INDEX.

Therefore, the num_regions value returned in section 3.1 should also count the console region: it should be 10, not 9.