Headless Vulkan with multiple GPUs

Hi. I maintain two open source projects (VirtualGL and TurboVNC) that are used to implement on-demand multi-user Linux remote desktop servers with GPU-accelerated OpenGL. I have been trying unsuccessfully to figure out how to support Vulkan applications in that environment. Despite the claims of headless support, nVidia’s Vulkan implementation still appears to require X11 in order to access multiple GPUs, and since the driver is closed-source, I cannot determine exactly why.

Specifically, what we observe is that, on a multi-GPU system running 470.xx, the first nVidia GPU shows up as a Vulkan physical device regardless of whether there is an X server attached to the GPU and regardless of the value of DISPLAY. However, the second and subsequent nVidia GPUs do not show up as Vulkan physical devices unless there is an X server attached and DISPLAY points to that X server. Can someone with knowledge of the Vulkan driver innards clue me in as to what is happening?

Don’t know if this is still the case:
https://forums.developer.nvidia.com/t/vulkan-multi-gpu/63854/6?u=generix

That appears to be the issue, yes. I understand that nVidia’s Vulkan implementation only allows the GPU attached to the current X server to be used with the X11 swap chain, but I don’t understand why the implementation won’t show other GPUs as physical devices unless DISPLAY is empty or points to the GPU-attached X server. Querying the physical devices occurs before a swap chain has been established.

I think “X11 swapchain” is a red herring; “interacting with the X server” is the real issue, meaning that only non-X applications can use Vulkan multi-GPU. Put the other way round: the NVIDIA Vulkan driver, when talking to an X server, doesn’t support multi-GPU, for whatever reason. I’d guess low demand and limited development resources.

Since that info is now 4 years old, it might be worth asking @dleone whether something has changed in that regard.

Forgive my lack of understanding of the Vulkan architecture. Is there a different nVidia Vulkan implementation for X11 vs. off-screen? If so, then when is the implementation selected? Does that occur in vkCreateInstance()?

I am ultimately trying to figure out how to decouple GPU rendering from the current X server (the X server specified in DISPLAY), as VirtualGL does for GLX and EGL/X11 applications. At the moment, though, I don’t feel as if I understand the Vulkan architecture well enough to say whether that is even possible. Another open source project has attempted it, but they could only make it work with nVidia’s Vulkan implementation by doing something really hackish (twiddling the value of the DISPLAY environment variable).

I know less than nothing about all of this; I’m just passing on information I’ve heard and seen and stashed in my memory. In general, the NVIDIA driver is very X-centric, and its behaviour sometimes changes in odd ways depending on whether an X server is running or not:
https://forums.developer.nvidia.com/t/nvidia-smi-no-device-where-found/197768/19?u=generix
Example: vkcube running on an NVIDIA eGPU while displaying on an X server that runs only on the AMD GPU.

Just a thought: given what primus_vk does, maybe “application interacting with X” really means that the driver has to be kept from interacting with X in order to change its behaviour, and setting __NV_PRIME_RENDER_OFFLOAD=1 somehow prevents that interaction?

You’re right that the NVIDIA Vulkan driver currently has two modes: one where it talks to an X server and enumerates the GPUs that are available in that X server, and one where it enumerates devices directly and doesn’t talk to an X server. It will choose one mode or the other based on whether the DISPLAY environment variable is set and whether it can connect to the X server it specifies.

On recent versions of the driver, and on modern X servers, the server will automatically create so-called “GPU screens” for any GPU it finds that doesn’t have a real X screen on it. That will, in turn, make those GPUs available to Vulkan applications that are connected to the X server. You can get a list of which GPUs are bound to the X server by running xrandr --listproviders. If you don’t want the Vulkan implementation to do that X server-based enumeration, you can unset DISPLAY in your application.

Thank you @aplattner
Bonus question, mildly related:
So far, when specifying the provider to use as documented on an NVIDIA dual-GPU system, either the X server would nosedive or rendering just happened on the primary provider. Has that been fixed?

@aplattner But why would you assume that no application will ever need to use both X11 and off-screen Vulkan at the same time? In fact, that is exactly what we would need to do in order to support Vulkan applications with GPU acceleration in open source remote desktop environments such as VNC, NX, etc. I recognize that the driver behaves in the way you describe. Now I’m asking: why does it have to behave that way, are there any plans to improve its behavior, and how can I work around it in the meantime? I want to do for Vulkan what VirtualGL has already done for GLX and EGL/X11: redirect rendering into an off-screen buffer on the GPU, then read it back and display it to the X server specified in DISPLAY. VirtualGL has about 50,000 downloads/week. Over the 18 years that we’ve been a project, we have sold a lot of GPUs for nVidia, so I don’t think it’s too much to ask that our use case be given proper consideration.

For Vulkan, it’s mostly up to the application to choose which GPU to render on and which GPU to present on. You automatically get render offload behavior if you create a rendering queue on one GPU and a swapchain on a different GPU. The various render offload environment variables are mostly for OpenGL since it doesn’t have the same kind of device enumeration support that Vulkan does.

Can you elaborate on what you mean by “nosedive”?

It’s a technical limitation of the way the driver initializes currently. Work is being done to improve it but it’s a significant refactoring so it might take a while.

For your virtual use case, is the X server remote? If it’s a local X server, your best bet would be to make sure the GPU you want to render on is provided as a GPU screen in the X server so the driver can do its own accelerated cross-device presentation. If you really need to render completely offscreen, your best bet for now is to temporarily unset DISPLAY before initializing Vulkan, and then either set DISPLAY again or just pass the server string you want to connect to directly to XOpenDisplay rather than relying on the environment variable.
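A minimal sketch of the second option (unset DISPLAY before initializing Vulkan, then hand the saved name to XOpenDisplay() explicitly) follows. The helper name take_display_env() is hypothetical; only POSIX environment functions are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: removes DISPLAY from the environment and returns a
 * heap-allocated copy of its previous value, or NULL if it was not set (or
 * if strdup() failed).  Call it before the first Vulkan function so the ICD
 * enumerates devices directly; the caller later passes the result to
 * XOpenDisplay() and frees it. */
char *take_display_env(void)
{
    const char *cur = getenv("DISPLAY");
    char *saved = cur ? strdup(cur) : NULL;
    unsetenv("DISPLAY");
    return saved;
}
```

A caller would invoke this before any Vulkan call, initialize Vulkan, and then open the X connection with XOpenDisplay(saved) rather than XOpenDisplay(NULL), which would now find no DISPLAY. The last post in this thread reports that setting DISPLAY back after the first Vulkan call made later calls fail, which suggests preferring this variant over restoring the variable.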

@aplattner The X server specified by DISPLAY is on the same machine as the application, but it is an X proxy (such as Xvnc), so it isn’t connected to a GPU. The current implementation of VirtualGL supports offloading OpenGL rendering to a server-side GPU either via GLX (which requires going through a “3D X server” attached to the GPU) or device-based EGL (which doesn’t require a 3D X server). In the latter mode, VirtualGL translates GLX commands into EGL commands (somewhat non-straightforwardly, since EGL doesn’t support multi-buffered Pbuffers but GLX does). It is OK if supporting Vulkan in this environment requires a 3D X server. We can work with that limitation. The main thing is that we need to be able to offload rendering to a GPU but deliver the final pixels into the X proxy that has no GPU acceleration. (IOW, the X proxy only has Mesa at its disposal.) Note that VirtualGL is fundamentally an interposer, so it rewrites GLX and EGL/X11 function calls from applications and could do the same with Vulkan function calls if necessary. For instance, it would be no problem to interpose vkCreateInstance() and temporarily repoint DISPLAY to :0.0 or unset it within the body of that function, but I suspect that we’d need to somehow rewrite the swapchain as well (but I could be wrong about that).

Irrespective of the multi-GPU limitation that prompted this post, what I observe at the moment is that running vkcube in an X proxy session acts like it is GPU-accelerated (i.e. it reports that it is using my Quadro, and it has a high frame rate), but I’m not sure whether it is fully accelerated. I need to understand what happens in the swapchain if a GPU not connected to the current X server is selected for rendering and the current X server only supports software rendering.

On recent drivers, if the Vulkan ICD manages to connect to the X server but doesn’t find our driver running on the server side, it falls back to a mode where it renders on the local GPU and then the swapchain uploads pixels using a fallback, likely XPutImage in this case. In this mode, the graphics and compute queues are fully accelerated just as they would be on a normal configuration, but the swapchain reads the pixels back from the GPU to system memory and then uses the PutImage fallback path to present.

I’m not sure which GPUs get enumerated by the Vulkan ICD in that case. I don’t have a multi-GPU system to test it with at the moment.

@aplattner Thanks for the info. That swapchain behavior is exactly what we need. The principal complaint of this thread, however, is that only the first GPU is exposed in that case. If, after connecting to the X server and not finding the nVidia driver running on it, the Vulkan ICD exposed all nVidia GPUs, then that would satisfy our needs. Is there a good reason why it can’t do that?

I’m not sure, I’ll have to look into it.

By “nosedive” I mean a crash:
https://forums.developer.nvidia.com/t/ubuntu-20-04-not-able-to-run-6-7-monitor-setup-with-one-x-screen/200904/7?u=generix

I was able to move some GPUs around in my lab and set up a dual nVidia configuration with a Quadro K5000 and P620. I observe the same issue on this newly configured machine. (NOTE: the issue was originally observed on a customer’s machine, so this was my attempt to reproduce it locally.) I built the Vulkan SDK and attempted to modify the vulkaninfo source code so that it wraps certain Vulkan functions, either unsetting DISPLAY or setting it to :0.0 and then restoring its value. What I observe is that DISPLAY must be modified prior to the first call to vkEnumerateInstanceExtensionProperties() in order for all GPUs to be visible, but setting DISPLAY back to its original value after that function call causes subsequent function calls to fail. Ugh.

Nothing more I can do without advice from someone who can actually see the driver source code, but from my point of view, whatever the driver is doing vis-a-vis DISPLAY seems really ugly.