Headless Vulkan with multiple GPUs

You’re right that the NVIDIA Vulkan driver currently has two modes: one where it talks to an X server and enumerates the GPUs that are available in that X server, and one where it enumerates devices directly without talking to an X server. It uses the X server mode only when the DISPLAY environment variable is set and it can connect to the X server that DISPLAY specifies; otherwise it enumerates devices directly.

On recent versions of the driver, and on modern X servers, the server will automatically create so-called “GPU screens” for any GPU it finds that doesn’t have a real X screen on it. That will, in turn, make those GPUs available to Vulkan applications that are connected to the X server. You can get a list of which GPUs are bound to the X server by running xrandr --listproviders. If you don’t want the Vulkan implementation to do that X server-based enumeration, you can unset DISPLAY in your application.
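
As a rough illustration of unsetting DISPLAY from inside the application, here is a minimal C sketch. The helper name force_direct_gpu_enumeration is made up for this example and is not part of Vulkan or the NVIDIA driver. It also assumes the driver looks at DISPLAY when Vulkan is initialized, so the helper should run before the first vkCreateInstance call.

```c
/* Needed so that <stdlib.h> declares unsetenv() under strict C modes. */
#define _POSIX_C_SOURCE 200112L

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (name chosen for this example only).
 * Removes DISPLAY from this process's environment so that, per the
 * behavior described above, the driver should enumerate GPUs directly
 * instead of going through an X server.
 *
 * Assumption: this runs before any Vulkan initialization in the process.
 * Returns 0 on success and -1 on failure.
 */
int force_direct_gpu_enumeration(void)
{
    if (unsetenv("DISPLAY") != 0) {
        perror("unsetenv(DISPLAY)");
        return -1;
    }
    return 0;
}
```

Call it at the top of main, before any Vulkan setup. Child processes started afterwards inherit the modified environment, so they will not see DISPLAY either. If you would rather not touch the application, launching it as env -u DISPLAY ./your_app (your_app being a placeholder) has the same effect on the environment.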