364.19 Linux/X11 - Presenting from more than 2 queues causes hangs/VK_ERROR_DEVICE_LOST.

With the nVidia 364.19 drivers for Linux and a GTX 780, the implementation exposes one queue family supporting graphics work, with 16 queues. When I do nothing but submit a simple image-clearing command buffer and present on that same queue (then wait for queue idle), problems start to occur as soon as more than two queues within that single queue family are used.

I ran into this while experimenting with multiple output windows (on a single monitor), with a single swapchain for each surface, and a dedicated queue per swapchain, but the problem also occurs using only one surface and one swapchain.

As soon as commands/present calls are sent to more than two queues, performance grinds to a halt after precisely six frames, regardless of the number of queues used or the size of the swapchain. Depending on whether I use a swapchain of two images or more, I either get an indefinite hang while waiting on the single submit’s fence, or VK_ERROR_DEVICE_LOST from the submit or present calls.

Below I’ve linked the smallest fail case I could come up with, with more details:

Am I doing something wrong? I’m still shaky on a lot of the details of Vulkan, and I’ve been looking at this problem for way too long to actually spot anything anymore, so I can’t rule out some dumb mistake on my part.

As far as I’ve been able to find, the Vulkan specification doesn’t specifically state whether or not presentation has to be limited to a single queue (just that the queue used has to be from a family compatible with the presentation surface). But whether this points to a gap in the specification or counts as a bug in the nVidia implementation is above my pay grade.

I’ve done some more experimentation around this issue and discovered that the fail case can be shifted from three queues up to four queues by replacing the call to vkQueueWaitIdle() with a fence-only vkQueueSubmit() call (zero command buffers), immediately followed by a vkWaitForFences() busy loop.

Without the use of vkQueueWaitIdle(), the program runs fine for any number of swapchain images presented across three queues, and only fails when a fourth queue is introduced.

I also experimented with using different sets of consecutive queue indices, and random selections of non-consecutive queues within the queue family, but that hasn’t had any bearing on the issue.

Updated version that replaces vkQueueWaitIdle() with vkQueueSubmit() and vkWaitForFences():

Edit update: Intentionally sleeping between frames (for anywhere from 1ms to 1000ms) has no influence on the issue either, so it doesn’t seem to be sensitive to timing.

It’s been three weeks without a response or a fix, and the issue persists. I’ve since tested this in Windows 10 on driver version 365.10, and the problem never manifested. Presenting to any number of queues works just fine there.

While I was at it I ran into some odd disparities between the capabilities reported by Vulkan on the two platforms, on the same machine with a GTX 780 graphics card:

On Windows:

- Queue families supported: 1
    Queue family 0
      - Supports: Graphics Compute Transfer SparseBinding
      - Queues: 16
- Device has 3 available presentation mode(s):
    VK_PRESENT_MODE_FIFO_KHR
    VK_PRESENT_MODE_FIFO_RELAXED_KHR
    VK_PRESENT_MODE_MAILBOX_KHR
- Surface Capabilities:
    - Minimum swapchain image count: 1

On Linux under X11:

- Queue families supported: 2
    Queue family 0
      - Supports: Graphics Compute Transfer SparseBinding
      - Queues: 16
    Queue family 1
      - Supports: Transfer 
      - Queues: 1
- Device has 3 available presentation mode(s):
    VK_PRESENT_MODE_FIFO_KHR
    VK_PRESENT_MODE_FIFO_RELAXED_KHR
    VK_PRESENT_MODE_IMMEDIATE_KHR
- Surface Capabilities:
    - Minimum swapchain image count: 2

The change from MAILBOX to IMMEDIATE mode support could, I assume, be chalked up to platform differences. The minimum swapchain size I’m less sure about, and the difference in queue families I find downright weird.
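Since the two platforms report different present modes and a different minimum image count, swapchain setup code probably should not hard-code either value. Here is a minimal sketch of how the reported values could be handled portably. The helper names `choose_image_count` and `choose_present_mode` are made up for illustration, and the `present_mode` enum is only a local mirror of `VkPresentModeKHR` so the snippet compiles without the Vulkan SDK; real code would include `<vulkan/vulkan.h>` and fill the inputs from vkGetPhysicalDeviceSurfaceCapabilitiesKHR() and vkGetPhysicalDeviceSurfacePresentModesKHR(). It relies on two things the specification states: a `maxImageCount` of 0 means there is no upper limit, and VK_PRESENT_MODE_FIFO_KHR is the only mode that is always available.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of VkPresentModeKHR (same numeric values), defined here
 * only so this sketch builds without the Vulkan headers. */
enum present_mode {
    PRESENT_MODE_IMMEDIATE    = 0, /* VK_PRESENT_MODE_IMMEDIATE_KHR */
    PRESENT_MODE_MAILBOX      = 1, /* VK_PRESENT_MODE_MAILBOX_KHR */
    PRESENT_MODE_FIFO         = 2, /* VK_PRESENT_MODE_FIFO_KHR */
    PRESENT_MODE_FIFO_RELAXED = 3  /* VK_PRESENT_MODE_FIFO_RELAXED_KHR */
};

/* Hypothetical helper: clamp the image count we would like to the limits
 * the surface reports (minImageCount / maxImageCount).
 * max_count == 0 means the surface imposes no upper limit. */
uint32_t choose_image_count(uint32_t desired, uint32_t min_count,
                            uint32_t max_count)
{
    uint32_t count = desired < min_count ? min_count : desired;
    if (max_count != 0 && count > max_count)
        count = max_count;
    return count;
}

/* Hypothetical helper: return the first entry of `preferred` that also
 * appears in `available`; fall back to FIFO, which is always supported. */
enum present_mode choose_present_mode(const enum present_mode *available,
                                      size_t available_count,
                                      const enum present_mode *preferred,
                                      size_t preferred_count)
{
    for (size_t p = 0; p < preferred_count; ++p) {
        for (size_t a = 0; a < available_count; ++a) {
            if (available[a] == preferred[p])
                return preferred[p];
        }
    }
    return PRESENT_MODE_FIFO;
}
```

With a preference order of MAILBOX, then IMMEDIATE, this would pick MAILBOX for the Windows listing above and IMMEDIATE for the Linux one. Asking for a single image would be raised to 2 on Linux, where the reported minimum is 2.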

nVidia driver 367.27, Linux kernel 4.6.2: instead of fixing the problem, this new driver makes it impossible to use more than a single queue without getting VK_ERROR_DEVICE_LOST.

On top of that, vkAllocateMemory() now segmentation faults where it didn’t before, and there seems to be an undocumented new extension called ‘VK_NV_dedicated_allocation’.

Edit update: vkAllocateMemory()'s crash was a case of PEBKAC. I was relying on broken behavior that apparently has been fixed.

From where did you get 367.27 for Linux? The nVidia download page is still at 367.18.

Thanks!

Edit: I was only looking at the Vulkan driver page, not realizing that the official Linux driver page seems to have a more recent version of the driver. Am I right that 367.27 is a replacement for 367.18?

I rely on Gentoo’s package maintainers to keep my driver up to date, so never had to download the drivers directly myself. As such I also have no idea if they maintain different branches or not, but I trust the number increment to mean ‘improved’.

As an aside, I never got a notification that there was a reply to this thread. Between never hearing a peep back about my issue, and the apparently broken notifications, these forums are feeling kind of neglected by nVidia. :-\

HypnoGenX, I’m seeing the same issue in my code. I’m not using multiple queues, because the Quadro K600 I’m using supports graphics and present from queue family 0. I’m running Ubuntu 14.04.1 with kernel 3.13.0-92-generic and nvidia driver 367.35.

I’m happy to boot into Gentoo (I’m a Gentoo fan) and chase this issue down.

Can you contact me via email? I PMed you my email address. Or, would you open an issue at KhronosGroup/Vulkan-LoaderAndValidationLayers on GitHub? I follow that pretty closely.

tl;dr maybe the same issue, I’m only submitting on 1 queue, getting VK_ERROR_DEVICE_LOST.

Ok, it sounds like this is not the same issue.

Linux kernel 4.7.1, new nVidia drivers 370.23.

The bug seems to be fixed. I can now present any number of swapchain images on any number of queues without issue.

And it looks like the release notes confirm it:

[url]https://devtalk.nvidia.com/default/topic/957782/unix-graphics-announcements-and-news/linux-solaris-and-freebsd-driver-370-23-beta-/[/url]