Trouble with VK_NVX_device_generated_commands

I am unable to get this new extension to work at all on my system. After some frustrating hours of getting no results in my own code, I tried the GameWorks sample BasicDeviceGeneratedCommandsVk from GitHub and, surprisingly, got the same result: no model shows up if I select any of the DGC render variants; they behave like a no-op. The non-DGC modes work as expected.

After some more testing in my code, even the simplest case in immediate mode (no secondary buffer), where I basically swap a vkCmdDrawIndirect for a vkCmdProcessCommandsNVX, behaves like a no-op.

My system:
Windows 7 SP1 x64
GeForce GTX 970
Tested with both Developer driver 377.14 and WHQL driver 381.89.
Vulkan SDK 1.0.46.0 (also tested directly interfacing with nvoglv64.dll, same result)
MSVC 2017 x64

I would be grateful for any information regarding this, or if someone could try to replicate it.

Regards

Just tried with WHQL driver 382.05, same faulty behavior.

Thanks for the report. We have filed an internal bug to track this issue.

Tried with WHQL driver 382.33, again same faulty behavior.

If you require any further information or would like me to run some diagnostics on my PC, just write it here or via PM.

Heads-up from Khronos:

Vulkan 1.0.51 spec update

  • Fix enum naming and clarify behavior for
    VK_NVX_device_generated_commands extension.

Thanks for the heads-up. It does not solve the problem, but a spec update is always welcome. Since the “official” demo from NVIDIA itself is not working, I expect the issue to be a driver bug.

Just tested with the new 382.53 WHQL driver, the problem is still not fixed.

Try the new Vulkan beta 382.58, also released today.
It also fixes the DesignWorks sample.

Alright, with the new beta driver 382.58, the BasicDeviceGeneratedCommandsVk sample is working as intended. Thanks for the fix!

Getting my own code to work is 50/50. Using just Draw, DrawIndexed and BindPipeline, it works fine. But I cannot get the other four (DescriptorSet, PushConstant, IndexBuffer, VertexBuffer) to work. Since the two buffer ones work in the sample, this looks like my fault. DescriptorSet works for the first few submits, and I can see the correct output in my window, but as soon as I recycle the cmd buffer, the next submit after the one with the recycled cmd buffer fails with device lost. IndexBuffer/VertexBuffer do not get this far: I get a device lost before I see any correct output.

There seems to be quite some WIP in the spec and the sample:

  1. For PushConstant, it is unclear whether the dynamicCount parameter in the layout is a size in bytes or a count in units of sizeof(uint32_t). It is used as both in the pseudo-code.
  2. The sample fails the validation layers quite heavily: The secondary cmd buffers do not contain any info on the renderpass and do not set the viewport or scissor. Instead the main cmd buffer does this, which is not allowed if it calls vkCmdExecuteCommands.
  3. vkCmdProcessCommandsNVX is specified to only execute inside a render pass, but this only makes sense for graphics, and only in immediate mode without a secondary cmd buffer. If recording to a secondary cmd buffer, that buffer “should” contain the render-pass info. As in 2, calling vkCmdProcessCommandsNVX while inside a render pass that executes secondary cmd buffers is prohibited. This would mean that one would need to create a pseudo render pass just to process commands into a secondary cmd buffer, which makes no sense.
  4. I expect that the limits/feature query is WIP and will use VK_KHR_get_physical_device_properties2 in the future.
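To make the ambiguity in point 1 concrete, here is a minimal sketch of the two candidate values for dynamicCount, assuming a push constant range whose size is known in bytes. The struct and function names are hypothetical and only for illustration; nothing here comes from the spec or the sample, and which interpretation the driver expects is exactly the open question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not part of Vulkan): holds both readings of
// dynamicCount for a PushConstant token.
struct DynamicCountCandidates {
    std::uint32_t asBytes;   // reading A: dynamicCount is a size in bytes
    std::uint32_t asDwords;  // reading B: dynamicCount is a count of 32-bit words
};

// Computes both readings from the push constant range size in bytes.
inline DynamicCountCandidates dynamicCountCandidates(std::uint32_t pushConstantBytes) {
    constexpr std::uint32_t kDwordSize = 4;  // sizeof(uint32_t)
    // Vulkan requires push constant range sizes to be a multiple of 4,
    // so the division below is exact for valid input.
    assert(pushConstantBytes % kDwordSize == 0);
    return DynamicCountCandidates{pushConstantBytes, pushConstantBytes / kDwordSize};
}
```

For a 64-byte range the two readings are 64 and 16, so guessing wrong would make the driver read either four times too much or a quarter of the data.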

I have uploaded two API dumps of my program. The DescriptorSet one at least seems interesting, since it produces correct output before the device is lost.

DescriptorSet test: https://pastebin.com/N2TJPzjk
IndexVertexBuffer test: https://pastebin.com/WwRGnJ3B

If you could take a look at these, especially the DescriptorSet one, I would appreciate it.

Regards

PS: I have not tested this yet, but I will just throw in that I really hope DGC is fully compatible with sparse residency, since that might be an obscure case.

After some more tests there seem to be at least 2 more bugs. I have isolated one so far:

  • For index buffers, VK_INDEX_TYPE_UINT16 is not supported and leads to a device lost error.

This one seems to be pretty clearly in the driver; my code works if I use a UINT32 index buffer.
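For anyone hitting the same thing, the workaround on the CPU side can be sketched as below: widen the 16-bit indices to 32 bits before uploading and bind the buffer as VK_INDEX_TYPE_UINT32. The function name widenIndices and the primitiveRestart flag are my own for illustration; the only Vulkan fact relied on is that the primitive restart index is 0xFFFF for 16-bit and 0xFFFFFFFF for 32-bit indices.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: converts a 16-bit index list to 32-bit indices.
// If primitive restart is in use, the 16-bit restart marker 0xFFFF is
// remapped to the 32-bit restart marker 0xFFFFFFFF.
inline std::vector<std::uint32_t> widenIndices(const std::vector<std::uint16_t>& indices16,
                                               bool primitiveRestart = false) {
    std::vector<std::uint32_t> indices32;
    indices32.reserve(indices16.size());
    for (std::uint16_t index : indices16) {
        if (primitiveRestart && index == 0xFFFFu) {
            indices32.push_back(0xFFFFFFFFu);
        } else {
            indices32.push_back(index);  // zero-extends to 32 bits
        }
    }
    return indices32;
}
```

This doubles the index buffer size, so it is only meant as a stopgap until UINT16 works in the driver.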

The second one will be harder to track down. There seems to be some strange interaction between my resource loading and DGC, even though these resources are not touched after loading. Since the resource is a sparse image, I suspect it might have something to do with that. If I deactivate that part of my code, DGC works like a charm, although I have not tested PushConstant due to the spec bug.

I will post again when I have tracked down the second bug.

Regards

So, I isolated the second bug. It has nothing to do with the assets themselves; the cause is that I used the image count in VkObjectTableCreateInfoNVX. It turns out that if you set maxStorageImagesPerDescriptor or maxSampledImagesPerDescriptor to anything other than 0 and then use a descriptor set in DGC, you get a device lost error after submit, even if the descriptor set does not actually use any images. Registering descriptor sets to the table works, and as long as the descriptor set is not actually used by DGC, the two counts can be larger than 0.
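Until this is fixed, the workaround in my code looks roughly like the sketch below. To keep it compilable without the NVX headers, it uses a stand-in struct that only mirrors the two image-count fields of VkObjectTableCreateInfoNVX; the struct and function names are hypothetical, and real code would write into the Vulkan struct directly.

```cpp
#include <cstdint>

// Stand-in (NOT the real Vulkan type): mirrors only the two image-count
// fields of VkObjectTableCreateInfoNVX that trigger the device lost.
struct ObjectTableImageLimits {
    std::uint32_t maxStorageImagesPerDescriptor;
    std::uint32_t maxSampledImagesPerDescriptor;
};

// Hypothetical helper: if descriptor sets from this table will be bound
// through DGC, force both image counts to 0 to avoid the device lost.
// Otherwise the requested counts are passed through unchanged.
inline ObjectTableImageLimits workaroundImageLimits(ObjectTableImageLimits requested,
                                                    bool descriptorSetsUsedByDgc) {
    if (descriptorSetsUsedByDgc) {
        requested.maxStorageImagesPerDescriptor = 0;
        requested.maxSampledImagesPerDescriptor = 0;
    }
    return requested;
}
```

Note that this only sidesteps the crash; whether zero counts are valid for descriptor sets that really contain images is something I have not verified.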

I have also tested some sparse interaction. So far, vertex and index buffers bound in the table can be sparse and work as expected. Further tests would be whether the buffers supplied to vkCmdProcessCommandsNVX can be sparse, and whether buffers and images bound in descriptor sets can be sparse. If the descriptor set is bound outside of DGC rather than via DGC (which is not possible at the moment because of the bug), sparse images in the descriptor set work as expected.

Regards

I just tested the new 382.68 beta driver, and I can confirm that both bugs are fixed in this version and that my program runs like a charm! Thank you very much for fixing this stuff! I can also confirm that sparse images in descriptor sets bound via DGC work as expected. This extension is really nice, and once some of the spec issues are ironed out, I hope other vendors will take it up.

Regards

Just a tidbit:

Beta driver 382.81 has just been released.
It requires rebuilding the gl_vk_threaded_cadscene sample against Vulkan SDK 1.0.54 to avoid triangles of death.