Two bug reports using Vulkan Video on linux

Hi, I’m developing a game streaming / remote desktop tool for linux using Vulkan, wayland, and Vulkan Video. My main development environment is using an AMD card, and my tool works well there. I’ve recently been testing on a server with an RTX 3080, and hit two major issues. One is related to video encoding using Vulkan Video, and one is related to explicit sync.

I’ve set up a branch of my project to make it easy for other developers to reproduce the issues. I’ve included instructions for getting it running at the bottom of this post.

Testing environment:

  • OS: Ubuntu 23.10, linux 6.5.0-25
  • Driver: linux beta 550.40.59
  • GPU: GeForce RTX 3080

Thanks so much in advance!

1. Video Encoding artifacts

This issue is related to the output I get when using Vulkan Video to encode the application textures.

Using both h264 and h265, I get encoding artifacts when using forward references (P frames). Here is an example of h264 output using a simple IPPP structure:

Here is an example of h265 output using the same IPPP structure:

Note that I can work around the problem by disabling references, i.e. emitting only IDR frames. This is obviously not ideal. Note also that FFmpeg’s vulkan implementation currently does the same thing, although I’m not sure if that’s just because they haven’t implemented forward references yet.

2. Explicit sync issues

My application acts as a wayland compositor. It uses dmabuf with explicit sync to import application textures into vulkan.

Here is where I import the dmabuf as a vulkan semaphore and here is where I wait on it.

What I’ve found is that imported textures above a certain size are always all black. For example, running vkcube-wayland with --resolution 800x640 always works, but --resolution 800x642 (or taller) always produces an all-black texture. I haven’t been able to discern any pattern in the problematic surface resolutions (I also examined the stride), but for every width there exists a height that triggers the bug, and for every height there exists a width that triggers the bug.

I’ve verified that it’s the dmabuf texture that’s all zeroes, not the encoded output, by dumping out the blend texture from my compositor code. It’s noteworthy that the bug persists whether I’m using xwayland or not.

I’m not hitting any validation errors[1], and I’ve compared my implementation to the few existing implementations of explicit sync I’ve seen (for example, the wlroots vulkan renderer). And obviously, I’m not hitting this bug on Mesa on my AMD card. I’ve also been unable to reproduce the issue on a cloud instance with a P1000 card. I don’t have a lot of nvidia cards to test on, though.

Running the code

I’ve prepared a branch with hardcoded config that makes it somewhat faster to reproduce the issues.

First, make sure vkcube and vkcube-wayland are on your PATH (they both came with the SDK for me). Then you can clone the branch:

git clone --branch nvidia-testing

To run the server:

cd mm-server && cargo run --bin mmserver

And then to run the client, in a separate terminal:

cd mm-client && cargo run --bin mmclient -- localhost:9599 vkcube-wayland --resolution 800x600 --codec h264

Changing vkcube-wayland to vkcube will allow you to run via xwayland. The other arguments should be self-explanatory. Adding --bug-report will cause the server to dump the compressed bitstream out to a temp dir, which can be useful for analysis and to isolate out any client-side problems.

  1. I actually have one validation error, which I’ve posted separately about, that I am 99% sure is unrelated and spurious.

Just to be precise, I’m using the implicit-explicit sync interop kernel API, not the brand new (as of two weeks ago) DRM syncobj API. I will be implementing the latter as soon as driver support is available.
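For anyone unfamiliar with the interop API, here is a minimal sketch of the export side, assuming it refers to the dma-buf `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` ioctl from `<linux/dma-buf.h>` (Linux 6.0 and later). The helper names `iowr` and `export_sync_file` are made up for illustration and are not taken from my project; the request encoding assumes the generic Linux ioctl layout used on x86-64 and aarch64.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_ulong};

/// Mirrors `struct dma_buf_export_sync_file` from <linux/dma-buf.h>.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct DmaBufExportSyncFile {
    /// In: DMA_BUF_SYNC_* flags describing the access we intend to do.
    pub flags: u32,
    /// Out: the exported sync_file descriptor.
    pub fd: i32,
}

pub const DMA_BUF_SYNC_READ: u32 = 1 << 0;
pub const DMA_BUF_SYNC_WRITE: u32 = 2 << 0;

/// Hypothetical helper: the C macro `_IOWR(ty, nr, T)` for the generic
/// Linux ioctl encoding (direction << 30 | size << 16 | type << 8 | nr).
pub const fn iowr(ty: u8, nr: u8, size: usize) -> c_ulong {
    const IOC_READ_WRITE: c_ulong = 3;
    (IOC_READ_WRITE << 30) | ((size as c_ulong) << 16) | ((ty as c_ulong) << 8) | (nr as c_ulong)
}

/// `_IOWR('b', 2, struct dma_buf_export_sync_file)`.
pub const DMA_BUF_IOCTL_EXPORT_SYNC_FILE: c_ulong =
    iowr(b'b', 2, size_of::<DmaBufExportSyncFile>());

extern "C" {
    // libc's ioctl; declared here so the sketch needs no external crate.
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// Hypothetical helper: export the fences currently attached to a dmabuf
/// as a sync_file fd. The caller owns (and must close) the returned fd.
pub fn export_sync_file(dmabuf_fd: c_int, flags: u32) -> std::io::Result<c_int> {
    let mut arg = DmaBufExportSyncFile { flags, fd: -1 };
    // SAFETY: `arg` is a live, correctly sized struct for this request.
    let ret = unsafe {
        ioctl(
            dmabuf_fd,
            DMA_BUF_IOCTL_EXPORT_SYNC_FILE,
            &mut arg as *mut DmaBufExportSyncFile,
        )
    };
    if ret < 0 {
        Err(std::io::Error::last_os_error())
    } else {
        Ok(arg.fd)
    }
}
```

A compositor that only samples the client buffer would pass `DMA_BUF_SYNC_READ`, which should yield a sync file covering pending writes. The resulting fd can then be handed to `vkImportSemaphoreFdKHR` with `VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT` and `VK_SEMAPHORE_IMPORT_TEMPORARY_BIT`, and the semaphore waited on in the submit that reads the texture.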

I’ve been able to address both bugs.

The video corruption issue was caused by an incorrect image barrier when accessing the DPB images for references. I was synchronizing access to the DPB images correctly, but the barrier used an oldLayout of VK_IMAGE_LAYOUT_UNDEFINED. That tells the driver it doesn’t have to preserve the image contents across the transition, so (I think) it was doing some sort of initialization step on the image and discarding the reference picture.
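To illustrate the idea, here is a small sketch of tracking the layout of each DPB slot so that the barrier never claims UNDEFINED for an image that still holds a reference picture. It is not the code from my project: `Layout`, `DpbUse`, and `DpbLayouts` are hypothetical stand-ins for `VkImageLayout` and the bookkeeping around it, with no real Vulkan bindings involved.

```rust
/// Stand-in for the two `VkImageLayout` values that matter here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Layout {
    Undefined,      // VK_IMAGE_LAYOUT_UNDEFINED
    VideoEncodeDpb, // VK_IMAGE_LAYOUT_VIDEO_ENCODE_DPB_KHR
}

/// How a DPB slot is used by the next encode operation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DpbUse {
    /// Receives the reconstructed picture; previous contents are dead.
    Setup,
    /// Read as a reference picture; previous contents must survive.
    Reference,
}

/// Hypothetical tracker: remembers the layout each DPB slot was left in.
pub struct DpbLayouts {
    current: Vec<Layout>,
}

impl DpbLayouts {
    pub fn new(slots: usize) -> Self {
        Self { current: vec![Layout::Undefined; slots] }
    }

    /// Returns `(old_layout, new_layout)` for the barrier that precedes the
    /// encode. The old layout is always the tracked one, never a hardcoded
    /// `Undefined`, so a slot holding a reference keeps its contents.
    pub fn barrier_layouts(
        &mut self,
        slot: usize,
        usage: DpbUse,
    ) -> Result<(Layout, Layout), String> {
        let old = self.current[slot];
        if usage == DpbUse::Reference && old == Layout::Undefined {
            // Nothing has been reconstructed into this slot yet.
            return Err(format!("DPB slot {} has no picture to reference", slot));
        }
        self.current[slot] = Layout::VideoEncodeDpb;
        Ok((old, Layout::VideoEncodeDpb))
    }
}
```

With real bindings, the returned pair would fill the `oldLayout` and `newLayout` fields of the image memory barrier recorded before the encode command. Only the very first use of a slot, as the setup target of an IDR frame, ends up transitioning from UNDEFINED.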

The all-black dmabufs were fixed by using a dedicated memory allocation for the imported dmabuf (VkMemoryDedicatedAllocateInfo). I have no idea why, really.
