Hi, I’m developing a game streaming / remote desktop tool for Linux using Vulkan, Wayland, and Vulkan Video. My main development environment uses an AMD card, and my tool works well there. I’ve recently been testing on a server with an RTX 3080 and hit two major issues: one related to video encoding with Vulkan Video, and one related to explicit sync.
I’ve set up a branch of my project to make it easy for other developers to reproduce the issues. I’ve included instructions for getting it running at the bottom of this post.
Testing environment:
- OS: Ubuntu 23.10, linux 6.5.0-25
- Driver: linux beta 550.40.59
- GPU: GeForce RTX 3080
Thanks so much in advance!
1. Video Encoding artifacts
This issue is related to the output I get when using Vulkan Video to encode the application textures.
Using both h264 and h265, I get encoding artifacts when using forward references (P frames). Here is an example of h264 output using a simple IPPP structure:
Here is an example of h265 output using the same IPPP structure:
Note that I can work around the problem by disabling references, i.e. emitting only IDR frames. This is obviously not ideal. Note also that FFmpeg’s vulkan implementation currently does the same thing, although I’m not sure if that’s just because they haven’t implemented forward references yet.
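For anyone reproducing this, the difference between the IPPP structure and the IDR-only workaround can be sketched as a per-frame decision. This is only an illustration, not the code from the project: `FrameType`, `frame_type`, `pattern`, and the `gop_len` parameter are hypothetical names, and the periodic IDR at the start of each group is an assumption.

```rust
/// Frame types used in this sketch (hypothetical; not taken from the project).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FrameType {
    /// Self-contained frame that uses no references.
    Idr,
    /// Frame predicted from a previously encoded frame.
    P,
}

/// Picks the frame type for the frame at `index` (0-based).
///
/// `gop_len` is an assumed parameter: the number of frames in each group,
/// where every group starts with an IDR frame. A `gop_len` of 0 or 1 yields
/// IDR-only output, which corresponds to the workaround of disabling references.
pub fn frame_type(index: u64, gop_len: u64) -> FrameType {
    if gop_len <= 1 || index % gop_len == 0 {
        FrameType::Idr
    } else {
        FrameType::P
    }
}

/// Renders the first `n` frame types as a string such as "IPPP".
pub fn pattern(n: u64, gop_len: u64) -> String {
    (0..n)
        .map(|i| match frame_type(i, gop_len) {
            FrameType::Idr => 'I',
            FrameType::P => 'P',
        })
        .collect()
}
```

With `gop_len = 4` the first eight frames come out as `IPPPIPPP`, and with `gop_len = 1` every frame is an IDR. The artifacts described above appear only on the `P` frames, so comparing the two settings is a quick way to confirm that references are involved.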
2. Explicit sync issues
My application acts as a wayland compositor. It uses dmabuf with explicit sync to import application textures into vulkan.
Here is where I import the dmabuf as a vulkan semaphore and here is where I wait on it.
What I’ve found is that imported textures above a certain size are always all black. For example, running vkcube-wayland with --resolution 800x640 always works, but --resolution 800x642 (or taller) always produces an all-black texture. I haven’t been able to discern any pattern in the problematic resolutions of the surface texture (I also examined the stride), but for every width there exists a height that triggers the bug, and for every height there exists a width that triggers the bug.
I’ve verified that it’s the dmabuf texture that’s all zeroes, not the encoded output, by dumping out the blend texture from my compositor code. It’s noteworthy that the bug persists whether I’m using xwayland or not.
I’m not hitting any validation errors[1], and I’ve compared my implementation to the few existing implementations of explicit sync I’ve seen (for example, the wlroots vulkan renderer). And obviously, I’m not hitting this bug on Mesa on my AMD card. I’ve also been unable to reproduce the issue on a cloud instance with a P1000 card. I don’t have a lot of nvidia cards to test on, though.
Running the code
I’ve prepared a branch with hardcoded config that makes it somewhat faster to reproduce the issues.
First, make sure vkcube and vkcube-wayland are on your PATH (they both came with the SDK for me). Then you can clone the branch:
git clone --branch nvidia-testing https://github.com/colinmarc/magic-mirror
To run the server:
cd mm-server && cargo run --bin mmserver
And then to run the client, in a separate terminal:
cd mm-client && cargo run --bin mmclient -- localhost:9599 vkcube-wayland --resolution 800x600 --codec h264
Changing vkcube-wayland to vkcube will allow you to run via xwayland. The other arguments should be self-explanatory. Adding --bug-report will cause the server to dump the compressed bitstream out to a temp dir, which can be useful for analysis and for ruling out any client-side problems.
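To narrow down which resolutions trigger the all-black texture, it may help to generate the client command line for a range of heights at a fixed width. The sketch below only builds argument lists and does not launch anything. `client_args` and `height_sweep` are hypothetical helpers; the arguments themselves are the ones from the client command above.

```rust
/// Builds the `cargo` arguments for one client run, mirroring the client
/// command shown above. `app` is either "vkcube-wayland" or "vkcube".
pub fn client_args(app: &str, width: u32, height: u32, codec: &str) -> Vec<String> {
    let resolution = format!("{}x{}", width, height);
    [
        "run",
        "--bin",
        "mmclient",
        "--",
        "localhost:9599",
        app,
        "--resolution",
        resolution.as_str(),
        "--codec",
        codec,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Lists the heights to try, from `start` to `end` inclusive.
/// A `step` of 0 is treated as 1 so the iterator never panics.
pub fn height_sweep(start: u32, end: u32, step: u32) -> Vec<u32> {
    (start..=end).step_by(step.max(1) as usize).collect()
}
```

Each argument list could then be passed to `std::process::Command::new("cargo")` with the working directory set to `mm-client`, running one resolution at a time against the server and noting which heights produce a black image.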
[1] I actually have one validation error, which I’ve posted about separately, that I am 99% sure is unrelated and spurious.