External monitor freezes when using dedicated GPU

Further investigation shows that if I run

$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only glxgears &
$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only vkcube &

and then enlarge the glxgears window, place the vkcube window over the lower-right corner of the glxgears window and start resizing the vkcube window, the external monitor instantly freezes with the latest driver version (545.29.02).
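All three offload variables have to be set for every program started on the dGPU, so they can be wrapped in a small shell function. This is only a minimal sketch: the name `nv_offload` is hypothetical, and the function does nothing more than pass the same three variables used in the commands above to whatever command it runs.

```shell
# Hypothetical helper: run a command on the NVIDIA dGPU via PRIME render offload,
# setting the same three variables used in the commands above (only for that command).
nv_offload() {
    __NV_PRIME_RENDER_OFFLOAD=1 \
    __GLX_VENDOR_LIBRARY_NAME=nvidia \
    __VK_LAYER_NV_optimus=NVIDIA_only \
    "$@"
}
```

With it, the reproduction commands become `nv_offload glxgears &` and `nv_offload vkcube &`.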

Along with the freeze, vkcube crashes after I switch to a text console and back to the GUI to unfreeze the monitor:

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff7cc7d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff7c78f32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7c63472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff7c63395 in __assert_fail_base (fmt=0x7ffff7dd7a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x555555560012 "!err", 
    file=file@entry=0x555555560004 "./cube/cube.c", line=line@entry=744, function=function@entry=0x555555562850 <__PRETTY_FUNCTION__.1> "demo_flush_init_cmd")
    at ./assert/assert.c:92
#5  0x00007ffff7c71e32 in __GI___assert_fail (assertion=assertion@entry=0x555555560012 "!err", file=file@entry=0x555555560004 "./cube/cube.c", 
    line=line@entry=744, function=function@entry=0x555555562850 <__PRETTY_FUNCTION__.1> "demo_flush_init_cmd") at ./assert/assert.c:101
#6  0x000055555555f490 in demo_flush_init_cmd (demo=0x7fffffffcd00) at ./cube/cube.c:744
#7  demo_prepare (demo=demo@entry=0x7fffffffcd00) at ./cube/cube.c:2389
#8  0x000055555555f724 in demo_prepare (demo=0x7fffffffcd00) at ./cube/cube.c:2549
#9  0x0000555555559ca8 in demo_resize (demo=0x7fffffffcd00) at ./cube/cube.c:1094
#10 demo_handle_xcb_event (event=0x555555eff760, demo=0x7fffffffcd00) at ./cube/cube.c:2785
#11 demo_run_xcb (demo=0x7fffffffcd00) at ./cube/cube.c:2805
#12 main (argc=<optimized out>, argv=<optimized out>) at ./cube/cube.c:4396

The failing code is:

    err = vkQueueSubmit(demo->graphics_queue, 1, &submit_info, fence);
    assert(!err);

So vkcube fails to submit the command buffer because of some error in queue processing (is the queue stuck?).

Before the monitor froze, I watched the /sys/kernel/debug/dma_buf/bufinfo file to check whether the shared DMABUFs are updated while everything works, and they are updated frequently, as expected:

08388608        00000002        00080007        00000006        i915    00000062        <none>
        write fence:0000:00:02.0 signaled seq 518698 signalled
        Attached Devices:
        0000:01:00.0
Total 1 devices attached

08388608        00000002        00080007        00000006        i915    00000061        <none>
        write fence:0000:00:02.0 signaled seq 518702 signalled
        Attached Devices:
        0000:01:00.0
Total 1 devices attached

00004096        00000002        00080007        00000003        i915    00000050        <none>

And one second later:

08388608        00000002        00080007        00000006        i915    00000062        <none>
        write fence:0000:00:02.0 signaled seq 518886 signalled
        Attached Devices:
        0000:01:00.0
Total 1 devices attached

08388608        00000002        00080007        00000006        i915    00000061        <none>
        write fence:0000:00:02.0 signaled seq 518884 signalled
        Attached Devices:
        0000:01:00.0
Total 1 devices attached

00004096        00000002        00080007        00000003        i915    00000050        <none>

Note that the seq numbers increase rapidly: from 518698 to 518886 within one second for the first buffer.

But once the external monitor has frozen, these numbers do not increase at all. 0000:00:02.0 is Intel’s iGPU PCIe device and 0000:01:00.0 is NVIDIA’s discrete dGPU. Both devices are physically working fine, but their DMABUF buffers, which are (presumably) related to the external monitor framebuffer, stop being updated, either because of some error in the Xorg direct rendering stack (the driver or something similar) or because of a command queue processing issue (a race condition?).
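The check described above (fence seq numbers advance while the external monitor works and stop once it freezes) can be automated. This is a minimal sketch, assuming it runs as root, debugfs is mounted at /sys/kernel/debug, and bufinfo uses the same format as the snippets above; `max_fence_seq` and `check_fence_progress` are hypothetical helper names. It tracks only the largest seq value in the file, which approximates watching each buffer individually.

```shell
# Hypothetical helpers; assume root, debugfs at /sys/kernel/debug and the
# bufinfo format shown above ("... signaled seq N signalled").

# Print the largest fence "seq" number in bufinfo-formatted text on stdin (0 if none).
max_fence_seq() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "seq" && $(i + 1) + 0 > max)
                max = $(i + 1) + 0
    } END { print max + 0 }'
}

# Sample bufinfo twice, one second apart.
# Returns 0 if the seq numbers advanced, 1 if they stalled, 2 if the file is unreadable.
check_fence_progress() {
    bufinfo=${1:-/sys/kernel/debug/dma_buf/bufinfo}
    before=$(max_fence_seq < "$bufinfo") || return 2
    sleep 1
    after=$(max_fence_seq < "$bufinfo") || return 2
    if [ "$after" -gt "$before" ]; then
        echo "fences advancing: $before -> $after"
    else
        echo "fences stalled at $before"
        return 1
    fi
}
```

For example, `while check_fence_progress; do :; done` keeps sampling once per second and should stop with `fences stalled at ...` when the external monitor freezes, if the behaviour matches the observation above.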

I hope my investigation helps NVIDIA fix these issues in a future driver release as soon as possible.
