Unexplained / Strange behavior on Linux vs. Windows

Hello,

I’m seeing strange behavior on Linux (Ubuntu 20.04) that differs markedly from what I see on Windows on the same machine.
Windows: Windows 10, CUDA 11.6, Driver 510
Linux: Ubuntu 20.04, CUDA 11.5, Driver 495

I have a rendering loop that:

  1. Copies data host->device
  2. Runs 2 OpenCV kernels (DeBayer and remap)
  3. Maps an OpenGL texture (cudaGraphicsMapResources)
  4. Copies data device->array (into the OpenGL texture)
  5. Unmaps the OpenGL texture

It does 1-5 twice (I have two images, left and right), in two separate streams.

  6. Synchronizes the device
  7. Renders using the textures (OpenGL)
  8. Swaps buffers (OpenGL)
  9. Back to 1.
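
For reference, the loop described above could be sketched roughly as follows. This is only a minimal sketch, not my actual code: the `Eye` struct, `uploadEye`, `frame`, and the two stand-in kernels are hypothetical names (the real code uses the OpenCV DeBayer and remap kernels), and it assumes each texture is RGBA8 and was registered once at startup with cudaGraphicsGLRegisterImage. Window/context creation, drawing, and the buffer swap are omitted. Every CUDA call is checked, since a silently failing map or copy would also leave the texture unchanged.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical per-image state (one instance for left, one for right).
struct Eye {
    cudaStream_t stream;                 // one stream per image
    cudaGraphicsResource_t texResource;  // assumed registered via cudaGraphicsGLRegisterImage
    const unsigned char* hostRaw;        // raw Bayer input (pinned memory, for a truly async copy)
    unsigned char* devRaw;               // device copy of the raw input
    uchar4* devRgba;                     // output of the first kernel
    uchar4* devOut;                      // output of the second kernel
    int width, height;
};

// Stand-ins for the two OpenCV kernels; they only move data around.
__global__ void debayerStandIn(const unsigned char* raw, uchar4* rgba, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) rgba[i] = make_uchar4(raw[i], raw[i], raw[i], 255);
}

__global__ void remapStandIn(const uchar4* in, uchar4* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Steps 1-5 for one image, all enqueued on that image's stream.
void uploadEye(Eye& e) {
    const int n = e.width * e.height;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // 1. host->device copy (one byte per pixel of raw data)
    CUDA_CHECK(cudaMemcpyAsync(e.devRaw, e.hostRaw, n,
                               cudaMemcpyHostToDevice, e.stream));

    // 2. two kernels
    debayerStandIn<<<blocks, threads, 0, e.stream>>>(e.devRaw, e.devRgba, n);
    remapStandIn<<<blocks, threads, 0, e.stream>>>(e.devRgba, e.devOut, n);
    CUDA_CHECK(cudaGetLastError());

    // 3. map the OpenGL texture
    CUDA_CHECK(cudaGraphicsMapResources(1, &e.texResource, e.stream));

    // 4. device->array copy into the texture
    cudaArray_t array = nullptr;
    CUDA_CHECK(cudaGraphicsSubResourceGetMappedArray(&array, e.texResource, 0, 0));
    const size_t rowBytes = static_cast<size_t>(e.width) * sizeof(uchar4);
    CUDA_CHECK(cudaMemcpy2DToArrayAsync(array, 0, 0, e.devOut, rowBytes,
                                        rowBytes, e.height,
                                        cudaMemcpyDeviceToDevice, e.stream));

    // 5. unmap the OpenGL texture
    CUDA_CHECK(cudaGraphicsUnmapResources(1, &e.texResource, e.stream));
}

// One iteration of the rendering loop.
void frame(Eye& left, Eye& right) {
    uploadEye(left);
    uploadEye(right);
    CUDA_CHECK(cudaDeviceSynchronize());  // 6. synchronize the device
    // 7-8. draw with both textures and swap buffers (toolkit-specific, omitted)
}
```

This needs a current OpenGL context on the calling thread, so it is not runnable on its own.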

On Windows it works fine:


As you can see, there are some CUDA API calls, immediately followed by two streams, 17 and 18, each doing a host->device copy, 2 kernels, and a device->array copy (for some reason the copies and the kernels are not overlapped, but I don’t care about that now). cudaDeviceSynchronize waits patiently for the last device->array copy to complete, then there are some OpenGL API commands followed by some OpenGL HW work, and then it basically waits there for the vertical sync.
It behaves exactly as I expect, and the texture is refreshed, as expected.

On Linux:


It’s a bit hard to see, but the first frame seems to do exactly the same thing as on Windows. From the second frame onward, however, something strange happens: there is an extremely long wait between the second kernel invocation and the device->array copy, with cudaDeviceSynchronize patiently waiting the entire time. Furthermore, the texture is updated exactly once (at the beginning) and never updates afterwards. Could it be that cudaGraphicsMapResources is waiting for the render to complete (i.e. waiting for the Vsync)? That sounds extremely odd, since the rendering to the back buffer completed long before (see the OpenGL HW row). What would be the rationale for waiting for the Vsync?
What am I missing here? And why is there no update to the texture?
Here’s the region between frame 2 and frame 3 zoomed in: