In my application I need to encode two video streams in parallel. While examining performance with a visual profiler I noticed something peculiar. There are two distinct CPU encoder queues and two distinct GPU encoder queues. The two GPU queues start almost simultaneously (the frames become ready within 500 µs of one another) and encode the same frame size, yet the first frame is done after 5.2 ms while the second takes 9.6 ms. To determine when encoding starts and ends I look at the Dma Packet entry in the Video Encode GPU queue, which is the only entry in that queue; it coincides exactly with the Render entry in the Video Encoder CPU queue.

The interval during which both encoders appear active in the profiler is 4 ms. Subtracting it from the second encoder's time gives 9.6 ms - 4 ms = 5.6 ms, which is close to the 5.2 ms of the first encoder. In other words, it looks like the second encoder waited for the first one to complete before it started working. Is it supposed to be this way, or am I missing something?
I am using an RTX A6000 to encode two 3200x3200 streams at 30 fps.
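
To make the arithmetic behind my suspicion explicit, here is a minimal Python sketch of the check I did by hand. The function names and the 0.5 ms tolerance are my own assumptions for illustration; they are not part of any profiler or encoder API, and the durations are simply the values I read off the profiler timeline.

```python
def time_outside_overlap(duration_ms: float, overlap_ms: float) -> float:
    """Time an encoder was shown as active while the other one was not."""
    return duration_ms - overlap_ms


def looks_serialized(first_ms: float, second_ms: float, overlap_ms: float,
                     tolerance_ms: float = 0.5) -> bool:
    """Heuristic: if the second encoder's time outside the overlap is about
    one full encode time, it probably did its real work only after the
    first encoder finished. tolerance_ms is an arbitrary assumed threshold."""
    remainder = time_outside_overlap(second_ms, overlap_ms)
    return abs(remainder - first_ms) <= tolerance_ms


# Values read from the profiler timeline (milliseconds).
first_ms = 5.2    # first encoder, start to end of its Dma Packet
second_ms = 9.6   # second encoder, start to end of its Dma Packet
overlap_ms = 4.0  # interval where both packets appear active

remainder_ms = time_outside_overlap(second_ms, overlap_ms)
print(f"second encoder outside the overlap: {remainder_ms:.1f} ms")
print("consistent with serialized encoding:",
      looks_serialized(first_ms, second_ms, overlap_ms))
```

This only shows that the numbers are consistent with the two encodes running back to back; it does not prove that the hardware actually serializes them.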