Fluctuating performance with two instances of NVENC

I encode AV1 video using NVENC on an RTX 4090, which has two hardware NVENC units.

When running a single instance of the encoder, I achieve a consistent performance of ~25fps. When running two instances of the encoder simultaneously (in hopes of making better use of the available hardware), I sometimes get a cumulative performance of ~50fps (which is what I would expect, since the card has two NVENC hardware units), but often the cumulative performance is much worse, somewhere between 25fps and 40fps.

I can’t find a reason why the performance fluctuates so wildly in the dual-encoder case. Running the same program several times gives me anything between 25fps and 50fps of total encoding speed. Nsight shows that the program keeps the two encoders busy at all times (nvEncEncodePicture calls show up back-to-back in the timeline, without gaps in between), so it can’t be due to not feeding input data fast enough.
I would be very grateful for any help here.

According to the Video Encode and Decode GPU Support Matrix (NVIDIA Developer), the 4090 has 2 NVENC engines and allows 8 concurrent sessions.

Did you run an experiment with different input videos, in particular videos of different complexity?

Did you observe whether the slowdown happens at the same point in the video, or is it “random” (different every time you encode the same videos)?

I think it may be due to the video content, which may give more or less workload to the NVENC engines.
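
One way to tell the two cases apart is to log a completion timestamp for every encoded frame, then compare the throughput per block of frames between two runs of the same input. The following is only a rough sketch of that comparison, not something taken from the NVENC SDK: the names `fps_per_chunk` and `pearson` are made up for illustration, and it assumes you already have a list of per-frame timestamps in seconds for each run.

```python
import math

def fps_per_chunk(timestamps, chunk=100):
    """Throughput (fps) for each consecutive block of `chunk` frames.

    `timestamps` holds one completion time in seconds per encoded frame,
    in frame order (hypothetical logging added by you to the encoder app).
    """
    rates = []
    for i in range(0, len(timestamps) - chunk, chunk):
        elapsed = timestamps[i + chunk] - timestamps[i]
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append(chunk / elapsed)
    return rates

def pearson(xs, ys):
    """Pearson correlation of two series, truncated to the shorter length."""
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no information about correlation
    return cov / math.sqrt(var_x * var_y)

def compare_runs(run_a, run_b, chunk=100):
    """Correlate the per-chunk fps of two runs of the same video."""
    return pearson(fps_per_chunk(run_a, chunk), fps_per_chunk(run_b, chunk))
```

If `compare_runs` gives a value close to 1 for repeated encodes of the same file, the slow sections follow the content. If it is close to 0 and the slow stretches move around from run to run, the content is probably not the cause.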