NVENC H.264 Encoder MFT latency increases when framerate is limited

Hello!

I am developing a Windows application that does low-latency display streaming. I want the application to be GPU-agnostic, so instead of using the H.264 hardware encoder directly through the NVIDIA Video Codec SDK, I access it through a Media Foundation Transform. For display capture I am using the Windows Desktop Duplication API.
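
To be concrete, I obtain the encoder roughly like this (a simplified sketch of my setup code; error handling and media-type negotiation are omitted):

```cpp
// Sketch: enumerate hardware H.264 encoder MFTs and activate the first match.
// Assumes MFStartup() has already been called; error handling omitted.
#include <windows.h>
#include <mfapi.h>
#include <mftransform.h>

IMFTransform* CreateHardwareH264Encoder()
{
    MFT_REGISTER_TYPE_INFO outputType = { MFMediaType_Video, MFVideoFormat_H264 };

    IMFActivate** activates = nullptr;
    UINT32 count = 0;

    // Ask only for hardware encoders (NVENC, Quick Sync, ...), sorted by merit.
    MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER,
              MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER,
              nullptr,           // accept any input type
              &outputType,       // must produce H.264
              &activates, &count);

    IMFTransform* transform = nullptr;
    if (count > 0)
        activates[0]->ActivateObject(IID_PPV_ARGS(&transform));

    for (UINT32 i = 0; i < count; ++i)
        activates[i]->Release();
    CoTaskMemFree(activates);

    return transform;   // caller releases; input/output media types still need to be set
}
```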

On my development machine I have a GTX 1050 Ti, which is able to encode a 1080p (-ish, actually 1920x1200) stream with around 3 ms of latency at best. However, after a while the encoder seemingly decides that since it is only being fed samples every 16 ms, it doesn't need to encode them as quickly as it otherwise would. Perhaps this gives better encoding quality or lower power consumption, or something along those lines.

To test this, I set up a switch that makes the application stop capturing frames through the Desktop Duplication API and instead feed the same frame into the encoder over and over, as fast as it can. While the switch is on, the encoding latency drops dramatically. After returning to normal operation, capturing the display every 16 ms and sending the frames to the transform, the encoder keeps processing frames at the lower latency for a while, but eventually returns to the higher latency.
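
The switch itself is conceptually nothing more than this (a rough sketch; CaptureDesktopFrame and EncodeSample are placeholders for my own code, not API names):

```cpp
// Sketch of the saturation test: while 'saturate' is set, the last captured
// sample is re-fed to the encoder in a tight loop instead of waiting for the
// Desktop Duplication API to deliver a new frame (~every 16 ms at 60 Hz).
#include <mfobjects.h>
#include <atomic>

IMFSample* CaptureDesktopFrame();       // placeholder: Desktop Duplication capture
void       EncodeSample(IMFSample*);    // placeholder: feed one sample to the MFT

std::atomic<bool> saturate{ false };    // toggled by a hotkey in the real app

void EncodeLoop()
{
    IMFSample* lastSample = nullptr;    // most recently captured frame

    for (;;)
    {
        if (!saturate || lastSample == nullptr)
        {
            // Normal path: block until a new desktop frame arrives.
            IMFSample* fresh = CaptureDesktopFrame();
            if (lastSample) lastSample->Release();
            lastSample = fresh;
        }
        // Saturated path: skip capture entirely and re-encode the same sample.
        EncodeSample(lastSample);
    }
}
```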

I’ve illustrated the problem with the following graph. I’m guessing the spikes must be I-frames, which take slightly longer to encode. The green area of the graph indicates a section where the encoder was artificially saturated.

http://i.xomf.com/msbnx.png

Could this be because I haven't set a low-latency preset on the encoder? Is there a way to instruct the encoder to use a low-latency mode through Media Foundation? I have tried changing some Media Foundation attributes (such as MF_LOW_LATENCY) as well as properties exposed by the ICodecAPI interface of the transform, but none of them has had any effect so far.
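
For reference, this is roughly what I have been trying (a simplified sketch; whether the hardware MFT honours any of these is exactly what I'm unsure about):

```cpp
// Sketch: hinting low latency to the encoder MFT. It is an assumption on my
// part that the NVIDIA MFT reads any of these; so far they have had no visible effect.
#include <mfapi.h>
#include <mftransform.h>
#include <icodecapi.h>
#include <codecapi.h>

void TryEnableLowLatency(IMFTransform* transform)
{
    // 1) The generic Media Foundation attribute on the MFT's attribute store.
    IMFAttributes* attrs = nullptr;
    if (SUCCEEDED(transform->GetAttributes(&attrs)))
    {
        attrs->SetUINT32(MF_LOW_LATENCY, TRUE);
        attrs->Release();
    }

    // 2) ICodecAPI properties exposed by the encoder.
    ICodecAPI* codecApi = nullptr;
    if (SUCCEEDED(transform->QueryInterface(IID_PPV_ARGS(&codecApi))))
    {
        VARIANT v = {};
        v.vt = VT_UI4;

        v.ulVal = 1;
        codecApi->SetValue(&CODECAPI_AVLowLatencyMode, &v);    // low-latency mode
        codecApi->SetValue(&CODECAPI_AVEncCommonRealTime, &v); // favour speed over quality

        v.ulVal = 0;
        codecApi->SetValue(&CODECAPI_AVEncMPVDefaultBPictureCount, &v); // no B-frames

        codecApi->Release();
    }
}
```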

A 12 ms latency is something I can live with, especially considering that systems with lower-end GPUs will probably see higher latency anyway. Nevertheless, it would be nice to have confirmation on whether anyone else has observed this behaviour, and whether anything can be done about it.

Hi KeloCube,

Could you provide detailed info on the following:

  1. Driver version
  2. Operating system

Thanks,
Ryan Park

Hi,

Thanks for the response! Silly of me not to include them right away. I'm running Windows 10 Pro 64-bit (10.0, Build 17134) with driver version 24.21.13.9882. I have GeForce Experience installed, which reports the version as 398.82. See the attached DxDiag output for the full details. Hopefully there's nothing too sensitive in there. :)

At the moment the integrated GPU is enabled (I was testing the Quick Sync implementation of H.264 encoding), but I was seeing the same problem with NVIDIA's encoder before enabling the iGPU.

I also put together a sample program that only encodes an empty surface. No desktop capture, no color conversion, just synchronous surface encoding. The same behaviour is still observed, although it takes the encoder much longer to start increasing the encoding latency, and it responds more quickly when the framerate is unthrottled. Average encode latency on my machine was about 2 ms, which is still far greater than what I got by modifying the native Encode SDK encoding sample to not read data from a file. That got me sub-millisecond latencies, though the surface being empty might have had something to do with it.
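
The core of the sample is roughly the following (a trimmed sketch of the blocking encode step; the full gist also sets up the D3D11 device manager and media types, and 'input' wraps a blank D3D11 texture in an IMFSample):

```cpp
// Sketch of the "synchronous" encode step used for timing. Hardware encoder
// MFTs are asynchronous MFTs, so we block on the transform's event generator
// ('events', obtained by QueryInterface from the MFT) and feed/drain it in
// lockstep. 'encoder' is already configured and streaming has been started.
#include <mfapi.h>
#include <mfidl.h>
#include <mftransform.h>
#include <chrono>
#include <cstdio>

void EncodeOnce(IMFTransform* encoder, IMFMediaEventGenerator* events, IMFSample* input)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();

    bool fed = false;
    bool drained = false;
    while (!drained)
    {
        IMFMediaEvent* ev = nullptr;
        if (FAILED(events->GetEvent(0, &ev)))   // blocking wait
            return;

        MediaEventType type = MEUnknown;
        ev->GetType(&type);
        ev->Release();

        if (type == METransformNeedInput && !fed)
        {
            encoder->ProcessInput(0, input, 0);
            fed = true;
        }
        else if (type == METransformHaveOutput)
        {
            MFT_OUTPUT_DATA_BUFFER out = {};
            DWORD status = 0;
            if (SUCCEEDED(encoder->ProcessOutput(0, 1, &out, &status)) && out.pSample)
                out.pSample->Release();         // the encoded H.264 sample
            if (out.pEvents)
                out.pEvents->Release();
            drained = true;
        }
    }

    const auto ms = std::chrono::duration<double, std::milli>(clock::now() - start);
    printf("encode latency: %.2f ms\n", ms.count());
}
```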

I figured the sample might be a useful starting point for anyone interested in MFT hardware encoding, so I hosted it on GitHub gists.

Also, I realized that the increasing latency is going to be a problem for my desktop streaming app after all, since the Desktop Duplication API only delivers a frame when the desktop contents actually change. This means the encoder can receive frames at an even lower rate than 60 Hz, pushing the latency above 16 ms.
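
For context, the capture side looks roughly like this (a simplified sketch; TryCaptureFrame is just my own helper, not an API name):

```cpp
// Sketch of why the input rate can drop below 60 Hz: AcquireNextFrame() only
// returns a frame when the desktop has changed, otherwise it times out.
// 'duplication' was obtained earlier via IDXGIOutput1::DuplicateOutput().
#include <d3d11.h>
#include <dxgi1_2.h>

bool TryCaptureFrame(IDXGIOutputDuplication* duplication, ID3D11Texture2D** texture)
{
    DXGI_OUTDUPL_FRAME_INFO info = {};
    IDXGIResource* resource = nullptr;

    // Wait up to ~16 ms for a changed frame; a static desktop simply times out,
    // so the encoder goes hungry for longer than one refresh interval.
    HRESULT hr = duplication->AcquireNextFrame(16, &info, &resource);
    if (hr == DXGI_ERROR_WAIT_TIMEOUT)
        return false;                  // nothing changed, nothing to encode
    if (FAILED(hr))
        return false;                  // device lost, access lost, etc.

    resource->QueryInterface(IID_PPV_ARGS(texture));
    resource->Release();

    // NOTE: in the real app the texture is copied before ReleaseFrame();
    // this sketch releases immediately for brevity.
    duplication->ReleaseFrame();
    return true;
}
```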
DxDiag.txt (103 KB)

Hi, any update on the matter? If there is no chance of this getting resolved, please let me know. Writing an NVIDIA-specific implementation is certainly a possibility, but I would like to explore this option first.

Hi,

We have filed a bug for the issue you reported. Our internal engineers are working on reproducing it now. I'll update you once progress is made.

Thanks,
Ryan Park