Max NVENC parallel sessions supported by Tesla T4?

Hi, guys,

As stated above, up to 24 streams of 1080p 30 fps NV12 can be encoded in real time in parallel.
But I failed to reproduce this result using the NVIDIA Video SDK samples from GitHub (NVIDIA/video-sdk-samples).

Here is my configuration:

  1. HEVC codec
  2. LowLatency and NV_ENC_PRESET_LOW_LATENCY_HQ_GUID preset, all other parameters left at their defaults
  3. D3D11 texture input
  4. NV12 input format forced
  5. Tesla T4 with the latest driver on Windows x64
  6. One encoder session per process (AppEncD3D11.exe)
  7. Input clip: Basketball (500 frames, 1080p) from the ITU official streams
  8. Each process encodes one frame, then sleeps 33 ms (roughly 30 fps)
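
A note on item 8: sleeping a fixed 33 ms after each encode makes the frame period equal to the encode time plus 33 ms, so the effective rate drops below 30 fps as encode time grows. The following is a minimal Python sketch of that arithmetic, together with a deadline-based alternative. The function names are hypothetical and are not part of the SDK samples.

```python
def effective_fps(encode_ms, sleep_ms=33.0):
    """Frame rate when every frame costs encode_ms plus a fixed sleep."""
    return 1000.0 / (encode_ms + sleep_ms)


def remaining_sleep_ms(encode_ms, target_fps=30.0):
    """Sleep that keeps the frame period at 1000 / target_fps ms.

    Returns 0 when the encode alone already exceeds the frame budget.
    """
    budget_ms = 1000.0 / target_fps
    return max(0.0, budget_ms - encode_ms)


if __name__ == "__main__":
    # Illustrative encode times in ms (assumed values, not measurements).
    for encode_ms in (1.0, 22.0, 46.0):
        print(encode_ms,
              round(effective_fps(encode_ms), 1),
              round(remaining_sleep_ms(encode_ms), 1))
```

With a fixed sleep, any non-zero encode time pushes the rate under the target; sleeping only for the remainder of the frame budget avoids that drift as long as the encode itself fits in the budget.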

Here is my result:
Case 1: 10 streams in parallel: 80% encoder usage, encoding one frame takes 0-1 ms
Case 2: 15 streams in parallel: 98% encoder usage, encoding one frame takes 0-22 ms
Case 3: 20 streams in parallel: 98% encoder usage, encoding one frame takes 0-46 ms

In case 3, the encoder can no longer guarantee real-time performance (30 fps).

Is there something I missed?

Hello @reis.zm307 and welcome to the NVIDIA developer forums!

The performance values on that page are benchmark numbers. To find out how those numbers are measured and reached, please find this detailed document describing the complete setup.

Any difference in setup will create different performance results. As with any benchmark, the 24 streams are the maximum possible number, but depending on the environment not necessarily a guaranteed number.

I am really sorry if this causes trouble with your project.

I do hope that this information was still helpful for you.

Thanks!

Hello MarkusHoHo, thanks for the quick response.

I carefully reviewed the detailed document describing the complete setup and found a comment about how the results are evaluated (last section of the document): “If multiple files are being encoded in parallel, then the aggregate number of frames in all parallel encoded videos are used to compute performance in frames/second.” This might be the key to the difference.
If I re-evaluate my experimental results that way (averaging over all frames in all processes), they match quite well.
So can I conclude that a real-time multi-encoder application will not achieve 24 parallel streams (in my case, probably 15), because some streams can no longer guarantee encoding one frame within 33 ms?
Is that right?
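
To make the distinction concrete, here is a small Python sketch contrasting the two metrics: aggregate throughput (total frames over wall-clock time, as the quoted sentence describes) and a per-frame real-time check (each frame within 1000/30 ms). The run length and latency values are made up for illustration and are not measurements; the function names are hypothetical.

```python
def aggregate_fps(frames_per_stream, wall_seconds):
    """Benchmark-style metric: all frames from all streams per second."""
    return sum(frames_per_stream) / wall_seconds


def late_frames(latencies_ms, target_fps=30.0):
    """Per-frame encode times that exceed the real-time frame budget."""
    budget_ms = 1000.0 / target_fps
    return [ms for ms in latencies_ms if ms > budget_ms]


if __name__ == "__main__":
    # Hypothetical run: 24 streams of 500 frames each, done in 16 s.
    print(aggregate_fps([500] * 24, 16.0))

    # Hypothetical per-frame encode times (ms) from one of those streams.
    print(late_frames([2.0, 15.0, 40.0, 46.0, 5.0]))
```

The aggregate figure can exceed 24 x 30 = 720 fps even while individual frames overshoot the 33 ms budget, which is why a throughput benchmark and a per-stream real-time requirement can disagree.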

Here is another question, about encoder usage.
Regardless of performance, in my tests 12, 15, and 20 parallel encode sessions all produce the same 98% encoder usage, which is a very high figure for a practical application. If I want to deploy Tesla T4 in our production environment, how can I set up monitoring of encoder usage? Or should I just disregard it? And how many parallel streams would you suggest as the best balance between encoder usage and performance?

Thank you

I must admit that this goes beyond my experience with NVENC, so let me see if I can get someone with more background knowledge to chime in here. This might take a bit, I appreciate your patience!

Thanks MarkusHoho,
Please let me know if you have any further information.

Hi again,

First of all, your assessment of why you cannot achieve 24 parallel streams seems correct for this scenario.

I also received the recommendation that, for a better comparison of performance numbers, you should look at the current encoder presets as described in the Hardware Video Encoder Documentation. Under “Performance” you will find presets P1 to P7, which replace the previous way of specifying encoder settings. In your case, to reach as many parallel streams as possible, you should use P1.

Lastly, you need to decide for yourself whether you want to under-utilize the GPU or not. 98% utilization is not a bad thing. It indicates that the encode engines are being fully utilized; any lower number would indicate that they are not saturated. So you should probably benchmark your system and choose the highest number of parallel streams it can sustain over longer periods of time.
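
One way to pick that number from benchmark data, sketched in Python: keep the largest stream count whose worst observed per-frame encode time still fits the frame budget. The dictionary reuses the worst-case timings reported earlier in this thread for 10, 15, and 20 streams; the function name is hypothetical, and a real run should sample over a much longer period than a 500-frame clip.

```python
def max_sustainable_streams(worst_ms_by_streams, target_fps=30.0):
    """Largest stream count whose worst frame time fits the frame budget.

    worst_ms_by_streams maps a parallel stream count to the worst
    per-frame encode time (ms) seen at that count. Returns None when no
    tested count meets the budget.
    """
    budget_ms = 1000.0 / target_fps
    ok = [n for n, worst_ms in worst_ms_by_streams.items()
          if worst_ms <= budget_ms]
    return max(ok) if ok else None


if __name__ == "__main__":
    # Worst-case encode times from the three cases reported in this thread.
    measured = {10: 1.0, 15: 22.0, 20: 46.0}
    print(max_sustainable_streams(measured))  # prints 15
```

Testing more stream counts between 15 and 20 would narrow down where the budget is first exceeded.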

Hopefully this will help with your project.
Thanks!

Thanks, MarkusHoho.
I will upgrade the Video SDK and try the new presets (P1-P7) and the AppEncPerf application in the SDK samples.
By the way, what exactly do the figures in Table 3 (NVENC encoding performance) of the Hardware Video Encoder Documentation represent? fps or timing?

Thank you!

The unit is frames encoded per second. I mentioned this to the people writing the documentation; it should be stated in a future version.

Thanks!