Session count limitation for NVENC (No Maxwell GPUs with 2+ NVENC sessions?)

Hi,

I’m using NVENC in a big realtime streaming solution where having multiple parallel streams is a key factor. So far I’ve successfully built it on top of Intel Quick Sync, but since Intel GPUs are embedded into the CPU die and cannot be stacked, I began looking into NVENC. I’ve successfully ported the solution to NVENC and made a proof of concept on a Tesla K40c card, but since that card is very expensive I wanted to understand whether cheaper NVIDIA cards can be used for this.

I looked at the GeForce GTX 750 Ti, but as the NVENC SDK homepage and PDF documentation state, it only allows running 2 sessions - initialization of each session beyond the second one fails (as expected from the documentation) on both Linux and Windows.

The statement “NVIDIA Quadro K4000 and above” seems quite unclear - I’d like to use a Maxwell based GPU because of the performance improvements over Kepler cards, but at least looking at the Wikipedia list of NVIDIA GPUs (http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units) it seems that the Maxwell GM107 chip is only present in GeForce cards and the Quadro K2200. Is the Quadro K2200 considered to be above or below the K4000 in the context of NVENC? Does it support more than 2 sessions? If not, does it mean that currently there are no Maxwell based cards able to run 2+ encoding sessions?

Ditto! I’m also wondering this! The documentation isn’t quite clear if the Geforce Maxwells support more than two encoding sessions on Linux (the documentation seems to emphasize that there is to be no more than two on Windows).

We have a system with two GeForce GTX 690s (which means 4 GPUs), but we can still only create two encoding sessions. So apparently the limit on encoding sessions is not per GPU, but per system. Yikes! So much potential, and yet so far from being able to use it…
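
To make the difference between a per-GPU cap and a per-system cap concrete, here is a minimal Python sketch. The function name `max_sessions` and its `scope` parameter are made up for illustration, and the cap of 2 is simply the figure reported in this thread for GeForce cards; nothing here queries the driver.

```python
def max_sessions(num_gpus: int, cap: int = 2, scope: str = "system") -> int:
    """Return how many encode sessions a simple cap model would allow.

    scope="system": the cap applies once to the whole machine.
    scope="gpu":    the cap applies separately to every GPU.
    Illustrative model only; this is not an NVENC API.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if scope == "system":
        return cap
    if scope == "gpu":
        return cap * num_gpus
    raise ValueError("scope must be 'system' or 'gpu'")


# Two GTX 690 boards expose four GPUs in total.
print(max_sessions(4, scope="system"))  # 2, which matches what we observe
print(max_sessions(4, scope="gpu"))     # 8, what a per-GPU cap would allow
```

Seeing only two sessions on a four-GPU box is what the "system" model predicts, which is why the limit looks system-wide.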

Anyone have any answers to these two comments out there?

Hi,

I’m also trying to figure this out. It seems that the only Maxwell chip that allows us to create more than two encoding sessions is GM200, and only when it is integrated in the rather expensive M6000 card?

Can someone from NVIDIA answer our comments?

Yes, GeForce products that are Kepler or Maxwell based are limited to two simultaneous video streams per system using NVENC hardware. Maxwell 1st generation GPUs (GM10x) bring improved performance and additional features in NVENC. The Quadro K2200 (GM107) replaces the Quadro K2000 (GK107). Quadro products (Kepler or Maxwell) can encode more than 2 video streams. I assume you are targeting real-time performance (30fps), so the limit is going to be GPU clocks and available memory. You can find specs about the Quadro K2200 here.

Quadro K4000 (GK104) and K4200 (GK104) both have more GPU cores and memory bandwidth than the K2200. The K4200 will be about 1.75X faster in 3D or Compute than a K4000, but NVENC is the same for both. However, if you compare H.264 encoding performance, the Quadro K2200 (GM107) will be faster than the K4000/K4200.

Quadro M6000 (GM200) has even more GPU cores and bandwidth, and it is a 2nd generation Maxwell GPU with improvements in the NVENC engine over the 1st generation Maxwell. A single Quadro M6000 has 2 NVENC engines, and in addition it adds support for HEVC (H.265) hardware encoding.

If you want Maxwell GPUs with the best NVENC performance, K2200 and M6000 are the recommended products.

For more clarification: the Quadro K2200 is a 1st generation Maxwell GPU with a single NVENC engine and can support more than 2 concurrent streams. Compared to the K2000, the K2200 Maxwell NVENC has 2X the throughput. The Quadro M6000 is a 2nd generation Maxwell GPU with two NVENC engines and has 2X the encoding throughput of the Quadro K2200.
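
As a rough illustration of how those multipliers chain together, here is a small Python sketch. It takes the Quadro K2000 as an arbitrary baseline of 1.0; the factors are only the ones quoted above, not measurements, and the variable names are made up for this example.

```python
# Relative NVENC throughput with the Quadro K2000 as the baseline (1.0).
# The multipliers are the ones quoted in this post, not benchmark results.
K2000 = 1.0
K2200 = 2.0 * K2000  # K2200 has 2X the throughput of the K2000
M6000 = 2.0 * K2200  # M6000 has 2X the throughput of the K2200

relative = {
    "Quadro K2000": K2000,
    "Quadro K2200": K2200,
    "Quadro M6000": M6000,
}

for name, factor in relative.items():
    print(f"{name}: {factor:.0f}x")
```

Under these quoted figures the M6000 would land at roughly 4X the K2000.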

How does the NVENC performance of the K2200 compare to the GeForce GTX 970 or 980?

Maybe I should have mentioned this before, when I had the same problem of opening more than 2 encoding sessions.
But I found out (much later) that it is possible to use more than 2 sessions concurrently with a GeForce GTX 750 Ti; the requirement is a special license key with NVENC SDK 3 (I haven’t tested SDK 4). You will also need NVIDIA driver version 334.21 (the one I’m using that works; I haven’t tested other drivers yet). With newer driver versions, NVIDIA removed the need for a license so that everyone can use NVENC, but they limited the number of encoding sessions.

Currently I’m using only one GTX 750 Ti for parallel encoding, but there doesn’t seem to be any performance boost beyond 2 encoding sessions (this is also documented in the NVENC doc). That will probably change if we use more than one GPU (2 encoding sessions on each card).

Hello rbundulis, how many encode sessions can be created when you use the Tesla K40c? Have you tried other cards that can create 2+ encode sessions?

The Tesla K40c is so expensive that we do not have enough budget to buy one for testing.

Thank you very much.

Hello.

Has anyone verified in practice that the Quadro K620/K1200/K2200 (GM107, Maxwell 1st gen) support more than 2 encoder streams (NVENC) and that performance matches what is presented in http://developer.download.nvidia.com/compute/nvenc/v5.0_beta/NVENC_DA-06209-001_v06.pdf ?

Thanks, Martin

I am looking at the document linked above and I see no mention of more than two encoder streams being supported with the low-end Quadro GPUs you mention. The only relevant language I can find is in fact even more restrictive than that:

I do not see how “up to two encode sessions per system (not GPU!)” could be interpreted as “more than two encoder streams”, under any circumstances. One cannot verify features that are not even promised. Did I overlook something in the document?

Read back through this thread. It is unclear whether the K2200 is low-end or not (the official page says that 2+ encoding streams are supported on K4000 and above).
But NVIDIA’s pages are full of bugs, old information and intentional fog. The “NVENC” block is the same in all chips of the same type (differences exist only between Kepler, Maxwell gen 1 and Maxwell gen 2); enabling 2+ encoding streams is only a “license” problem (there used to be a “license” to enable 2+ encoding streams in the GeForce range).
I have had a negative personal experience with information released by NVIDIA: https://devtalk.nvidia.com/default/topic/546409/.

Martin

I have a Quadro K2200 in my system here and its hardware specifications are basically those of a GTX 750 Ti, but with 4GB of memory. With only five SMs, it is low-end. This jibes with your information that more than two streams are supported on K4000 and up, since the Quadro K4000 is the next level up from the Quadro K2200.

I have no insights into the NVENC implementation, but as a general comment, I’ll point out that market differentiation is a valid business strategy that has been around for decades, for both hardware and software (and beyond that, e.g. DVD region codes).

Keeping documentation correct and up-to-date is pretty hard, as I know from personal experience. Unlike software, there are no debuggers and nightly regression test runs for documentation. I would suggest reporting all issues with documentation (corrections and requests for clarification) to NVIDIA via the normal bug reporting channels.

I accept both points (market differentiation, the difficulty of keeping docs/web pages up to date). That is the reason (not only for me) to ask on this forum.

“NVENC” is a dedicated hardware ASIC IP block (part of the chip), and its capability does not depend on how many SMs you have (GPU/memory clocks and memory bandwidth do influence performance). See https://en.wikipedia.org/wiki/Nvidia_NVENC (or the previously linked PDF).

The K4000 (Kepler based) is now obsolete and ~15% slower (in video encoding 2-4x slower) than the K2200 (Maxwell 1st gen based) :-( (The K4200/K5000 are obsolete too, and the new M4000 is expensive; it is not known whether it can encode 2+ streams!)

I mentioned the SM count as my basis for classifying the K2200 as a low-end device. I am not clear on where you intend to go in practical terms. It seems pretty clear from the available information that a K2200 does not support more than two encode sessions.

If someone wants to post source code for a small program that determines experimentally the number of available NVENC encode sessions, I’d be happy to give it a try with my Quadro K2200 on Win7 and report the results here. I have downloaded the NVENC SDK, but strangely, it does not seem to come with an installation script, it is just a plain .zip archive? I have CUDA 6.5 installed here (which should be sufficient based on the readme document that comes with the SDK), and Microsoft Visual Studio 2010.

Hello.

I think that the following test is sufficient:

  1. Compile the samples from the NVENC SDK zip (or only NvEncoder). There are project files for VS.
  2. Run NvEncoder in parallel (more than 2 instances) with the sample yuv file. Use something like this run.vbs (check and correct the paths):
' Launch five NvEncoder instances in parallel.
' Run() returns immediately by default, so the processes overlap.
Dim WinScriptHost, Sdk, i
Set WinScriptHost = CreateObject("WScript.Shell")
Sdk = "C:\nvenc_5.0.1_sdk\nvenc_5.0.1_sdk"
For i = 1 To 5
    ' Each instance writes its own output file x1.h264 .. x5.h264.
    WinScriptHost.Run Sdk & "\Samples\bin\Release\NvEncoder.exe -i " & Sdk & "\Samples\YUV\1080p\HeavyHandIdiot.3sec.yuv -o C:\nvenc_5.0.1_sdk\x" & i & ".h264 -size 1920 1080 -devicetype 0"
Next
Set WinScriptHost = Nothing

x1.h264 to x5.h264 should all be created and contain data (NOT just empty files). I tried this on a GeForce, where 2 files were valid, and on a Grid K1, where all 5 files were valid.
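
If you prefer not to use VBScript, the same test can be sketched in Python: start several encoder processes at once, wait for them, then count the output files that actually contain data. This is only a sketch under assumptions: the helper names `launch_parallel` and `valid_outputs` are made up, the paths are copied from the script above and must be adjusted, and "non-empty output file" is used as the success criterion because nothing in this thread says what exit code NvEncoder returns when a session cannot be opened.

```python
import os
import subprocess


def launch_parallel(commands):
    """Start every command without waiting, then wait for all of them.

    Returns the exit codes in the same order as `commands`.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


def valid_outputs(paths):
    """Return the paths that exist and contain at least one byte."""
    return [p for p in paths if os.path.isfile(p) and os.path.getsize(p) > 0]


def main():
    # Assumed locations taken from the script above; adjust to your install.
    root = r"C:\nvenc_5.0.1_sdk"
    sdk = os.path.join(root, "nvenc_5.0.1_sdk")
    exe = os.path.join(sdk, "Samples", "bin", "Release", "NvEncoder.exe")
    src = os.path.join(sdk, "Samples", "YUV", "1080p", "HeavyHandIdiot.3sec.yuv")
    if not os.path.isfile(exe):
        print("NvEncoder.exe not found, nothing to test")
        return

    outs = [os.path.join(root, f"x{i}.h264") for i in range(1, 6)]
    for out in outs:
        # Remove stale files so an old run cannot be counted as a success.
        if os.path.exists(out):
            os.remove(out)

    cmds = [[exe, "-i", src, "-o", out, "-size", "1920", "1080", "-devicetype", "0"]
            for out in outs]
    launch_parallel(cmds)
    ok = valid_outputs(outs)
    print(f"{len(ok)} of {len(outs)} parallel sessions produced data")


if __name__ == "__main__":
    main()
```

On a card limited to two sessions this would be expected to report 2 of 5, in line with the GeForce result above.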

Thanks, Martin

I cannot build the projects from the NVENC SDK since the solution files are for a newer version of MSVS than I have installed, and the DirectX SDK I have installed here seems too old as well.

That is the reason why I was asking for a simple source code that can just be built with MSVS 2010 from the command line using CUDA 6.5 and NVENC SDK, ideally without using Direct X SDK. For the record, this is the version of Direct X SDK I have installed from when I first set up this Windows 7 system: DXSDK_DIR=C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)\

CUDA has an API call cudaGetDeviceProperties() that retrieves pretty much all the relevant maximum supported settings for each of the attached devices. I would think NVENC offers something like that?

I downloaded MSVS 2013 Express + CUDA 6.5 on Win7 and compiled it without problems. (I do not develop on Windows; I am a un*x developer.) The testing script does not need CUDA at runtime (the "-devicetype 0" parameter uses DX9 to access the NVENC accelerator). You can probably delete the newer DX access paths from the sources. I will try to attach a compiled example.
Maybe you can use https://devtalk.nvidia.com/default/topic/868289/?comment=4645979 (OutOfMem.zip). It uses CUDA only, but it is a sequential test, not a parallel one.

Something like that exists (https://devtalk.nvidia.com/default/topic/861311/?#4622214), but it gives no information about the “license” count.

Martin
NvEncoder.zip (13.3 KB)

Sorry, but this is getting way too involved. I don’t want to spend time pulling and installing additional software. I have no personal interest in NVENC whatsoever, only CUDA. Maybe you can borrow a Quadro K2200 from someone if you absolutely have to establish whether more than two sessions are supported on that GPU and cannot afford to buy one. Have you tried contacting NVIDIA in writing, asking for clarification on the limits for the K2200?

Thanks for your time.
Waiting for NVIDIA’s answer … (5 days after sending the query to the NVIDIA sales center) … (12 days after sending (and resending) the query to the NVIDIA sales center) … (18 days after sending the query) … no official answer received from NVIDIA (including to the question about M4000 support for 2+ encoders). :-(

Hi,

Sorry for abandoning this thread. I can confirm that I am able to run up to 25 parallel NVENC sessions on a Quadro K4200. However, the total throughput is only 8x realtime (meaning 240 1920x1080 frames per second), which gives below 10 FPS per session, so I am already looking to get my hands on a Maxwell v2 card that has 16x realtime performance.
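
The arithmetic behind those numbers, as a small Python sketch. It assumes that realtime means 30 FPS at 1920x1080 (as stated earlier in the thread) and that the encoder's total throughput is split evenly across sessions, which is a simplification; the function names are made up for this example.

```python
def per_session_fps(realtime_multiple: float, sessions: int,
                    realtime_fps: float = 30.0) -> float:
    """Frame rate each session gets if total throughput is shared evenly."""
    if sessions < 1:
        raise ValueError("sessions must be at least 1")
    total_fps = realtime_multiple * realtime_fps
    return total_fps / sessions


def max_realtime_sessions(realtime_multiple: float,
                          realtime_fps: float = 30.0) -> int:
    """Number of sessions that can each still run at full realtime speed."""
    total_fps = realtime_multiple * realtime_fps
    return int(total_fps // realtime_fps)


print(per_session_fps(8, 25))      # 9.6  (Quadro K4200, 25 sessions)
print(per_session_fps(16, 25))     # 19.2 (a 16x realtime card, 25 sessions)
print(max_realtime_sessions(8))    # 8
print(max_realtime_sessions(16))   # 16
```

So being able to open 25 sessions does not mean 25 realtime streams: at 8x realtime only about 8 sessions can hold 30 FPS each.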