Comparing Kepler and Maxwell maximum H.264 Streams

Hi All,

I am trying to compare the number of simultaneous H.264 streams between GRID 1.0 and GRID 2.0.

The below link says that, “Each Kepler GPU has one NVENC hardware encoder that is capable of supporting up to 6 H.264 high-quality 720p at 30 fps.”

https://developer.nvidia.com/sites/default/files/akamai/gamedev/grid/grid22/grid-sdk-faq.pdf

The three blogs below all note that the M6 and M60 cards support 18 and 36 simultaneous 1080p30 streams, respectively.
http://www.thinclient.net/blog/?p=632
http://www.anandtech.com/show/9574/nvidia-announces-grid-20-tesla-m60-m6-grid-cards
http://www.poppelgaard.com/nvidia-grid-2-0

I want to verify whether my stream-count calculations below are correct, and whether the Kepler NVENC is only rated at 720p30. I know from http://support.citrix.com/article/CTX201696 that 4K monitors are supported on a K2 passthrough via a Citrix HDX session, so that’s one example of the Kepler NVENC supporting more than 720p resolution. I am therefore wondering whether there is an apples-to-apples comparison I can make between the Maxwell NVENC and the Kepler NVENC. In other words, are there any specs for the number of 1080p30 streams that the Kepler NVENC can support?

K1: 4 GPUs x 6 H.264 streams = 24 simultaneous 720p30 H.264 streams
K2: 2 GPUs x 6 H.264 streams = 12 simultaneous 720p30 H.264 streams
M6: 1 GPU x 18 H.264 streams = 18 simultaneous 1080p30 H.264 streams
M60: 2 GPUs x 18 H.264 streams = 36 simultaneous 1080p30 H.264 streams

Many Thanks!

Richard

Hello.

You should check the NVENC API directly - https://developer.nvidia.com/nvidia-video-codec-sdk
There are FPS performance tests (1280x720) for Kepler (K1/K2) and Maxwell gen2 (M6/M60) - https://developer.nvidia.com/application-note.
I am using the K1 NVENC for encoding 1080p@30 and 1280x1024@40 (and tested up to a maximum resolution of 2560×1600).
1280x1024@40 works with 5 H.264 streams on a K1 (on one GK107 GPU) with NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID and NV_ENC_PARAMS_RC_CONSTQP. Performance depends on the encoder parameters, as described in the app note.

M.C>

Thanks MC! That’s good info. I am writing an article to compare GRID 1.0 and 2.0. I am looking for an official spec from NVIDIA and am hoping I can find what the Kepler NVENC is rated at in 1080p30.

Cheers,

Richard

Hi Richard,

Our team thinks you might need to read:
http://developer.download.nvidia.com/assets/cuda/files/NVENC_DA-06209-001_v07.pdf?autho=1455894895_6041145c91c9550a8bb7727328e9c815&file=NVENC_DA-06209-001_v07.pdf

Does this cover it? Please get back to us if anything needs improving or you need more info!
Rachel

Kepler - 3 streams 1080p @ 30fps

Maxwell - 18 streams 1080p @ 30fps

In simple terms, Maxwell has 6x the NVENC capability of Kepler.

Maxwell also introduces 4:4:4 encoding as an option, as well as H.265 (HEVC).

As Mcerveney points out, there are a lot of options that affect throughput, so reading the guide above is worthwhile.

Hey there Rachel and Jason! I am glad to get a response from the A-team!

The article was helpful and thanks for confirming the 1080p30 specs.

I believe that XenDesktop’s ICA protocol is encoded into H.264 by the CPU. I recall that around May of 2015, encoding the ICA stream into H.264 on the GPU was in an experimental phase and should be released by Citrix in the future.

I pasted an excerpt below from this article on Feb 6, 2016: http://docs.citrix.com/en-us/xenapp-and-xendesktop/7-6/xad-hdx-landing/xad-hdx3dpro-gpu-accel-desktop.html

"HDX 3D Pro offers the following features:
Adaptive H.264-based deep compression for optimal WAN and wireless performance. HDX 3D Pro uses CPU-based deep compression as the default compression technique for encoding. This provides optimal compression that dynamically adapts to network conditions.

The H.264-based deep compression codec no longer competes with graphics rendering for CUDA cores on the NVIDIA GPU. The deep compression codec runs on the CPU and provides bandwidth efficiency."

The excerpt confirms that XenDesktop encodes H.264 in the CPU. I am a little confused by this line though, “The H.264-based deep compression codec no longer competes with graphics rendering for CUDA cores on the NVIDIA GPU.” It seems to suggest that encoding previously did happen on the GPU and now it’s back on the CPU. Can you clarify this?

Would the graphics rendering occur on the “3D” engine and the encoding occur on the NVENC engine? If so, it doesn’t sound like encoding and rendering would have been stepping on each other on the GPU anyway. Is this accurate?

It also looks like Horizon View now does use the GPU to encode the graphics stream, based on this excerpt from the article below: "Now, using the new Blast Extreme protocol, NVIDIA GRID offloads encoding from the CPU to the GPU."

http://blogs.nvidia.com/blog/2016/02/09/nvidia-grid-blast-extreme-vmware-horizon/

Thanks for your input guys!

Richard

Citrix did write a CUDA-based encoder (XD 5.6, IIRC), but it wasn’t optimal and didn’t use NVENC, so it was dropped.

Correct. 3D engine and NVENC are separate components on the GPU, so load on one will not affect the other.

Correct again, it’s using NVENC to achieve this and is supported on all GRID platforms, though as you’ll have realised, the Maxwell series will have better scalability.

Jason,

Can you point me to the documentation where this is outlined? A link or pointer to the documentation would help.
I can’t find it myself, and it would be useful to record.

Best wishes,
Rachel

Thanks Jason! Great info. I redid my calculations from above which may help some other readers.

K1 Card: 4 GPUs x 3 H.264 streams per Kepler GPU = 12 simultaneous 1080p30 H.264 streams
K2 Card: 2 GPUs x 3 H.264 streams per Kepler GPU = 6 simultaneous 1080p30 H.264 streams
M6 Card: 1 GPU x 18 H.264 streams per Maxwell GPU = 18 simultaneous 1080p30 H.264 streams
M60 Card: 2 GPUs x 18 H.264 streams per Maxwell GPU = 36 simultaneous 1080p30 H.264 streams
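For anyone scripting capacity estimates, the arithmetic above can be sketched in Python. The per-GPU figures are the 1080p30 numbers quoted in this thread (Kepler: 3 streams/GPU, second-gen Maxwell: 18 streams/GPU); the dictionary names and structure are just illustrative, and real-world throughput varies with preset, rate control, and resolution per the NVENC app note:

```python
# Per-GPU 1080p30 H.264 stream figures quoted in this thread.
# Actual throughput depends on encoder parameters (see the NVENC app note).
STREAMS_PER_GPU_1080P30 = {"kepler": 3, "maxwell_gen2": 18}

# (architecture, number of physical GPUs) per GRID card, from the posts above.
CARDS = {
    "K1":  ("kepler", 4),        # 4x GK107
    "K2":  ("kepler", 2),        # 2x GK104
    "M6":  ("maxwell_gen2", 1),  # 1x GM204
    "M60": ("maxwell_gen2", 2),  # 2x GM204
}

def max_streams(card: str) -> int:
    """Total simultaneous 1080p30 H.264 streams for a card."""
    arch, gpu_count = CARDS[card]
    return gpu_count * STREAMS_PER_GPU_1080P30[arch]

for card in CARDS:
    print(f"{card}: {max_streams(card)} simultaneous 1080p30 H.264 streams")
```

Running this reproduces the totals above: 12, 6, 18, and 36 streams for the K1, K2, M6, and M60.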

Cheers,

Richard

https://developer.nvidia.com/nvidia-video-codec-sdk

Though it’s not laid out as simply as above.

Two more questions for you:

Does NVENC sit idle in XenDesktop environments or is it leveraged in some other way?

Are the M6 and M60 cards both using the second generation of Maxwell?

Thanks!

Richard

When it’s not in use, it’s idle, though with multiple VMs calling on it, the load will get spread, just like in the 3D engine (the scheduler covers all engines).

Gen 1 Maxwell was only on the GeForce boards.

The M60/M6 are GM204 (the “2” indicating second gen).

Thanks Jason!

What I want to understand is this: if XenDesktop uses the CPU to encode graphics, does NVENC sit idle all the time in a XenDesktop-only environment? Or are there other tasks, besides encoding the ICA/HDX stream, that NVENC is leveraged for?

Cheers,

Richard

NVENC on the GPU is available to any application via the APIs detailed in the SDK. If no applications are making calls on it, then it sits idle.

It really depends on whether you have other applications running that make use of it to encode H.264/H.265.

Hi All,

I have another obscure question in addition to the one above.

I noticed something when looking at the power limits in the “nvidia-smi -q” output. If you double the “Max Power Limit” for the two GPUs on the M60 board, it exceeds the published maximum: 162 W x 2 = 324 W, while the published maximum for the board is 300 W.

Is this per-GPU maximum power limit meant to allow each GPU to enter its “boost” mode temporarily? If so, I assume boost mode could only be entered when the other physical GPU is far enough below its maximum power draw that the published limit for the entire card is not exceeded. Is this accurate?
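To make the question concrete, here is a small sketch of the arithmetic. The 162 W per-GPU limit and 300 W board maximum are the figures above; the headroom logic is my assumption about how a board-level cap might constrain boost, not documented NVIDIA behavior:

```python
# Hypothetical model of the M60 board-level power constraint described above.
# Assumes (unverified) that the sum of instantaneous per-GPU draws is capped
# at the board limit, so one GPU can only reach its per-GPU max if the other
# stays low enough.
BOARD_LIMIT_W = 300.0   # published M60 board maximum
PER_GPU_MAX_W = 162.0   # "Max Power Limit" per GPU from "nvidia-smi -q"

def other_gpu_budget(this_gpu_draw_w: float) -> float:
    """Power left for the second GPU if the board cap is a hard sum."""
    return min(PER_GPU_MAX_W, BOARD_LIMIT_W - this_gpu_draw_w)

# If one GPU boosts to its 162 W maximum, the other would be limited to 138 W:
print(other_gpu_budget(162.0))            # -> 138.0
# Both GPUs at 162 W would exceed the board limit (324 W > 300 W):
print(PER_GPU_MAX_W * 2 > BOARD_LIMIT_W)  # -> True
```

Under this assumed model, both GPUs could never sustain their per-GPU maximum simultaneously, which is what the question above is asking NVIDIA to confirm or correct.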

Thanks!

Richard