vDWS Output Bandwidth to End User

In a document published by NVIDIA titled "NVIDIA QUADRO VIRTUAL DATA CENTER WORKSTATION" (searchable file name 185532_Nvidia_Quadro_vDWS_SolutionOverview_NV_US_WEB.pdf), it is stated on Page 3, under the heading "NVIDIA Quadro vDWS Features", that the maximum hardware-rendered display is four 4K displays at 4096x2160 resolution.

When that information is processed by the Quadro vDWS and pushed out to the user, whether local or remote, what is the bandwidth of the four-4K data stream after it leaves the server? At what refresh rate? And how is that data measured, i.e. in pps (packets per second)?

You are mixing several different things, but I am not surprised, given the marketing nature of this document.

  • virtual framebuffer - the various vGPU profiles allow different numbers and resolutions of virtual framebuffers (see https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#virtual-gpu-types-grid). This is RGBA memory that can be read with the NVIDIA Capture SDK (NVFBC) (see https://developer.nvidia.com/capture-sdk) or the Microsoft Desktop Duplication API (DDA). Rendering FPS (60 FPS) is limited by the "frame rate limiter" or the "vGPU scheduler", because the GPU is shared among multiple vGPUs. A vGPU has no LOCAL output, so you need a remote framebuffer protocol (also check the vGPU forum at https://gridforums.nvidia.com/).
  • remote framebuffer transfer protocol - like Microsoft RDP (including RDP10), VMware Horizon (including Blast Extreme), Citrix Virtual Apps and Desktops (including HDX), Teradici PCoIP, ... Bandwidth (and pps) ranges roughly from 1 Mbit/s to 100 Mbit/s and depends on resolution, codec, quality, and transfer/encoding FPS (1-60 FPS; this is not the rendering FPS). It is usually detected and adjusted dynamically by the protocol to achieve the best User eXperience (UX); you usually need 15-30 FPS.
  • hardware-accelerated encoding of H.264 or HEVC (H.265) - the NVIDIA Video Codec SDK (NVENC) (see https://developer.nvidia.com/nvidia-video-codec-sdk) can be used inside a remote framebuffer protocol to get LOW bandwidth, LOW latency and HIGH encoding FPS (search for RDP10, Blast Extreme, HDX, ... and "click to photon").

PS1: But be aware: the new Turing generation of GPU NVENC is not designed for full VDI acceleration. For an example of the available average encoding FPS per VDI with a hardware-accelerated-encoding remote framebuffer protocol (H.264, low-latency High Performance single pass; reference NVENC speeds taken from NVIDIA Video Codec SDK 9.0 (NVENC_Application_Note.pdf), GPU clocks from Wikipedia), see https://gridforums.nvidia.com/default/topic/8934/.

PS2: If you are of the "YouTube generation", you should check the videos from the last VDI-related TeamRGE conference: https://www.youtube.com/channel/UCzsbidFeFG1K2tG7x72vtvQ/ (http://www.teamrge.com/teamrge/events/).

NVIDIA finally tells the truth about the "Turing" generation encoder: "Pascal" is about 50%-100% faster in low-latency scenarios, because it has two encoders on the chip (see https://developer.nvidia.com/nvidia-video-codec-sdk, hidden under "Additional Performance Results").



FYI: Be careful with the new "NVIDIA vGPU Software 10" (with the Linux 440.43 driver). It is the 8K-resolution release, with "NVIDIA engineered" limits (see https://gridforums.nvidia.com/default/topic/258/nvidia-virtual-gpu-technology/documentation-for-vgpu-configs/post/16127/#16127). There are "changes" in the low-latency encoder behavior. The problem shows up with low-bandwidth, low-framerate transfers: for example, if you press "any key", the output can be delayed by up to 12 frames in the decoder, which is 2 seconds at 6 FPS (tested on the Raspberry Pi hardware OMX decoder). This is very bad UX!

Older drivers mark NVENC-encoded H.264 (with NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID) with the following SPS/VUI values (from a video stream analyzer):

...
 <b>num_ref_frames : 1 </b>
 vui_parameters_present_flag : 1 
...
 <b>bitstream_restriction_flag : 1 </b>
   motion_vectors_over_pic_boundaries_flag : 1 
   max_bytes_per_pic_denom : 0 
   max_bits_per_mb_denom : 0 
   log2_max_mv_length_horizontal : 1 
   log2_max_mv_length_vertical : 1 
   num_reorder_frames : 0 
   <b>max_dec_frame_buffering : 1 </b>
...

New drivers, running the same binary using NVENC, produce:

...
 <b>num_ref_frames : 3 </b>
 vui_parameters_present_flag : 1 
...
 <b>bitstream_restriction_flag : 0</b> 
   motion_vectors_over_pic_boundaries_flag : 0 
   max_bytes_per_pic_denom : 0 
   max_bits_per_mb_denom : 0 
   log2_max_mv_length_horizontal : 0 
   log2_max_mv_length_vertical : 0 
   num_reorder_frames : 0 
   max_dec_frame_buffering : 0 
...

So the decoder is now allowed to buffer the output of decoded frames. The Raspberry Pi uses 9-12 additional buffer frames, and this is not modifiable with OMX_IndexParamImagePoolSize!

Now you must explicitly enable "bitstreamRestrictionFlag" and set "numRefL0" on the encoder side to roll back to the old low-latency encoder-decoder behavior (and use the headers from the new Codec SDK):

...encodeCodecConfig.h264Config.h264VUIParameters.bitstreamRestrictionFlag = 1;
...encodeCodecConfig.h264Config.numRefL0 = NV_ENC_NUM_REF_FRAMES_1;
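For context, here is a hedged sketch of where those two lines live in a typical NVENC setup. Field names are from nvEncodeAPI.h (Video Codec SDK 9.x; NV_ENC_NUM_REF_FRAMES_1 needs the newer headers); the session and preset plumbing around them is elided and will differ per application, so treat this as a fragment, not a complete program:

```c
/* Sketch only; assumes an already-created NVENC session and an encode
 * config obtained from the preset (NvEncGetEncodePresetConfig) as usual. */
NV_ENC_CONFIG encodeConfig = { NV_ENC_CONFIG_VER };
/* ... copy the preset config into encodeConfig here ... */

/* Re-enable the VUI bitstream restriction so max_dec_frame_buffering is
 * signalled again and the decoder may output frames without extra buffering: */
encodeConfig.encodeCodecConfig.h264Config.h264VUIParameters.bitstreamRestrictionFlag = 1;

/* Limit the L0 reference list to one frame, matching the old driver behavior: */
encodeConfig.encodeCodecConfig.h264Config.numRefL0 = NV_ENC_NUM_REF_FRAMES_1;

NV_ENC_INITIALIZE_PARAMS initParams = { NV_ENC_INITIALIZE_PARAMS_VER };
initParams.encodeConfig = &encodeConfig;
/* ... set width/height, frame rate, encodeGUID/presetGUID, then call
 * nvEncInitializeEncoder() through the NVENC API function list ... */
```

After re-encoding with these settings, the stream analyzer should again show bitstream_restriction_flag : 1 and a small max_dec_frame_buffering, as in the old-driver dump above.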

PF 2020 !