Need help in choosing GPU for Video Analytics with multi-stream (4+) RTSP inputs & outputs


• Hardware Platform (Jetson / GPU)
• DeepStream Version 5.0
• TensorRT Version 7
• NVIDIA GPU Driver Version 450 (CUDA 11.0)

We are evaluating the suitability of NVIDIA GPUs for a video analytics application. We used deepstream-app as the basis for the evaluation and added support for the NVIDIA Optical Flow and nvdsanalytics plugins to the pipeline. We have tested performance on an RTX 2080 Ti and on Jetson (testing on Jetson has been minimal so far, as it doesn’t support Optical Flow). For the purpose of testing, we disabled the Optical Flow and nvdsanalytics plugins in the pipeline and limited ourselves to the YoloV3 model provided by NVIDIA as part of the samples.

We need to process as many input camera streams (1080p at 30 FPS over RTSP) as possible (say, 8+) and generate output RTSP streams, each carrying the detection output (OSD).

The problem with the RTX 2080 Ti is that it limits the number of concurrent encoder sessions to 3.

On the other hand, Quadro or similar GPUs do not restrict the number of concurrent sessions. However, it is not clear how many concurrent NVENC sessions can be used on such GPUs to effectively stream 1080p at 30 FPS over RTSP (or even 720p at 30 FPS).
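To make the encoder load concrete, here is a rough back-of-the-envelope sketch of the raw pixel rate that has to be encoded for 4 and 8 output streams. The helper name `pixel_rate` is just for illustration, and pixel rate is only a crude proxy: actual NVENC capacity depends on codec, preset, and GPU generation, none of which is modelled here.

```python
def pixel_rate(width, height, fps, streams=1):
    """Pixels per second to encode for `streams` identical output streams.

    Illustrative helper (not part of DeepStream); assumes every stream
    has the same resolution and frame rate.
    """
    return width * height * fps * streams


full_hd = pixel_rate(1920, 1080, 30)  # one 1080p stream at 30 FPS
hd = pixel_rate(1280, 720, 30)        # one 720p stream at 30 FPS

for n in (4, 8):
    print(f"{n} x 1080p30: {pixel_rate(1920, 1080, 30, n) / 1e6:.1f} Mpixel/s")
    print(f"{n} x 720p30:  {pixel_rate(1280, 720, 30, n) / 1e6:.1f} Mpixel/s")

# A 720p frame has 4/9 of the pixels of a 1080p frame.
print(f"720p / 1080p pixel ratio: {hd / full_hd:.3f}")
```

Note that this only says how much pixel data is involved; the 3-session cap on the RTX 2080 Ti applies regardless of resolution.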

Looking at the NVIDIA whitepaper on Turing platforms didn’t help. Our own testing so far on the RTX 2080 Ti has not been very encouraging (with 1080p or even 720p).

So:

  1. Which GPU is suitable for implementing deep learning inference (using DeepStream) plus RTSP streaming of 4+ and 8+ streams?
  2. The performance of DeepStream (deepstream-app) doesn’t seem to change when lower-resolution incoming streams (e.g. 720p) are used: it remains poor, with jitter, significant end-to-end delay, and buffer caching. Would lowering the incoming frame resolution improve the performance?

Thanks for your inputs.

https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_quick_start.html#

NVIDIA® DeepStream Software Development Kit (SDK) is an accelerated AI framework to build intelligent video analytics (IVA) pipelines. DeepStream runs on NVIDIA® T4 and platforms such as NVIDIA® Jetson™ Nano, NVIDIA® Jetson AGX Xavier™, NVIDIA® Jetson Xavier NX™, NVIDIA® Jetson™ TX1 and TX2.

There is some performance data for your reference: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_performance.html#

Hi!

Thanks for the information. We had already looked into it. To be more specific:

  1. Some of the Quadro series GPUs do not restrict the number of concurrent NVENC sessions (https://developer.nvidia.com/video-encode-decode-gpu-support-matrix). What is a reasonable number of concurrent encoding sessions when using 1080p camera inputs at 30 FPS?

  2. In the case of the Quadro series, what is the expected FPS when using inputs from 30 FPS, H264, 1080p RTSP cameras and running a simple pipeline of mux -> nvinfer (YoloV3) -> tracker -> demux (no tiling)?

  3. On Jetson, the example shown is for Resnet10 (which is a highly tuned/pruned model and detects just 4 classes of objects). What will the FPS throughput be when YoloV3 (with 80 classes of objects) is used?

  4. In general, will reducing the input camera resolution to 720p or reducing the bitrate improve the FPS throughput?
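For context on question 2, this is the throughput the pipeline would have to sustain to keep up with the cameras. It is a minimal sketch that assumes inference runs on every frame of every stream (no frame skipping) and that per-frame cost is averaged over the aggregate frame rate; `frame_budget_ms` is a hypothetical helper, not a DeepStream API.

```python
def frame_budget_ms(streams, fps=30.0):
    """Return (aggregate FPS, average time budget per frame in ms).

    Assumes every frame of every stream goes through inference, so the
    pipeline must sustain streams * fps frames per second overall.
    """
    total_fps = streams * fps
    return total_fps, 1000.0 / total_fps


for n in (4, 8):
    total, budget = frame_budget_ms(n)
    print(f"{n} streams: {total:.0f} FPS aggregate, {budget:.2f} ms per frame")
```

If the measured aggregate FPS of mux -> nvinfer -> tracker -> demux falls below this figure, frames will queue up, which would be consistent with the growing delay and buffering described earlier in the thread.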

The only encoder performance data available is at https://developer.nvidia.com/nvidia-video-codec-sdk; please refer to the “NVENC - Hardware-Accelerated Video Encoding” section, which gives stream counts for some GPUs. There is no data for Quadro GPUs at present.