Support for performance testing on the NVIDIA Tesla T4 card

Hi Nvidia team,
We have purchased the Nvidia Tesla T4 GPU card. We need to check the performance of the card in Video Analytics. The Details of the complete setup is given below.

• Hardware Platform (GPU): NVIDIA Tesla T4
• DeepStream Version: 5.0
• TensorRT Version: 5.1.1
• NVIDIA GPU Driver Version: 418+

Which tools or command-line operations are available for a detailed performance check of the T4 card? We are using GStreamer plugins to check performance in various aspects. We need to validate/check the following parameters:

  • Throughput
  • Latency of each plugin used and end-to-end latency of the complete pipeline.
  • GPU utilization and power (I think nvidia-smi is an option).
  • Maximum number of input frames supported.

Throughput ==> this depends on what you have in your pipeline.

Latency of each plugin used and end-to-end latency of the complete pipeline ==> you can refer to “The DeepStream application is running slowly.” in https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_troubleshooting.html
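For reference, that troubleshooting entry describes DeepStream's built-in latency measurement, which is enabled through environment variables before running deepstream-app. A minimal sketch (the config path below is the stock DeepStream 5.0 sample; substitute whichever config you actually run):

```shell
# Enable DeepStream's built-in latency measurement (read by deepstream-app).
export NVDS_ENABLE_LATENCY_MEASUREMENT=1            # end-to-end latency per frame
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1  # per-plugin latency breakdown

# Sample config shipped with DeepStream 5.0 -- adjust to your own config file.
deepstream-app -c /opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
```

With both variables set, latency numbers are printed to the console while the app runs.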

GPU Utilization and Power (I think nvidia-smi is an option) ==> yes, nvidia-smi
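For concreteness, a few nvidia-smi invocations that cover utilization and power (all flags are standard nvidia-smi options; the output filename is just an example):

```shell
# One-line snapshot of GPU/memory utilization, power draw, and temperature:
nvidia-smi --query-gpu=utilization.gpu,utilization.memory,power.draw,temperature.gpu \
           --format=csv

# Continuous CSV logging at 1-second intervals while the pipeline runs:
nvidia-smi --query-gpu=timestamp,utilization.gpu,power.draw \
           --format=csv -l 1 > gpu_log.csv

# Scrolling device monitor (power, utilization, clocks, memory):
nvidia-smi dmon -s pucm
```

The `dmon` view also reports decoder/encoder engine utilization, which is useful for video pipelines.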

Maximum number of input frames supported ==> this depends on the input source.

Hi, we have an open-source product that provides tracers and profiling tools for GStreamer (GstShark). You can use it specifically to measure framerate and latency on each element of your pipeline.


Thanks, @miguel.taylor!

And, on the T4, to roughly evaluate the throughput, you can check how much throughput each component (e.g. decoding, inference) can sustain and use that to estimate the throughput of the total pipeline.

For decoding perf, see https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_performance.html#wwpID0E0JB0HA
For inference perf, you can test with https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps#measure-the-inference-perf
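As a quick local sanity check alongside those links, decode throughput can be measured with a decode-only GStreamer pipeline, and raw inference throughput with TensorRT's trtexec tool. A sketch, where the stream file and engine file are placeholders for your own assets:

```shell
# Decode-only throughput: run an H.264 elementary stream through the hardware
# decoder; fpsdisplaysink prints the achieved framerate.
# "sample_720p.h264" is a placeholder for any H.264 elementary stream.
gst-launch-1.0 filesrc location=sample_720p.h264 ! h264parse ! \
    nvv4l2decoder ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v

# Raw inference throughput of a TensorRT engine with trtexec (ships with
# TensorRT). "resnet10.engine" is a placeholder for an engine you have built.
/usr/src/tensorrt/bin/trtexec --loadEngine=resnet10.engine --batch=8 --iterations=100
```

trtexec reports average latency and throughput per batch, which gives an upper bound on what the nvinfer stage can sustain.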

Hi,
Thanks for the reply. We are using gst-shark to check the latency, bitrate, and processing time on the NVIDIA Tesla T4 card.
We have installed gst-shark on our system and run the video analytics GStreamer pipeline. A sample pipeline structure is given below.
Input Source --> nvstreammux --> nvvideoconvert --> nvinfer1 --> nvtracker --> nvinfer2 --> nvosd
But **we are not getting proper logs or responses from gst-shark.** The output logs logsfile.txt (789 Bytes) are attached.
How can we use gst-shark to measure the performance parameters on the T4 card? Can you provide a sample pipeline based on the pipeline structure mentioned above?

Hi @beunycmathews
gst-shark is not from NVIDIA; it is a general tool for GStreamer. So, please consult the maintainers of gst-shark about gst-shark questions.

Can you provide a sample pipeline based on the pipeline structure that I had mentioned above?

Please refer to “apps/sample_apps/deepstream-test2” which is a simple example of how to use DeepStream elements for a single H.264 stream: filesrc→ decode→ nvstreammux→ nvinfer (primary detector)→ nvtracker→ nvinfer (secondary classifier)→ nvdsosd → renderer.
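Building and running that sample typically looks like the sketch below (paths assume a default dGPU DeepStream 5.0 install with CUDA 10.2; adjust CUDA_VER and the stream path to your setup):

```shell
# Build the deepstream-test2 sample from the DeepStream sources.
cd /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-test2
sudo make CUDA_VER=10.2

# Run it on an H.264 elementary stream shipped with the SDK samples.
./deepstream-test2-app /opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.h264
```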

Hi

I mainly use gst-shark with gst-launch, but a pipeline with two nvinfer elements will fail with gst-launch because the unique-id needs to be set while the pipeline is paused, and gst-launch jumps directly from creating the pipeline to playing it. Considering this, I will share a pipeline with gst-launch and only one nvinfer (in case you don’t want to install GstD), and then other pipelines using GStreamer Daemon (GstD) to get two nvinfer elements in the pipeline. I will use the paths from my default DeepStream-4.0 installation; just change the base of the paths to match your installation.

Interlatency

videotestsrc -> nvstreammux -> nvvideoconvert -> nvinfer -> nvtracker -> nvdsosd -> fakesink

GST_DEBUG="GST_TRACER:7" GST_TRACERS="interlatency" gst-launch-1.0 \
videotestsrc ! nvvideoconvert ! "video/x-raw(memory:NVMM), format=NV12, width=1280, height=720, framerate=30/1" ! queue ! \
nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 batched-push-timeout=40000 width=1280 height=720 live-source=TRUE ! queue ! \
nvvideoconvert ! queue ! \
nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/config_infer_primary.txt" ! queue ! \
nvtracker tracker-width=240 tracker-height=200 ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_iou.so ll-config-file=/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/iou_config.txt ! queue ! \
nvdsosd process-mode=HW_MODE ! queue ! \
fakesink sync=false

output (TX2):

0:00:08.285432464 11930   0x559891e400 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvvideoconvert1_src, time=(string)0:00:00.039601327;
0:00:08.285570732 11930   0x559891e400 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue1_src, time=(string)0:00:00.031370939;
0:00:08.288424474 11930   0x5598ced1e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvvideoconvert0_src, time=(string)0:00:00.008115991;
0:00:08.288514071 11930   0x5598ced1e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)capsfilter0_src, time=(string)0:00:00.008230324;
0:00:08.288622420 11930   0x559891e450 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue0_src, time=(string)0:00:00.008327185;
0:00:08.288900716 11930   0x7f140040a0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvstreammux0_src, time=(string)0:00:00.008611753;
0:00:08.292366281 11930   0x5598ced140 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvinfer0_src, time=(string)0:00:00.065123729;
0:00:08.292572259 11930   0x559891e720 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue3_src, time=(string)0:00:00.065331019;
0:00:08.292614274 11930   0x559891dad0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue2_src, time=(string)0:00:00.056311407;
0:00:08.292945208 11930   0x559891e540 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvtracker0_src, time=(string)0:00:00.065702305;
0:00:08.293177490 11930   0x559891e590 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue4_src, time=(string)0:00:00.065932762;
0:00:08.293310766 11930   0x559891e590 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvdsosd0_src, time=(string)0:00:00.066095349;
0:00:08.293408075 11930   0x559891e680 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue5_src, time=(string)0:00:00.066187571;
0:00:08.293470505 11930   0x559891e680 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)fakesink0_sink, time=(string)0:00:00.066187571;

To use other tracers, just change GST_TRACERS in the pipeline. Some other tracers that might be useful for you:

Processing time: GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 \
Framerate: GST_DEBUG="GST_TRACER:7" GST_TRACERS="framerate" gst-launch-1.0 \
CPU usage: GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage" gst-launch-1.0 \
Bitrate: GST_DEBUG="GST_TRACER:7" GST_TRACERS="bitrate" gst-launch-1.0 \
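Multiple tracers can also be combined in a single run by separating them with ";" in GST_TRACERS. A minimal sketch (the pipeline is shortened to videotestsrc/fakesink purely for illustration; substitute your full DeepStream pipeline):

```shell
# Run framerate, processing-time, and interlatency tracers together.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="framerate;proctime;interlatency" \
gst-launch-1.0 videotestsrc num-buffers=300 ! fakesink sync=false
```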

You can check the wiki for the documentation on all the tracers available as well as how to use the graphic tools:

Now for the pipelines with 2 or more nvinfer using GstD:

  • Open 2 terminals
  • In the first terminal launch GstD as a daemon with the tracers:
GST_DEBUG="GST_TRACER:7" GST_TRACERS="interlatency" gstd -D 
  • In the second terminal, create the pipeline:
gstd-client pipeline_create p0 \
videotestsrc ! nvvideoconvert ! "video/x-raw(memory:NVMM), format=NV12, width=1280, height=720, framerate=30/1" ! queue ! \
nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 batched-push-timeout=40000 width=1920 height=1080 live-source=TRUE ! queue ! \
nvvideoconvert ! queue ! \
nvinfer name=nvinfer1 config-file-path="/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/config_infer_primary.txt" ! queue ! \
nvtracker tracker-width=640 tracker-height=368 ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so ll-config-file=/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/tracker_config.yml enable-batch-process=1 ! queue ! \
nvinfer name=nvinfer2 process-mode=secondary infer-on-gie-id=1 infer-on-class-ids="0:" batch-size=16 config-file-path="/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/config_infer_secondary_carcolor.txt" ! queue ! \
nvdsosd process-mode=HW_MODE ! queue ! \
fakesink sync=false
  • Set the unique IDs
gstd-client element_set p0 nvinfer1 unique-id 1
gstd-client element_set p0 nvinfer2 unique-id 2
  • Play the pipeline
gstd-client pipeline_play p0
  • Output (in the first terminal)
0:04:45.176374519 12079   0x7fa44b5e80 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvvideoconvert1_src, time=(string)0:00:00.070470749;
0:04:45.176488853 12079   0x7fa44b5e80 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue1_src, time=(string)0:00:00.061132200;
0:04:45.176952816 12079   0x7fa44b5ed0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue0_src, time=(string)0:00:00.033056424;
0:04:45.177282380 12079   0x7ecc0038a0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvstreammux0_src, time=(string)0:00:00.033277477;
0:04:45.180339114 12079   0x7fa44b6280 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvvideoconvert0_src, time=(string)0:00:00.009281654;
0:04:45.180463016 12079   0x7fa44b6280 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)capsfilter0_src, time=(string)0:00:00.009765969;
0:04:45.182649935 12079   0x7fa44b6140 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvinfer1_src, time=(string)0:00:00.095506593;
0:04:45.182812237 12079   0x558c84bd90 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue2_src, time=(string)0:00:00.085548818;
0:04:45.182814317 12079   0x7fa44b61e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue3_src, time=(string)0:00:00.095637920;
0:04:45.183806658 12079   0x7fa44b6230 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvtracker0_src, time=(string)0:00:00.096674068;
0:04:45.183885697 12079   0x7fa44b60f0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue4_src, time=(string)0:00:00.096768947;
0:04:45.183953504 12079   0x7fa44b60f0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvinfer2_src, time=(string)0:00:00.096840754;
0:04:45.184064063 12079   0x7fa44b5f70 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue5_src, time=(string)0:00:00.096942929;
0:04:45.184150942 12079   0x7fa44b5f70 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvdsosd0_src, time=(string)0:00:00.097017488;
0:04:45.184234301 12079   0x7fa44b5f20 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue6_src, time=(string)0:00:00.097105679;
0:04:45.184277213 12079   0x7fa44b5f20 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)fakesink0_sink, time=(string)0:00:00.097105679;