Support for performance testing on the NVIDIA Tesla T4 card

Hi Nvidia team,
We have purchased the Nvidia Tesla T4 GPU card. We need to check the performance of the card in Video Analytics. The Details of the complete setup is given below.

• Hardware Platform (GPU): NVIDIA Tesla T4
• DeepStream Version: 5.0
• TensorRT Version: 5.1.1
• NVIDIA GPU Driver Version: 418+

Which tools or command-line operations are available for a detailed performance check of the T4 card? We are using GStreamer plugins to check performance in various aspects. We need to validate/check the following parameters:

  • Throughput
  • Latency of each plugin used and end-to-end latency of the complete pipeline.
  • GPU utilization and power (I think nvidia-smi is an option).
  • Maximum number of input frames supported.

Throughput ==> this depends on what you have in your pipeline.

Latency of each plugin used and end-to-end latency of the complete pipeline ==> you can refer to “The DeepStream application is running slowly.” in https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_troubleshooting.html
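For reference, that troubleshooting entry describes DeepStream's built-in latency measurement, which is enabled through environment variables before running deepstream-app. A minimal sketch (the config path below is the stock DeepStream 5.0 sample; substitute whichever config you actually run):

```shell
# Enable DeepStream's built-in latency measurement (read by deepstream-app).
export NVDS_ENABLE_LATENCY_MEASUREMENT=1            # end-to-end latency per frame
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1  # per-plugin latency breakdown

# Sample config shipped with DeepStream 5.0 -- adjust to your own config file.
deepstream-app -c /opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
```

With both variables set, latency numbers are printed to the console while the app runs.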

GPU Utilization and Power (I think nvidia-smi is an option) ==> yes, nvidia-smi
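For concreteness, a few nvidia-smi invocations that cover utilization and power (all flags are standard nvidia-smi options; the output filename is just an example):

```shell
# One-line snapshot of GPU/memory utilization, power draw, and temperature:
nvidia-smi --query-gpu=utilization.gpu,utilization.memory,power.draw,temperature.gpu \
           --format=csv

# Continuous CSV logging at 1-second intervals while the pipeline runs:
nvidia-smi --query-gpu=timestamp,utilization.gpu,power.draw \
           --format=csv -l 1 > gpu_log.csv

# Scrolling device monitor (power, utilization, clocks, memory):
nvidia-smi dmon -s pucm
```

The `dmon` view also reports decoder/encoder engine utilization, which is useful for video pipelines.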

Maximum number of input frames supported ==> this depends on the input source.

Hi, we have an open-source product that provides tracers and profiling tools for GStreamer (GstShark). You can use it specifically to measure framerate and latency on each element of your pipeline.


Thanks, @miguel.taylor!

And, on the T4, to roughly evaluate the throughput, you can check how much throughput each component (e.g. decoding, inference) can sustain and use that to estimate the throughput of the total pipeline.

For decoding perf, see https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_performance.html#wwpID0E0JB0HA
For inference perf, you can test with https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps#measure-the-inference-perf
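As a quick local sanity check alongside those links, decode throughput can be measured with a decode-only GStreamer pipeline, and raw inference throughput with TensorRT's trtexec tool. A sketch, where the stream file and engine file are placeholders for your own assets:

```shell
# Decode-only throughput: run an H.264 elementary stream through the hardware
# decoder; fpsdisplaysink prints the achieved framerate.
# "sample_720p.h264" is a placeholder for any H.264 elementary stream.
gst-launch-1.0 filesrc location=sample_720p.h264 ! h264parse ! \
    nvv4l2decoder ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v

# Raw inference throughput of a TensorRT engine with trtexec (ships with
# TensorRT). "resnet10.engine" is a placeholder for an engine you have built.
/usr/src/tensorrt/bin/trtexec --loadEngine=resnet10.engine --batch=8 --iterations=100
```

trtexec reports average latency and throughput per batch, which gives an upper bound on what the nvinfer stage can sustain.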

Hi,
Thanks for the reply. We are using gst-shark to check the latency, bitrate, and processing time on the NVIDIA Tesla T4 card.
We have installed gst-shark on our system and run the video analytics GStreamer pipeline. A sample pipeline structure is given below.
Input Source --> nvstreammux --> nvvideoconvert --> nvinfer1 --> nvtracker --> nvinfer2 --> nvosd
But **we are not getting proper logs or responses from gst-shark.** The output logs logsfile.txt (789 Bytes) are attached.
How can we use gst-shark to measure the performance parameters on the T4 card? Can you provide a sample pipeline based on the pipeline structure mentioned above?

Hi @beunycmathews
gst-shark is not from NVIDIA; it is a general tool for GStreamer. So, please consult the maintainers of gst-shark about gst-shark questions.

Can you provide a sample pipeline based on the pipeline structure that I had mentioned above?

Please refer to “apps/sample_apps/deepstream-test2” which is a simple example of how to use DeepStream elements for a single H.264 stream: filesrc→ decode→ nvstreammux→ nvinfer (primary detector)→ nvtracker→ nvinfer (secondary classifier)→ nvdsosd → renderer.
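Building and running that sample typically looks like the sketch below (paths assume a default dGPU DeepStream 5.0 install with CUDA 10.2; adjust CUDA_VER and the stream path to your setup):

```shell
# Build the deepstream-test2 sample from the DeepStream sources.
cd /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-test2
sudo make CUDA_VER=10.2

# Run it on an H.264 elementary stream shipped with the SDK samples.
./deepstream-test2-app /opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.h264
```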

Hi

I mainly use gst-shark with gst-launch, but a pipeline with two nvinfer elements will fail with gst-launch because the unique-id needs to be set while the pipeline is paused, and gst-launch jumps directly from creating the pipeline to playing it. Considering this, I will share a pipeline with gst-launch and only one nvinfer (in case you don’t want to install GstD), and then other pipelines using GStreamer Daemon (GstD) to get two nvinfer elements in the pipeline. I will use the paths from my default DeepStream-4.0 installation; just change the base of the paths to match your installation.

Interlatency

videotestsrc -> nvstreammux -> nvvideoconvert -> nvinfer -> nvtracker -> nvdsosd -> fakesink

GST_DEBUG="GST_TRACER:7" GST_TRACERS="interlatency" gst-launch-1.0 \
videotestsrc ! nvvideoconvert ! "video/x-raw(memory:NVMM), format=NV12, width=1280, height=720, framerate=30/1" ! queue ! \
nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 batched-push-timeout=40000 width=1280 height=720 live-source=TRUE ! queue ! \
nvvideoconvert ! queue ! \
nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/config_infer_primary.txt" ! queue ! \
nvtracker tracker-width=240 tracker-height=200 ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_iou.so ll-config-file=/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/iou_config.txt ! queue ! \
nvdsosd process-mode=HW_MODE ! queue ! \
fakesink sync=false

output (TX2):

0:00:08.285432464 11930   0x559891e400 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvvideoconvert1_src, time=(string)0:00:00.039601327;
0:00:08.285570732 11930   0x559891e400 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue1_src, time=(string)0:00:00.031370939;
0:00:08.288424474 11930   0x5598ced1e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvvideoconvert0_src, time=(string)0:00:00.008115991;
0:00:08.288514071 11930   0x5598ced1e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)capsfilter0_src, time=(string)0:00:00.008230324;
0:00:08.288622420 11930   0x559891e450 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue0_src, time=(string)0:00:00.008327185;
0:00:08.288900716 11930   0x7f140040a0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvstreammux0_src, time=(string)0:00:00.008611753;
0:00:08.292366281 11930   0x5598ced140 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvinfer0_src, time=(string)0:00:00.065123729;
0:00:08.292572259 11930   0x559891e720 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue3_src, time=(string)0:00:00.065331019;
0:00:08.292614274 11930   0x559891dad0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue2_src, time=(string)0:00:00.056311407;
0:00:08.292945208 11930   0x559891e540 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvtracker0_src, time=(string)0:00:00.065702305;
0:00:08.293177490 11930   0x559891e590 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue4_src, time=(string)0:00:00.065932762;
0:00:08.293310766 11930   0x559891e590 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvdsosd0_src, time=(string)0:00:00.066095349;
0:00:08.293408075 11930   0x559891e680 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue5_src, time=(string)0:00:00.066187571;
0:00:08.293470505 11930   0x559891e680 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)fakesink0_sink, time=(string)0:00:00.066187571;

To use other tracers, just change GST_TRACERS in the pipeline. Some other tracers that might be useful for you:

Processing time: GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 \
Framerate: GST_DEBUG="GST_TRACER:7" GST_TRACERS="framerate" gst-launch-1.0 \
CPU usage: GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage" gst-launch-1.0 \
Bitrate: GST_DEBUG="GST_TRACER:7" GST_TRACERS="bitrate" gst-launch-1.0 \
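Multiple tracers can also be combined in a single run by separating them with ";" in GST_TRACERS. A minimal sketch (the pipeline is shortened to videotestsrc/fakesink purely for illustration; substitute your full DeepStream pipeline):

```shell
# Run framerate, processing-time, and interlatency tracers together.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="framerate;proctime;interlatency" \
gst-launch-1.0 videotestsrc num-buffers=300 ! fakesink sync=false
```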

You can check the wiki for the documentation on all the tracers available as well as how to use the graphic tools:

Now for the pipelines with 2 or more nvinfer using GstD:

  • Open 2 terminals
  • In the first terminal launch GstD as a daemon with the tracers:
GST_DEBUG="GST_TRACER:7" GST_TRACERS="interlatency" gstd -D 
  • In the second terminal, create the pipeline:
gstd-client pipeline_create p0 \
videotestsrc ! nvvideoconvert ! "video/x-raw(memory:NVMM), format=NV12, width=1280, height=720, framerate=30/1" ! queue ! \
nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 batched-push-timeout=40000 width=1920 height=1080 live-source=TRUE ! queue ! \
nvvideoconvert ! queue ! \
nvinfer name=nvinfer1 config-file-path="/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/config_infer_primary.txt" ! queue ! \
nvtracker tracker-width=640 tracker-height=368 ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so ll-config-file=/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/tracker_config.yml enable-batch-process=1 ! queue ! \
nvinfer name=nvinfer2 process-mode=secondary infer-on-gie-id=1 infer-on-class-ids="0:" batch-size=16 config-file-path="/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/config_infer_secondary_carcolor.txt" ! queue ! \
nvdsosd process-mode=HW_MODE ! queue ! \
fakesink sync=false
  • Set the unique IDs
gstd-client element_set p0 nvinfer1 unique-id 1
gstd-client element_set p0 nvinfer2 unique-id 2
  • Play the pipeline
gstd-client pipeline_play p0
  • Output (in the first terminal)
0:04:45.176374519 12079   0x7fa44b5e80 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvvideoconvert1_src, time=(string)0:00:00.070470749;
0:04:45.176488853 12079   0x7fa44b5e80 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue1_src, time=(string)0:00:00.061132200;
0:04:45.176952816 12079   0x7fa44b5ed0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue0_src, time=(string)0:00:00.033056424;
0:04:45.177282380 12079   0x7ecc0038a0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvstreammux0_src, time=(string)0:00:00.033277477;
0:04:45.180339114 12079   0x7fa44b6280 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvvideoconvert0_src, time=(string)0:00:00.009281654;
0:04:45.180463016 12079   0x7fa44b6280 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)capsfilter0_src, time=(string)0:00:00.009765969;
0:04:45.182649935 12079   0x7fa44b6140 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvinfer1_src, time=(string)0:00:00.095506593;
0:04:45.182812237 12079   0x558c84bd90 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue2_src, time=(string)0:00:00.085548818;
0:04:45.182814317 12079   0x7fa44b61e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue3_src, time=(string)0:00:00.095637920;
0:04:45.183806658 12079   0x7fa44b6230 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvtracker0_src, time=(string)0:00:00.096674068;
0:04:45.183885697 12079   0x7fa44b60f0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue4_src, time=(string)0:00:00.096768947;
0:04:45.183953504 12079   0x7fa44b60f0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvinfer2_src, time=(string)0:00:00.096840754;
0:04:45.184064063 12079   0x7fa44b5f70 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue5_src, time=(string)0:00:00.096942929;
0:04:45.184150942 12079   0x7fa44b5f70 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)nvdsosd0_src, time=(string)0:00:00.097017488;
0:04:45.184234301 12079   0x7fa44b5f20 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)queue6_src, time=(string)0:00:00.097105679;
0:04:45.184277213 12079   0x7fa44b5f20 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)fakesink0_sink, time=(string)0:00:00.097105679;