Decoder Latency increases when enabling DLA

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson AGX Orin
• DeepStream Version 6.2
• JetPack Version (valid for Jetson only) JetPack 5.1
• TensorRT Version 8.5.2

I have a pipeline which utilizes DeepStream plugins as follows:
Decoder → nvstreammux → detect → classify → encode
I have noticed that when running the detect/classify models on DLA, the decoding time increases to almost double compared with running the same models on the GPU. Is there any reason for this? Additionally, can we run the model without using nvstreammux in DeepStream, since it introduces latency into the pipeline?
The decode/encode stages use the hardware NVDEC/NVENC engines.
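Roughly, the launch line looks like the following (a simplified sketch only; the file names, resolution, and nvinfer config paths are placeholders, not my exact command):

$ gst-launch-1.0 filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! \
    mux.sink_0 nvstreammux name=mux batch-size=1 width=1920 height=1080 ! \
    nvinfer config-file-path=detector_config.txt ! \
    nvinfer config-file-path=classifier_config.txt ! \
    nvvideoconvert ! nvv4l2h264enc ! h264parse ! filesink location=out.h264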

How did you measure the “decoding time”?

No.

  1. The nvstreammux is the key to inferencing, since it generates the batched data for TensorRT.
  2. You may need to set the “width” and “height” properties of nvstreammux to the same values as the video’s original resolution; then the nvstreammux latency will be very small (see the example below).
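For example, if you are using deepstream-app and the source is 1920x1080, the relevant config group would look like this (a sketch; the values are assumptions, match them to your own stream):

[streammux]
# set these to the original resolution of the input to avoid scaling inside the mux
width=1920
height=1080
batch-size=1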

Thank you for the clarification.

I have enabled component latency measurement by:
$ export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
Please also note that some layers of the model fell back to the GPU.
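For reference, this is roughly how I run it (a sketch assuming the stock deepstream-app; the config file name is a placeholder):

$ # frame-level latency measurement is typically enabled together with the component-level one
$ export NVDS_ENABLE_LATENCY_MEASUREMENT=1
$ export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
$ deepstream-app -c my_pipeline_config.txt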

This method measures the latency of the GstBuffer from entering the sink pad to leaving the src pad; it is not the processing time alone. Sometimes the downstream element consumes the buffer late, and then the latency of the upstream element becomes long. It does not mean the upstream element processes the buffer slowly.

Please measure the GPU/DLA loading with the pipeline running to find the real bottleneck of the pipeline.

Can you please clarify this point? As per the documentation, this command gives the latency of each plugin in the pipeline, and the decoder is one of the plugins I use. The command outputs a time in ms for each plugin in the pipeline for each GstBuffer (a frame, in my case).

Please also advise how I can measure the loading.

As described in the document, the latency of each plugin in the pipeline can be measured in this way. However, your concern that the “decoding time increases to almost double compared with running the same models on GPU” may not be a performance drop of the decoder; a drop in inferencing speed can also cause the decoder plugin's measured latency to become larger.

The easiest way is to use the “Jetson Power GUI”. The command “tegrastats” may also help.
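For example (a rough sketch; the exact output fields differ between JetPack releases):

$ # print the system load once per second; GR3D_FREQ shows GPU utilization,
$ # and DLA activity reporting depends on the JetPack release
$ sudo tegrastats --interval 1000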

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.