Problems with ONNX model zoo -> trtexec -> DeepStream 6.0 pipeline

Description

ONNX models from the model zoo produce poor results in DeepStream (low fps, stuttering output; the actual annotations are good)

Hi, we’re looking to run YOLOv4 object detection models in DeepStream. Unfortunately it’s not working at the minute. Our current process is:

  • Download a YOLOv4 model from the ONNX model zoo (GitHub - onnx/models: A collection of pre-trained, state-of-the-art models in the ONNX format)
  • Convert it with trtexec on the target device (Jetson NX running JP4.6, DS6.0):
    /usr/src/tensorrt/bin/trtexec --onnx=/data/models/yolov4_onnx.onnx --saveEngine=/data/models/yolov4_coco_dynamic_kxm.engine --explicitBatch --minShapes=input:1x3x416x416 --optShapes=input:4x3x416x416 --maxShapes=input:16x3x416x416
    This works, and I can test it with:
    /usr/src/tensorrt/bin/trtexec --loadEngine=/data/models/yolov4_coco_kxm.engine --batch=4 --iterations=100 --avgRuns=10 --dumpProfile --dumpOutput --useCudaGraph
    All okay.
    However, when I come to run it in DeepStream with mp4 inputs, the output stutters (it runs for maybe 0.5-1 seconds and then pauses), and the fps is very low (10-15 fps when the input videos are 30 fps)
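As a sanity check on the standalone engine, throughput can be estimated from the mean latency trtexec reports with --dumpProfile. The numbers below are placeholders (90 ms for batch 4 is made up); substitute the measured values:

```shell
# estimate engine throughput from trtexec's reported mean latency
# (batch=4 and mean_ms=90 are placeholder values; use your own figures)
batch=4
mean_ms=90
awk -v b="$batch" -v l="$mean_ms" 'BEGIN { printf "%.1f fps\n", b * 1000 / l }'
# -> 44.4 fps
```

If this figure is well above the pipeline fps seen in DeepStream, the bottleneck is likely elsewhere in the pipeline rather than in the engine itself.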

This is my config:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

# output display details

[tiled-display]
enable=1
rows=2
columns=2
width=1920
height=1080
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

# mp4 video source

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=2
gpu-id=0
cudadec-memtype=0
source-id=0
camera-width=1280
camera-height=720

[source1]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file:///opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_1080p_h264.mp4
#uri=file:///home/tushar/sample_0_720p.mp4
num-sources=2
gpu-id=0
nvbuf-memory-type=0

# rtsp video source

[source2]
enable=0
type=4
#latency=30000
#drop-on-latency=false
#drop-frame-interval=3
buffer-size=5000000
uri=
cudadec-memtype=0
source-id=0

# rtsp video out

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=10000000
#bitrate=2700000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0

# set below properties in case of RTSPStreaming

rtsp-port=8556
udp-port=5400
#source-id=0

# mp4 out

[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
enc-type=0
#1=h264 2=h265 3=mpeg4

# only SW mpeg4 is supported right now

codec=1
sync=1
bitrate=4000000
profile=0
output-file=/data/videos-out/21112023_093556_RowdenCarpark2.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=UDPSink 5=nvoverlaysink 6=MsgConvBroker
type=6
msg-conv-config=redis_msg_config.txt
#(0): PAYLOAD_DEEPSTREAM - Deepstream schema payload
#(1): PAYLOAD_DEEPSTREAM_MINIMAL - Deepstream schema payload minimal
#(256): PAYLOAD_RESERVED - Reserved type
#(257): PAYLOAD_CUSTOM - Custom schema payload
msg-conv-payload-type=0
msg-conv-msg2p-new-api=1
msg-conv-frame-interval=100
#msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_kafka_proto.so
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_redis_proto.so
#Provide your msg-broker-conn-str here
msg-broker-conn-str=localhost;6379
#topic=deepstream_detection_messages
topic=metadata
#Optional:
msg-broker-config=/opt/nvidia/deepstream/deepstream/sources/libs/redis_protocol_adaptor/cfg_redis.txt

# on screen display

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

# stream mux - forms batches of frames from multiple input sources

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=1
batch-size=4
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=33333

# Set muxer output width and height

width=1280
height=720
#enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# If set to TRUE, system timestamp will be attached as ntp timestamp
# If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached

attach-sys-ts-as-ntp=1

# primary gpu inference engine (model)

[primary-gie]
enable=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;1;1;1
bbox-border-color3=0;1;0;1
nvbuf-memory-type=0
config-file=detector_config.txt

[tracker]
enable=1

# For the NvDCF and DeepSORT trackers, tracker-width and tracker-height must each be a multiple of 32

tracker-width=320
tracker-height=256
ll-lib-file=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_nvmultiobjecttracker.so

# ll-config-file required to set different tracker types
# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml
# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_DeepSORT.yml

gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1

# secondary gpu inference engine (model)

[secondary-gie]
enable=0
gpu-id=0
batch-size=1

# 0=FP32, 1=INT8, 2=FP16 mode

nvbuf-memory-type=0
config-file=classifier_config.txt
gie-unique-id=2
operate-on-gie-id=1

[tests]
file-loop=0

And this is my detector_config.txt:
[property]
gpu-id=0
model-engine-file=/data/models/yolov4_coco_kxm.engine
batch-size=4
gie-unique-id=1
maintain-aspect-ratio=1
symmetric-padding=0
network-mode=0
process-mode=1
network-type=0
interval=4
engine-create-func-name=NvDsInferYoloCudaEngineGet
force-implicit-batch-dim=1

# from models.json

net-scale-factor=0.003921569790691137
labelfile-path=/data/labels/coco.txt
num-detected-classes=80
cluster-mode=3
#parse-bbox-func-name=NvDsInferParseCustomYoloV3
#custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
#infer-dims=3;544;960
#output-blob-names=BatchedNMS
#parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
#custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_tlt_apps/post_processor/libnvds_infercustomparser_tlt.so
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
#parse-bbox-func-name=NvDsInferParseCustomYoloV4
#custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
#custom-lib-path=/yolo_deepstream/deepstream_yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
model-color-format=0

[class-attrs-all]
topk=20
nms-iou-threshold=0.5
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

# from models.json

pre-cluster-threshold=0.7
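As an aside, the long net-scale-factor constant above is just 1/255 (uint8 pixel range [0,255] scaled to [0,1]), stored with single-precision rounding. A quick check:

```shell
# the constant agrees with 1/255 to well within float32 precision
python3 -c "print(abs(1/255 - 0.003921569790691137) < 1e-8)"
# -> True
```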

Thanks in advance!

Environment

TensorRT Version:
v8.0.1
GPU Type:
Jetson NX
Nvidia Driver Version:
CUDA Version:
10.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Moving this to Deepstream Forum
Thanks

  1. Please refer to the topic on performance analysis, and the topic on fps checking.
  2. To narrow down this issue, please set a fakesink to check whether the output can get high fps.
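A minimal sketch of the fakesink suggestion, adapted from the poster's own [sink0] group (type values per the comments already in the config):

```
[sink0]
enable=1
# Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=1
sync=0
```

With the encoder and RTSP streaming removed from the pipeline, the reported fps isolates decode + inference performance.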

I’ve trialled the fps improvements:

Using enable-perf-measurement=1
**PERF: 10.45 (10.44) 10.45 (10.46) 10.45 (10.46) 10.45 (10.46)

Using export NVDS_ENABLE_LATENCY_MEASUREMENT=1, I see:
BATCH-NUM = 0**
Batch meta not found for buffer 0x7f24121c30
BATCH-NUM = 1**
Batch meta not found for buffer 0x7f1401ac10

I’m setting batched-push-timeout to 1/max_fps (33333)
Height and width in streammux are set to the input video’s height and width
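The batched-push-timeout arithmetic checks out: the property is specified in microseconds, so for 30 fps sources it is 1e6 / max_fps:

```shell
# batched-push-timeout (microseconds) = 1,000,000 / max_fps
echo $((1000000 / 30))
# -> 33333
```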

Looking at jtop, the GPU usage appears to sit at >99% the majority of the time, and drops down once every few seconds

Setting qos=0 in sink0 appears to make no difference

One more thing. The bounding boxes also don’t print the object class and ID, which they do when I’m using the ultralytics yolov8 model with a batch size of 1 - how can I get this working?

Using just a fakesink block, nothing seems to change, but the latency measurement gives accurate results:

BATCH-NUM = 133**
Source id = 0 Frame_num = 133 Frame latency = 1701173680300.191895 (ms)
Source id = 3 Frame_num = 133 Frame latency = 1891.279785 (ms)
Source id = 2 Frame_num = 133 Frame latency = 1888.423828 (ms)
Source id = 1 Frame_num = 133 Frame latency = 1894.659912 (ms)

BATCH-NUM = 134**
Source id = 3 Frame_num = 134 Frame latency = 1701173680308.073975 (ms)
Source id = 2 Frame_num = 134 Frame latency = 1890.655029 (ms)
Source id = 0 Frame_num = 134 Frame latency = 1884.128906 (ms)
Source id = 1 Frame_num = 134 Frame latency = 1886.737061 (ms)
**PERF: 10.54 (10.70) 10.54 (10.70) 10.54 (10.70) 10.54 (10.70)

Here’s the full component latency measurements:
BATCH-NUM = 34**
Comp name = nvosd0 in_system_timestamp = 1701173818236.645020 out_system_timestamp = 1701173818238.068115 component latency= 1.423096
Comp name = osd_conv in_system_timestamp = 1701173818232.974121 out_system_timestamp = 1701173818236.451904 component latency= 3.477783
Comp name = tiled_display_tiler in_system_timestamp = 1701173818224.996094 out_system_timestamp = 1701173818231.325928 component latency= 6.329834
Comp name = tracking_tracker in_system_timestamp = 1701173818201.530029 out_system_timestamp = 1701173818217.559082 component latency= 16.029053
Comp name = primary_gie in_system_timestamp = 1701173818200.653076 out_system_timestamp = 1701173818201.501953 component latency= 0.848877
Comp name = nvstreammux-src_bin_muxer source_id = 3 pad_index = 3 frame_num = 34 in_system_timestamp = 1701173817771.239990 out_system_timestamp = 1701173818200.542969 component_latency = 429.302979
Comp name = nvv4l2decoder3 in_system_timestamp = 1701173816769.830078 out_system_timestamp = 1701173817739.690918 component latency= 969.860840
Comp name = nvstreammux-src_bin_muxer source_id = 2 pad_index = 2 frame_num = 33 in_system_timestamp = 1701173817767.345947 out_system_timestamp = 1701173818200.541992 component_latency = 433.196045
Comp name = nvv4l2decoder1 in_system_timestamp = 1701173816768.810059 out_system_timestamp = 1701173817738.277100 component latency= 969.467041
Comp name = nvstreammux-src_bin_muxer source_id = 1 pad_index = 1 frame_num = 33 in_system_timestamp = 1701173817772.541992 out_system_timestamp = 1701173818200.541992 component_latency = 428.000000
Comp name = nvv4l2decoder2 in_system_timestamp = 1701173816768.469971 out_system_timestamp = 1701173817736.697998 component latency= 968.228027
Comp name = nvstreammux-src_bin_muxer source_id = 0 pad_index = 0 frame_num = 33 in_system_timestamp = 1701173817769.280029 out_system_timestamp = 1701173818200.541016 component_latency = 431.260986
Comp name = nvv4l2decoder0 in_system_timestamp = 1701173816767.169922 out_system_timestamp = 1701173817735.011963 component latency= 967.842041
Source id = 3 Frame_num = 34 Frame latency = 1701173818238.372070 (ms)
Source id = 2 Frame_num = 33 Frame latency = 1468.541992 (ms)
Source id = 1 Frame_num = 33 Frame latency = 1469.562012 (ms)
Source id = 0 Frame_num = 33 Frame latency = 1469.902100 (ms)

It looks like the PGIE block is the issue

It is a performance issue. Please execute the following commands to improve performance. Please refer to the doc.

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

could you share the result of “ll /data/labels/coco.txt” and the label file coco.txt?

I’ve run the nvpmodel and jetson_clocks commands now and retested, but I’m getting the same result (10-15 fps)

I did sudo chmod 777 -R /data/labels/coco.txt
-rwxrwxrwx 1 1000 1000 621 Nov 24 16:59 /data/labels/coco.txt*
and reran, but got the same result (no bounding box labels)

could you share the label file coco.txt? maybe nvinfer failed to parse that file.

labels.txt (621 Bytes)

I’m actually having the same issue with the sample resnet10 model using /opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/labels.txt and /opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.caffemodel
But it works fine with ultralytics yolov8 using this same label file

labels.txt is fine. Can you provide the whole project (model, cfg)? I will have a try. nvinfer and the low-level lib are open source; you can add logs to check if interested.

Thanks, the configs are pasted above, and the model is here: https://github.com/onnx/models/blob/main/vision/object_detection_segmentation/yolov4/model/yolov4.onnx

Could you share libnvdsinfer_custom_impl_Yolo.so, which includes NvDsInferParseCustomYoloV4? You can use the forum private email: please click forum avatar -> email.

I’m using NvDsInferParseYolo from Deepstream-Yolo (GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models). I’ll email you

I can’t see an email option for you. I’ve built libnvdsinfer_custom_impl_Yolo.so using the instructions here for DS6.0/CUDA10.2

Testing the yolov4.onnx model you shared in the DeepStream-Yolo project, I can’t get bboxes. Please help check whether the preprocessing parameters are correct.
config_infer_primary_yoloV4.txt (1.2 KB)
labels.txt (621 Bytes)

try net-scale-factor=0.003921569790691137?

Nonetheless, I think ONNX models converted from TF won’t work without a custom parser/conversion to the NCHW format, because of this: How to resolve the error: RGB/BGR input format specified but network input channels is not 3

So I’ll close this topic and open a new one for the labelled bounding boxes issue