Instructions to integrate TAO 3.0 YoloV4 model into DeepStream produce no output on Jetson NX

• Hardware Platform (Jetson / GPU)
Jetson NX
• DeepStream Version
6.0
• JetPack Version (valid for Jetson only)
4.6
• TensorRT Version
8.0.1
• Issue Type( questions, new requirements, bugs)
Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Hi, we’re looking to set up a training-and-deployment pipeline from NVIDIA TAO to DeepStream, using TAO (preferably version 5.0) and DeepStream 6.0 (fixed). At the moment, though, I’m having trouble getting any results in DeepStream:

Following the tutorial for integrating YOLOv4 from NVIDIA TAO’s default collection (Deploying to DeepStream for YOLOv4 - NVIDIA Docs),
I don’t see any annotation output on the video.
I downloaded the TAO 3.0 YOLOv4 model linked from TAO Toolkit Integration with DeepStream — DeepStream 6.3 Release documentation
and generated the engine with:
tao-converter_vv3.21.11_trt8.0_aarch64/tao-converter -k nvidia_tlt -t fp16 -p Input,1x3x544x960,4x3x544x960,16x3x544x960 -e /data/models/yolov4_resnet18_default.engine -m 4 …/yolov4_resnet18_395.etlt -d 3,544,960
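
For reference, this is how I read the flags (per the tao-converter usage text; treat this as a sketch rather than anything authoritative):

# -k nvidia_tlt : the encryption key the .etlt was exported with
# -t fp16 : engine precision
# -p Input,1x3x544x960,4x3x544x960,16x3x544x960 : optimization profile (input tensor name, then min/opt/max shapes), which makes this an explicit-batch engine
# -e /data/models/yolov4_resnet18_default.engine : output engine path
# -m 4 : max batch size
# -d 3,544,960 : input dimensions (C,H,W)
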
I’ve used this file for the labels: https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/configs/nvinfer/yolov4_tao/yolov4_labels.txt

Am I missing anything?

Also, in general, are there any differences in the process when deploying from TAO v5.0? We’ll be using that in the future for YOLOv4 models.

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
Preferably a tutorial and a downloadable YOLOv4 model.

Please refer to this TAO YOLOv4 sample.

It runs. However, when I test the .engine with a batch size > 1:
/usr/src/tensorrt/bin/trtexec --loadEngine=/data/models/yolov4_resnet18_default.engine --batch=4 --iterations=100 --avgRuns=10 --dumpProfile --dumpOutput --useCudaGraph
I get:
Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueue::276, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 4, but engine max batch size was: 1
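
I suspect this is because the engine was built with an optimization profile (-p), i.e. it’s an explicit-batch engine, so trtexec’s implicit-batch --batch flag doesn’t apply to it. Something along these lines should exercise batch 4 instead (assuming the input tensor is named Input, as in the conversion command):

/usr/src/tensorrt/bin/trtexec --loadEngine=/data/models/yolov4_resnet18_default.engine --shapes=Input:4x3x544x960 --iterations=100 --avgRuns=10 --dumpProfile --useCudaGraph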

When running through DeepStream it works, but I get the same low FPS result as in Problems with Onnx model zoo -> trtexec -> DeepStream 6.0 pipeline.

I’ll continue to investigate

Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!

I’ve trialled the fps improvements:

Using enable-perf-measurement=1, it looks like ~14 fps (with a 30 fps input video).

Using export NVDS_ENABLE_LATENCY_MEASUREMENT=1, we see:
PERF: 13.80 (13.92) 13.80 (13.88) 13.80 (13.92) 13.80 (13.92)
BATCH-NUM = 2

Batch meta not found for buffer 0x7eb40097c0
BATCH-NUM = 3
Batch meta not found for buffer 0x7ec4048b40
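
For reference, this is roughly how the latency measurement is enabled before launching the app (the per-component variable and the config file name here are assumptions on my part):

export NVDS_ENABLE_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
deepstream-app -c deepstream_app_config.txt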

I’m setting batched-push-timeout to 1/max_fps (33333 µs).
Height and width in streammux are set to the input video’s height and width.

Looking at jtop, the GPU usage appears to sit at >99% for the majority of the time, dropping once every few seconds.

Setting qos=0 in sink0 appears to make no difference

I can now also see bounding boxes printed on an unrelated part of the screen - not sure if this is related?

For a bit more info, this is my config file:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

# output display details

[tiled-display]
enable=1
rows=2
columns=2
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

# mp4 video source

[source0]
enable=1
type=2
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
source-id=0
camera-width=1280
camera-height=720

# mp4 video source

[source1]
enable=1
type=2
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
source-id=1
camera-width=1280
camera-height=720

# mp4 video source

[source2]
enable=1
type=2
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
source-id=2
camera-width=1280
camera-height=720

# mp4 video source

[source3]
enable=1
type=2
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
source-id=3
camera-width=1280
camera-height=720

# rtsp video out

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=2700000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0

# set below properties in case of RTSPStreaming

rtsp-port=8556
udp-port=5400
#source-id=0
qos=0

# mp4 out

[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
enc-type=0
#1=h264 2=h265 3=mpeg4

# only SW mpeg4 is supported right now.

codec=1
sync=1
bitrate=4000000
profile=0
output-file=/data/videos-out/28112023_110616.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=UDPSink 5=nvoverlaysink 6=MsgConvBroker
type=6
msg-conv-config=redis_msg_config.txt
#(0): PAYLOAD_DEEPSTREAM - Deepstream schema payload
#(1): PAYLOAD_DEEPSTREAM_MINIMAL - Deepstream schema payload minimal
#(256): PAYLOAD_RESERVED - Reserved type
#(257): PAYLOAD_CUSTOM - Custom schema payload
msg-conv-payload-type=0
msg-conv-msg2p-new-api=1
msg-conv-frame-interval=100
#msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_kafka_proto.so
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_redis_proto.so
#Provide your msg-broker-conn-str here
msg-broker-conn-str=localhost;6379
#topic=deepstream_detection_messages
topic=metadata
#Optional:
msg-broker-config=/opt/nvidia/deepstream/deepstream/sources/libs/redis_protocol_adaptor/cfg_redis.txt

# on screen display

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

# stream mux - forms batches of frames from multiple input sources

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=4
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=33333

# Set muxer output width and height

width=1280
height=720
#enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# If set to TRUE, system timestamp will be attached as ntp timestamp

# If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached

attach-sys-ts-as-ntp=1

# primary gpu inference engine (model)

[primary-gie]
enable=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;1;1;1
bbox-border-color3=0;1;0;1
nvbuf-memory-type=0
config-file=detector_config.txt

[tracker]
enable=1

# For NvDCF and DeepSORT tracker, tracker-width and tracker-height must be a multiple of 32, respectively

tracker-width=320
tracker-height=256
ll-lib-file=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_nvmultiobjecttracker.so

# ll-config-file required to set different tracker types

# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_IOU.yml

ll-config-file=/opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml

# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml

# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_DeepSORT.yml

gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1

# secondary gpu inference engine (model)

[secondary-gie]
enable=0
gpu-id=0
batch-size=4

# 0=FP32, 1=INT8, 2=FP16 mode

nvbuf-memory-type=0
config-file=classifier_config.txt
gie-unique-id=2
operate-on-gie-id=1

[tests]
file-loop=0

and this is my detector_config.txt:
[property]
gpu-id=0

# custom-network-config=/data/models/DARKNET_CFG

# model-file=/data/models/yolov4_resnet18_395.etlt

# proto-file=PROTO_FILEPATH

onnx-file=/data/models/yolov4_tao_default.onnx
model-engine-file=/data/models/yolov4_resnet18_395.etlt_b4_gpu0_int8.engine
batch-size=4
gie-unique-id=1
maintain-aspect-ratio=0
symmetric-padding=0
network-mode=2
process-mode=1
network-type=0
interval=4
engine-create-func-name=NvDsInferYoloCudaEngineGet
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
int8-calib-file=/opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/cal_trt.bin

# from models.json

net-scale-factor=1.0
labelfile-path=/data/labels/tao_default.txt
num-detected-classes=4
cluster-mode=3
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_tlt_apps/post_processor/libnvds_infercustomparser_tlt.so

# model-color-format=MODEL_COLOR_FORMAT

[class-attrs-all]
topk=20
nms-iou-threshold=0.5
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

# from models.json

pre-cluster-threshold=0.25
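
For comparison, the yolov4_tao sample nvinfer config in deepstream_tao_apps looks roughly like the sketch below. The values are from memory, so double-check against the repo’s pgie_yolov4_tao_config.txt; the main differences from my config above are that the .etlt is referenced via tlt-encoded-model/tlt-model-key rather than model-file, the output blob is the BatchedNMS head, and there is no engine-create-func-name:

[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
labelfile-path=/data/labels/tao_default.txt
tlt-encoded-model=/data/models/yolov4_resnet18_395.etlt
tlt-model-key=nvidia_tlt
infer-dims=3;544;960
uff-input-order=0
uff-input-blob-name=Input
maintain-aspect-ratio=1
batch-size=1
network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
cluster-mode=3
output-blob-names=BatchedNMS
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_tlt_apps/post_processor/libnvds_infercustomparser_tlt.so

[class-attrs-all]
pre-cluster-threshold=0.3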

I’m running in Docker using the nvcr.io/nvidia/deepstream-l4t:6.0-samples image on a Jetson NX (L4T r32.6.1),
and the setup of the deepstream_tlt_apps repo in the Dockerfile looks like this:

WORKDIR /opt/nvidia/deepstream/deepstream-6.0/sources
RUN git clone -b release/tlt3.0 https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps
WORKDIR /opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_tlt_apps
RUN /bin/sh -c "make CUDA_VER=10.2"
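
and the container itself is launched along these lines (the image tag here is a placeholder for the image built from that Dockerfile, and the mount is an assumption):

docker run -it --rm --runtime nvidia -v /data:/data <built-image-tag>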

Please refer to this topic for performance improvement.

The model accuracy is bad; please check whether nvinfer’s configuration is right. Please refer to the parameters explanation doc.

Even with the changes, I’m still having the same performance issues as here: Problems with Onnx model zoo -> trtexec -> DeepStream 6.0 pipeline - #11 by sebastianvvj9c

Thanks for the info on the nvinfer configuration - I can sort that

Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!

No, it’s sorted now, thanks. It was a GPU load issue given the model size.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.