Instructions to integrate TAO 3.0 YoloV4 model into DeepStream produce no output on Jetson NX

• Hardware Platform (Jetson / GPU)
Jetson NX
• DeepStream Version
6.0
• JetPack Version (valid for Jetson only)
4.6
• TensorRT Version
8.0.1
• Issue Type( questions, new requirements, bugs)
Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Hi, we’re looking to set up a training-and-deployment pipeline from NVIDIA TAO to DeepStream, using TAO (preferably version 5.0) and DeepStream 6.0 (fixed). At the moment, though, I’m having trouble getting any results in DeepStream:

Following the tutorial for integrating YOLOv4 from NVIDIA TAO’s default collection (Deploying to DeepStream for YOLOv4 - NVIDIA Docs),
I don’t see any annotation output on the video.
I downloaded the TAO 3.0 YOLOv4 model linked from TAO Toolkit Integration with DeepStream — DeepStream 6.3 Release documentation
and generated the engine with:
tao-converter_vv3.21.11_trt8.0_aarch64/tao-converter -k nvidia_tlt -t fp16 -p Input,1x3x544x960,4x3x544x960,16x3x544x960 -e /data/models/yolov4_resnet18_default.engine -m 4 …/yolov4_resnet18_395.etlt -d 3,544,960
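
For reference, this is how I read the flags (per the tao-converter usage text; treat this as a sketch rather than anything authoritative):

# -k nvidia_tlt : the encryption key the .etlt was exported with
# -t fp16 : engine precision
# -p Input,1x3x544x960,4x3x544x960,16x3x544x960 : optimization profile (input tensor name, then min/opt/max shapes), which makes this an explicit-batch engine
# -e /data/models/yolov4_resnet18_default.engine : output engine path
# -m 4 : max batch size
# -d 3,544,960 : input dimensions (C,H,W)
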
I’ve used this file for the labels: https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/configs/nvinfer/yolov4_tao/yolov4_labels.txt

Am I missing anything?

Also, in general, are there any differences in the process when deploying from TAO v5.0? We’ll be using that in the future for YOLOv4 models.

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
Preferably a tutorial and a downloadable YOLOv4 model.

Please refer to this TAO YOLOv4 sample.

It runs. However, when I test the .engine with a batch size > 1:
/usr/src/tensorrt/bin/trtexec --loadEngine=/data/models/yolov4_resnet18_default.engine --batch=4 --iterations=100 --avgRuns=10 --dumpProfile --dumpOutput --useCudaGraph
I get:
Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueue::276, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 4, but engine max batch size was: 1
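
I suspect this is because the engine was built with an optimization profile (-p), i.e. it’s an explicit-batch engine, so trtexec’s implicit-batch --batch flag doesn’t apply to it. Something along these lines should exercise batch 4 instead (assuming the input tensor is named Input, as in the conversion command):

/usr/src/tensorrt/bin/trtexec --loadEngine=/data/models/yolov4_resnet18_default.engine --shapes=Input:4x3x544x960 --iterations=100 --avgRuns=10 --dumpProfile --useCudaGraph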

When running through DeepStream it works, but I get the same low FPS result as in Problems with Onnx model zoo -> trtexec -> DeepStream 6.0 pipeline.

I’ll continue to investigate

Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!

I’ve trialled the fps improvements:

Using enable-perf-measurement=1, it looks like ~14 fps (with a 30 fps input video).

Using export NVDS_ENABLE_LATENCY_MEASUREMENT=1, we see:
PERF: 13.80 (13.92) 13.80 (13.88) 13.80 (13.92) 13.80 (13.92)
BATCH-NUM = 2

Batch meta not found for buffer 0x7eb40097c0
BATCH-NUM = 3
Batch meta not found for buffer 0x7ec4048b40
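
For reference, this is roughly how the latency measurement is enabled before launching the app (the per-component variable and the config file name here are assumptions on my part):

export NVDS_ENABLE_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
deepstream-app -c deepstream_app_config.txt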

I’m setting batched-push-timeout to 1/max_fps (33333 µs).
Height and width in streammux are set to the input video’s height and width.

Looking at jtop, the GPU usage appears to sit at >99% for the majority of the time, dropping once every few seconds.

Setting qos=0 in sink0 appears to make no difference

I can now also see bounding boxes printed on an unrelated part of the screen - not sure if this is related?

For a bit more info, this is my config file:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

# output display details

[tiled-display]
enable=1
rows=2
columns=2
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

# mp4 video source

[source0]
enable=1
type=2
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
source-id=0
camera-width=1280
camera-height=720

# mp4 video source

[source1]
enable=1
type=2
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
source-id=1
camera-width=1280
camera-height=720

# mp4 video source

[source2]
enable=1
type=2
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
source-id=2
camera-width=1280
camera-height=720

# mp4 video source

[source3]
enable=1
type=2
uri=file:///data/videos-test/RowdenCarpark2.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0
source-id=3
camera-width=1280
camera-height=720

# rtsp video out

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=2700000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0

# set below properties in case of RTSPStreaming

rtsp-port=8556
udp-port=5400
#source-id=0
qos=0

# mp4 out

[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
enc-type=0
#1=h264 2=h265 3=mpeg4

# only SW mpeg4 is supported right now.

codec=1
sync=1
bitrate=4000000
profile=0
output-file=/data/videos-out/28112023_110616.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=UDPSink 5=nvoverlaysink 6=MsgConvBroker
type=6
msg-conv-config=redis_msg_config.txt
#(0): PAYLOAD_DEEPSTREAM - Deepstream schema payload
#(1): PAYLOAD_DEEPSTREAM_MINIMAL - Deepstream schema payload minimal
#(256): PAYLOAD_RESERVED - Reserved type
#(257): PAYLOAD_CUSTOM - Custom schema payload
msg-conv-payload-type=0
msg-conv-msg2p-new-api=1
msg-conv-frame-interval=100
#msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_kafka_proto.so
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_redis_proto.so
#Provide your msg-broker-conn-str here
msg-broker-conn-str=localhost;6379
#topic=deepstream_detection_messages
topic=metadata
#Optional:
msg-broker-config=/opt/nvidia/deepstream/deepstream/sources/libs/redis_protocol_adaptor/cfg_redis.txt

# on screen display

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

# stream mux - forms batches of frames from multiple input sources

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=4
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=33333

# Set muxer output width and height

width=1280
height=720
#enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# If set to TRUE, system timestamp will be attached as ntp timestamp

# If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached

attach-sys-ts-as-ntp=1

# primary gpu inference engine (model)

[primary-gie]
enable=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;1;1;1
bbox-border-color3=0;1;0;1
nvbuf-memory-type=0
config-file=detector_config.txt

[tracker]
enable=1

# For NvDCF and DeepSORT tracker, tracker-width and tracker-height must be a multiple of 32, respectively

tracker-width=320
tracker-height=256
ll-lib-file=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_nvmultiobjecttracker.so

# ll-config-file required to set different tracker types

# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_IOU.yml

ll-config-file=/opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml

# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml

# ll-config-file=/opt/nvidia/deepstream/deepstream-DEEPSTREAM_VER/samples/configs/deepstream-app/config_tracker_DeepSORT.yml

gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1

# secondary gpu inference engine (model)

[secondary-gie]
enable=0
gpu-id=0
batch-size=4

# 0=FP32, 1=INT8, 2=FP16 mode

nvbuf-memory-type=0
config-file=classifier_config.txt
gie-unique-id=2
operate-on-gie-id=1

[tests]
file-loop=0

and this is my detector_config.txt:
[property]
gpu-id=0

# custom-network-config=/data/models/DARKNET_CFG

# model-file=/data/models/yolov4_resnet18_395.etlt

# proto-file=PROTO_FILEPATH

onnx-file=/data/models/yolov4_tao_default.onnx
model-engine-file=/data/models/yolov4_resnet18_395.etlt_b4_gpu0_int8.engine
batch-size=4
gie-unique-id=1
maintain-aspect-ratio=0
symmetric-padding=0
network-mode=2
process-mode=1
network-type=0
interval=4
engine-create-func-name=NvDsInferYoloCudaEngineGet
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
int8-calib-file=/opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/cal_trt.bin

# from models.json

net-scale-factor=1.0
labelfile-path=/data/labels/tao_default.txt
num-detected-classes=4
cluster-mode=3
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_tlt_apps/post_processor/libnvds_infercustomparser_tlt.so

# model-color-format=MODEL_COLOR_FORMAT

[class-attrs-all]
topk=20
nms-iou-threshold=0.5
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

# from models.json

pre-cluster-threshold=0.25
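
For comparison, the yolov4_tao sample nvinfer config in deepstream_tao_apps looks roughly like the sketch below. The values are from memory, so double-check against the repo’s pgie_yolov4_tao_config.txt; the main differences from my config above are that the .etlt is referenced via tlt-encoded-model/tlt-model-key rather than model-file, the output blob is the BatchedNMS head, and there is no engine-create-func-name:

[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
labelfile-path=/data/labels/tao_default.txt
tlt-encoded-model=/data/models/yolov4_resnet18_395.etlt
tlt-model-key=nvidia_tlt
infer-dims=3;544;960
uff-input-order=0
uff-input-blob-name=Input
maintain-aspect-ratio=1
batch-size=1
network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
cluster-mode=3
output-blob-names=BatchedNMS
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_tlt_apps/post_processor/libnvds_infercustomparser_tlt.so

[class-attrs-all]
pre-cluster-threshold=0.3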

I’m running in Docker using the nvcr.io/nvidia/deepstream-l4t:6.0-samples image on a Jetson NX (L4T r32.6.1),
and the setup of the deepstream_tlt_apps repo in the Dockerfile looks like this:

WORKDIR /opt/nvidia/deepstream/deepstream-6.0/sources
RUN git clone -b release/tlt3.0 https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps
WORKDIR /opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_tlt_apps
RUN /bin/sh -c "make CUDA_VER=10.2"
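
and the container itself is launched along these lines (the image tag here is a placeholder for the image built from that Dockerfile, and the mount is an assumption):

docker run -it --rm --runtime nvidia -v /data:/data <built-image-tag>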

Please refer to this topic for performance improvement.

The model accuracy is bad; please check whether nvinfer’s configuration is right. Please refer to the parameters explanation doc.

Even with the changes, I’m still having the same performance issues as here: Problems with Onnx model zoo -> trtexec -> DeepStream 6.0 pipeline - #11 by sebastianvvj9c

Thanks for the info on the nvinfer configuration - I can sort that

Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!

No, it’s sorted now, thanks. It was a GPU load issue given the model size.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.