How to resolve 'System Throttled Due to Overcurrent' Issue and Increase Concurrent Streams on Custom YOLOv8 Model on Jetson AGX Orin?

• Hardware Platform (Jetson / GPU) - Jetson AGX Orin 64 GB Developer Kit
• DeepStream Version - Docker Container - deepstream:7.0-triton-multiarch
• JetPack Version (valid for Jetson only) - 6.0
• TensorRT Version - 8.6.2.3
• NVIDIA GPU Driver Version (valid for GPU only) -
• Issue Type( questions, new requirements, bugs) - Question
• How to reproduce the issue ?

Ran the docker container using:

sudo docker run --runtime=nvidia -it --rm --net=host --privileged -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-7.0 nvcr.io/nvidia/deepstream:7.0-triton-multiarch

Cloned the deepstream_python_apps repo

cd /opt/nvidia/deepstream/deepstream-7.0/sources
git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps

Built the Python bindings using the instructions in the bindings' README file

cd /opt/nvidia/deepstream/deepstream-7.0/sources/deepstream_python_apps/bindings

For our current application, the sample Python app deepstream-imagedata-multistream suits us best. I wanted the application to detect humans and several different classes of animals, so I trained a custom YOLOv8 model on our data, named wfpv1.pt

cd sources/deepstream_python_apps/apps/deepstream-imagedata-multistream

Referred to the docs below to convert the .pt model to an ONNX model, so that a TensorRT engine file could be built from it.

python3 export_yoloV8.py -w wfpv1.pt --dynamic

cd /usr/src/tensorrt/bin

trtexec --onnx=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2.onnx \
--saveEngine=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b011616.engine \
--minShapes="input":1x3x640x640 --optShapes="input":16x3x640x640 --maxShapes="input":16x3x640x640&
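For clarity, the --minShapes/--optShapes/--maxShapes arguments above define a dynamic batch range of 1 to 16 on the 640x640 input, with the engine tuned for batch 16. A small illustrative helper (hypothetical, not part of trtexec) that parses these shape strings and checks they are consistent:

```python
def parse_shape(s: str) -> tuple:
    """Parse a trtexec-style shape string like '16x3x640x640' into a tuple of ints."""
    return tuple(int(d) for d in s.split("x"))

def batch_range(min_s: str, opt_s: str, max_s: str) -> tuple:
    """Return (min, opt, max) batch sizes; the non-batch dims must agree and
    the batch sizes must satisfy min <= opt <= max, as TensorRT requires."""
    mn, op, mx = parse_shape(min_s), parse_shape(opt_s), parse_shape(max_s)
    assert mn[1:] == op[1:] == mx[1:], "non-batch dimensions must agree"
    assert mn[0] <= op[0] <= mx[0], "batch sizes must be ordered min <= opt <= max"
    return mn[0], op[0], mx[0]

# The shapes used in the trtexec command above:
print(batch_range("1x3x640x640", "16x3x640x640", "16x3x640x640"))  # (1, 16, 16)
```

The opt shape is what TensorRT optimizes kernel selection for, so it should match the batch size the pipeline actually feeds (here, 16 streams batched by the streammux).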

Here’s my dstest_imagedata_config.txt file:

################################################################################
# SPDX-FileCopyrightText: Copyright (c) 2019-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8)
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Optional properties for detectors:
#   cluster-mode(Default=Group Rectangles), interval(Primary mode only, Default=0)
#   custom-lib-path,
#   parse-bbox-func-name
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
offsets=0.0;0.0;0.0
onnx-file=/opt/nvidia/deepstream/deepstream-6.4/samples/models/tao_pretrained_models/smart_warehouse/wfpv2.onnx
model-engine-file=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b011616.engine
labelfile-path=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/labels.txt
force-implicit-batch-dim=1
batch-size=16
process-mode=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
# network-mode=0
num-detected-classes=6
maintain-aspect-ratio=0
# maintain-aspect-ratio=1
interval=0
gie-unique-id=1
parse-bbox-func-name=NvDsInferParseYolo
#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=/opt/nvidia/deepstream/deepstream-7.0/samples/models/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
# uff-input-order=0
# uff-input-blob-name=images
# output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
#scaling-filter=0
#scaling-compute-hw=0
cluster-mode=2
output-tensor-meta=0
infer-dims=3;640;640

[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=1
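One knob worth noting in the config above: network-mode is commented out, so the engine runs at the default FP32 precision, which roughly doubles GPU compute (and power draw) versus FP16. A hypothetical FP16 variant of the relevant lines would look like the sketch below; note that because model-engine-file points at a prebuilt engine, the engine itself would also need rebuilding (e.g. passing --fp16 to trtexec), otherwise the prebuilt FP32 plan is simply deserialized as-is:

```ini
[property]
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
## skip inference on some batches to reduce GPU load (0 = infer on every frame)
# interval=1
```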

I modified the deepstream-imagedata-multistream.py file to use the 6 labels of our model.
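For reference, the labelfile-path in the config above expects a plain-text file with one class name per line. Matching the six per-class counters that appear in my logs, my labels.txt looks like the sketch below (the order shown is illustrative; it must match the class indices the model was trained with):

```text
Bird
Cat
Dog
Mongoose/Squirrel
Person
Rat
```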

Now, whenever I run the application, the maximum number of streams I was able to run at a decent FPS was about 13; with any more than that it won't run properly, and my display shows mostly static images with fps=0. And every time I try to run our YOLOv8 engine with multiple streams, I get the following system warning:
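To correlate the throttling with actual load, I watch tegrastats output while the pipeline runs. Below is a small parser sketch for one line of that output; the field names (GR3D_FREQ for GPU utilization, VDD_* for the power rails reported as instant/average milliwatts) follow the usual tegrastats format on Orin, but treat the exact names as assumptions for your JetPack version:

```python
import re

def parse_tegrastats(line: str) -> dict:
    """Extract GPU utilization (%) and power-rail draw (mW) from one tegrastats line."""
    stats = {}
    m = re.search(r"GR3D_FREQ (\d+)%", line)
    if m:
        stats["gpu_util_pct"] = int(m.group(1))
    # Power rails are printed as e.g. "VDD_GPU_SOC 18456mW/17002mW" (instant/average)
    for rail, inst, avg in re.findall(r"(VDD_\w+) (\d+)mW/(\d+)mW", line):
        stats[rail] = {"inst_mw": int(inst), "avg_mw": int(avg)}
    return stats

sample = "RAM 31000/62841MB GR3D_FREQ 99% VDD_GPU_SOC 18456mW/17002mW VDD_CPU_CV 3120mW/2998mW"
print(parse_tegrastats(sample))
```

If the GPU rail spikes right before the overcurrent warning appears, that points at the inference load itself rather than, say, the decoders.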

‘System throttled due to overcurrent’

Here’s what the application logs on to the terminal:

Frames will be saved in  frames
Creating Pipeline 
 
Creating streamux 
 
Creating source_bin  0  
 
Creating source bin
source-bin-00
Creating source_bin  1  
 
Creating source bin
source-bin-01
Creating source_bin  2  
 
Creating source bin
source-bin-02
Creating source_bin  3  
 
Creating source bin
source-bin-03
Creating source_bin  4  
 
Creating source bin
source-bin-04
Creating source_bin  5  
 
Creating source bin
source-bin-05
Creating source_bin  6  
 
Creating source bin
source-bin-06
Creating source_bin  7  
 
Creating source bin
source-bin-07
Creating source_bin  8  
 
Creating source bin
source-bin-08
Creating source_bin  9  
 
Creating source bin
source-bin-09
Creating source_bin  10  
 
Creating source bin
source-bin-10
Creating source_bin  11  
 
Creating source bin
source-bin-11
Creating source_bin  12  
 
Creating source bin
source-bin-12
Creating source_bin  13  
 
Creating source bin
source-bin-13
Creating source_bin  14  
 
Creating source bin
source-bin-14
Creating source_bin  15  
 
Creating source bin
source-bin-15
Creating Pgie 
 
Creating nvvidconv1 
 
Creating filter1 
 
Creating tiler 
 
Creating nvvidconv 
 
Creating nvosd 
 
Is it Integrated GPU? : 1
Creating nv3dsink 

Atleast one of the sources is live
Adding elements to Pipeline 

Linking elements in the Pipeline 

Now playing...
1 :  rtsp link
2 :  rtsp link
3 :  rtsp link
4 :  rtsp link
5 :  rtsp link
6 :  rtsp link
7 :  rtsp link
8 :  rtsp link
9 :  rtsp link
10 :  rtsp link
11 :  rtsp link
12 :  rtsp link
13 :  rtsp link
14 :  rtsp link
15 :  rtsp link
16 :  rtsp link
Starting pipeline 

Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:04.744760030   238 0xaaab52e3bdf0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b011616.engine
INFO: [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input           3x640x640       min: 1x3x640x640     opt: 16x3x640x640    Max: 16x3x640x640    
1   OUTPUT kFLOAT boxes           8400x4          min: 0               opt: 0               Max: 0               
2   OUTPUT kFLOAT scores          8400x1          min: 0               opt: 0               Max: 0               
3   OUTPUT kFLOAT classes         8400x1          min: 0               opt: 0               Max: 0               

0:00:05.018930770   238 0xaaab52e3bdf0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b011616.engine
0:00:05.032278644   238 0xaaab52e3bdf0 INFO                 nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:wfpv1_dstest_imagedata_config.txt sucessfully
Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 

Decodebin child added: source 


**PERF:  {'stream0': 0.0, 'stream1': 0.0, 'stream2': 0.0, 'stream3': 0.0, 'stream4': 0.0, 'stream5': 0.0, 'stream6': 0.0, 'stream7': 0.0, 'stream8': 0.0, 'stream9': 0.0, 'stream10': 0.0, 'stream11': 0.0, 'stream12': 0.0, 'stream13': 0.0, 'stream14': 0.0, 'stream15': 0.0} 


**PERF:  {'stream0': 0.0, 'stream1': 0.0, 'stream2': 0.0, 'stream3': 0.0, 'stream4': 0.0, 'stream5': 0.0, 'stream6': 0.0, 'stream7': 0.0, 'stream8': 0.0, 'stream9': 0.0, 'stream10': 0.0, 'stream11': 0.0, 'stream12': 0.0, 'stream13': 0.0, 'stream14': 0.0, 'stream15': 0.0} 

Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-13/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-10/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-12/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-14/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-09/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-11/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-07/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-04/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-15/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-08/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-01/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-00/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-06/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-05/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-02/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Warning: gst-resource-error-quark: Could not read from resource. (9): ../gst/rtsp/gstrtspsrc.c(5964): gst_rtspsrc_reconnect (): /GstPipeline:pipeline0/GstBin:source-bin-03/GstURIDecodeBin:uri-decode-bin/GstRTSPSrc:source:
Could not receive any UDP packets for 5.0000 seconds, maybe your firewall is blocking it. Retrying using a tcp connection.
Decodebin child added: decodebin0 

Decodebin child added: rtph264depay0 

Decodebin child added: h264parse0 

Decodebin child added: capsfilter0 

/bin/bash: line 1: lsmod: command not found
Decodebin child added: decodebin1 

Decodebin child added: rtph264depay1 

Decodebin child added: h264parse1 

Decodebin child added: capsfilter1 

/bin/bash: line 1: modprobe: command not found
Decodebin child added: nvv4l2decoder0 

Decodebin child added: nvv4l2decoder1 

Opening in BLOCKING MODE 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
Decodebin child added: decodebin2 

Decodebin child added: rtph264depay2 

Decodebin child added: h264parse2 

Decodebin child added: capsfilter2 

Decodebin child added: nvv4l2decoder2 

In cb_newpad

In cb_newpad
Opening in BLOCKING MODE 

NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
In cb_newpad

Decodebin child added: decodebin3 

Decodebin child added: rtph264depay3 

Decodebin child added: h264parse3 

Decodebin child added: capsfilter3 

Decodebin child added: nvv4l2decoder3 

Decodebin child added: decodebin4 

Decodebin child added: rtppcmadepay0 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
Decodebin child added: alawdec0 

In cb_newpad

Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame

In cb_newpad

Decodebin child added: decodebin5 

Decodebin child added: rtph264depay4 

Decodebin child added: h264parse4 

Decodebin child added: capsfilter4 

Decodebin child added: nvv4l2decoder4 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
In cb_newpad

Decodebin child added: decodebin6 

Decodebin child added: rtph265depay0 

Decodebin child added: h265parse0 

Decodebin child added: capsfilter5 

Decodebin child added: nvv4l2decoder5 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 279 
NvMMLiteBlockCreate : Block : BlockType = 279 
Decodebin child added: decodebin7 

Decodebin child added: decodebin8 

Decodebin child added: rtppcmadepay1 

Decodebin child added: rtph264depay5 

Decodebin child added: alawdec1 

Decodebin child added: h264parse5 

Decodebin child added: capsfilter6 

In cb_newpad

Decodebin child added: nvv4l2decoder6 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
In cb_newpad

Decodebin child added: decodebin9 

Decodebin child added: rtph264depay6 

Decodebin child added: h264parse6 

Decodebin child added: capsfilter7 

Decodebin child added: nvv4l2decoder7 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
In cb_newpad

In cb_newpad

Frame Number= 0 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 0 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Decodebin child added: decodebin10 

Decodebin child added: rtph264depay7 

Decodebin child added: h264parse7 

Decodebin child added: capsfilter8 

Decodebin child added: nvv4l2decoder8 

Frame Number= 1 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 1 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
Decodebin child added: decodebin11 

Decodebin child added: decodebin12 

Decodebin child added: rtppcmadepay2 

Decodebin child added: rtph264depay8 

Decodebin child added: alawdec2 

In cb_newpad

Decodebin child added: h264parse8 

Decodebin child added: capsfilter9 

Decodebin child added: nvv4l2decoder9 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
In cb_newpad

Frame Number= 2 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 2 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Decodebin child added: decodebin13 

Decodebin child added: rtph264depay9 

Decodebin child added: h264parse9 

Decodebin child added: capsfilter10 

Decodebin child added: nvv4l2decoder10 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
In cb_newpad

In cb_newpad

Frame Number= 3 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 3 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 0 Number of Objects= 1 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 1 Person_count= 0 Rat_count= 0
Decodebin child added: decodebin14 

Decodebin child added: decodebin15 

Decodebin child added: rtph264depay10 

Decodebin child added: rtph264depay11 

Decodebin child added: h264parse10 

Decodebin child added: h264parse11 

Decodebin child added: capsfilter11 

Decodebin child added: nvv4l2decoder11 

Decodebin child added: capsfilter12 

Decodebin child added: nvv4l2decoder12 
Opening in BLOCKING MODE 

NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
In cb_newpad

In cb_newpad


**PERF:  {'stream0': 0.0, 'stream1': 0.0, 'stream2': 0.0, 'stream3': 0.0, 'stream4': 5.3, 'stream5': 0.0, 'stream6': 0.0, 'stream7': 0.0, 'stream8': 0.0, 'stream9': 0.0, 'stream10': 0.0, 'stream11': 0.0, 'stream12': 0.0, 'stream13': 0.0, 'stream14': 0.0, 'stream15': 5.31} 

Decodebin child added: decodebin16 

Decodebin child added: decodebin17 

Decodebin child added: rtph265depay1 

Decodebin child added: rtppcmadepay3 

Decodebin child added: h265parse1 

Decodebin child added: alawdec3 

Decodebin child added: capsfilter13 

In cb_newpad

Decodebin child added: nvv4l2decoder13 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 279 
NvMMLiteBlockCreate : Block : BlockType = 279 
Decodebin child added: decodebin18 

Decodebin child added: rtph264depay12 

Decodebin child added: h264parse12 

Decodebin child added: capsfilter14 

Decodebin child added: nvv4l2decoder14 

In cb_newpad

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame

Decodebin child added: decodebin19 

Decodebin child added: rtph264depay13 

Decodebin child added: h264parse13 

Decodebin child added: capsfilter15 

Decodebin child added: nvv4l2decoder15 

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
Stream format not found, dropping the frame
In cb_newpad

In cb_newpad

Frame Number= 4 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 4 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 1 Number of Objects= 1 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 1 Person_count= 0 Rat_count= 0
Frame Number= 0 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 5 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 5 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 0 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 2 Number of Objects= 1 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 1 Person_count= 0 Rat_count= 0
Frame Number= 1 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 0 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0

**PERF:  {'stream0': 0.0, 'stream1': 0.0, 'stream2': 0.0, 'stream3': 0.0, 'stream4': 0.4, 'stream5': 0.0, 'stream6': 0.4, 'stream7': 0.0, 'stream8': 0.0, 'stream9': 0.0, 'stream10': 1.66, 'stream11': 0.0, 'stream12': 0.0, 'stream13': 0.0, 'stream14': 0.0, 'stream15': 0.4} 

Frame Number= 6 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 0 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0
Frame Number= 6 Number of Objects= 0 Bird_count= 0 Cat_count= 0 Dog_count= 0 Mongoose/Squirrel_count= 0 Person_count= 0 Rat_count= 0

**PERF:  {'stream0': 1.31, 'stream1': 1.6, 'stream2': 1.37, 'stream3': 1.37, 'stream4': 1.6, 'stream5': 1.44, 'stream6': 1.4, 'stream7': 1.37, 'stream8': 1.31, 'stream9': 1.44, 'stream10': 1.6, 'stream11': 1.6, 'stream12': 1.44, 'stream13': 1.44, 'stream14': 1.44, 'stream15': 1.6} 

How can I rectify this issue and run more streams simultaneously with better performance? Online benchmarks of the Jetson AGX Orin show that the device is capable of more than this. Also, I had tested the PeopleNet model with the same setup earlier, and it worked quite well: we were able to run inference on about 30+ streams at a time.

Can you check whether your power mode is MAXN ("nvpmodel -q")? If not, can you try "sudo nvpmodel -m 0"?

Which online benchmarks are you referring to?

Can you share the complete log with your “trtexec” command?

Have you measured the hardware loading by “tegrastats” when running with 13 streams?

Have you seen an "OC ALARM" entry in the log when you got the 'System throttled due to overcurrent' warning? What is the value?

Can you please refer to the following link:
System throttled due to Over-current on Orin NX
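In shell form, the checks suggested above (run on the Jetson host, outside the container) would be roughly:

```shell
# Query the current power mode (MAXN is mode 0 on AGX Orin)
sudo nvpmodel -q

# Switch to MAXN if it is not already active
sudo nvpmodel -m 0

# Optionally lock clocks to maximum to rule out DVFS effects
sudo jetson_clocks

# Watch utilization and power rails while reproducing the multi-stream run
sudo tegrastats
```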

Yes, my power mode is MAXN.

Here’s the complete log with my trtexec command.

trtexec --onnx=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2.onnx --saveEngine=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b010303.engine --minShapes="input":1x3x640x640 --optShapes="input":3x3x640x640 --maxShapes="input":3x3x640x640&
[1] 5745
root@ubuntu:/usr/src/tensorrt/bin# &&&& RUNNING TensorRT.trtexec [TensorRT v8602] # trtexec --onnx=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2.onnx --saveEngine=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b010303.engine --minShapes=input:1x3x640x640 --optShapes=input:3x3x640x640 --maxShapes=input:3x3x640x640
[08/09/2024-10:23:01] [I] === Model Options ===
[08/09/2024-10:23:01] [I] Format: ONNX
[08/09/2024-10:23:01] [I] Model: /opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2.onnx
[08/09/2024-10:23:01] [I] Output:
[08/09/2024-10:23:01] [I] === Build Options ===
[08/09/2024-10:23:01] [I] Max batch: explicit batch
[08/09/2024-10:23:01] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/09/2024-10:23:01] [I] minTiming: 1
[08/09/2024-10:23:01] [I] avgTiming: 8
[08/09/2024-10:23:01] [I] Precision: FP32
[08/09/2024-10:23:01] [I] LayerPrecisions: 
[08/09/2024-10:23:01] [I] Layer Device Types: 
[08/09/2024-10:23:01] [I] Calibration: 
[08/09/2024-10:23:01] [I] Refit: Disabled
[08/09/2024-10:23:01] [I] Version Compatible: Disabled
[08/09/2024-10:23:01] [I] ONNX Native InstanceNorm: Disabled
[08/09/2024-10:23:01] [I] TensorRT runtime: full
[08/09/2024-10:23:01] [I] Lean DLL Path: 
[08/09/2024-10:23:01] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[08/09/2024-10:23:01] [I] Exclude Lean Runtime: Disabled
[08/09/2024-10:23:01] [I] Sparsity: Disabled
[08/09/2024-10:23:01] [I] Safe mode: Disabled
[08/09/2024-10:23:01] [I] Build DLA standalone loadable: Disabled
[08/09/2024-10:23:01] [I] Allow GPU fallback for DLA: Disabled
[08/09/2024-10:23:01] [I] DirectIO mode: Disabled
[08/09/2024-10:23:01] [I] Restricted mode: Disabled
[08/09/2024-10:23:01] [I] Skip inference: Disabled
[08/09/2024-10:23:01] [I] Save engine: /opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b010303.engine
[08/09/2024-10:23:01] [I] Load engine: 
[08/09/2024-10:23:01] [I] Profiling verbosity: 0
[08/09/2024-10:23:01] [I] Tactic sources: Using default tactic sources
[08/09/2024-10:23:01] [I] timingCacheMode: local
[08/09/2024-10:23:01] [I] timingCacheFile: 
[08/09/2024-10:23:01] [I] Heuristic: Disabled
[08/09/2024-10:23:01] [I] Preview Features: Use default preview flags.
[08/09/2024-10:23:01] [I] MaxAuxStreams: -1
[08/09/2024-10:23:01] [I] BuilderOptimizationLevel: -1
[08/09/2024-10:23:01] [I] Input(s)s format: fp32:CHW
[08/09/2024-10:23:01] [I] Output(s)s format: fp32:CHW
[08/09/2024-10:23:01] [I] Input build shape: input=1x3x640x640+3x3x640x640+3x3x640x640
[08/09/2024-10:23:01] [I] Input calibration shapes: model
[08/09/2024-10:23:01] [I] === System Options ===
[08/09/2024-10:23:01] [I] Device: 0
[08/09/2024-10:23:01] [I] DLACore: 
[08/09/2024-10:23:01] [I] Plugins:
[08/09/2024-10:23:01] [I] setPluginsToSerialize:
[08/09/2024-10:23:01] [I] dynamicPlugins:
[08/09/2024-10:23:01] [I] ignoreParsedPluginLibs: 0
[08/09/2024-10:23:01] [I] 
[08/09/2024-10:23:01] [I] === Inference Options ===
[08/09/2024-10:23:01] [I] Batch: Explicit
[08/09/2024-10:23:01] [I] Input inference shape: input=3x3x640x640
[08/09/2024-10:23:01] [I] Iterations: 10
[08/09/2024-10:23:01] [I] Duration: 3s (+ 200ms warm up)
[08/09/2024-10:23:01] [I] Sleep time: 0ms
[08/09/2024-10:23:01] [I] Idle time: 0ms
[08/09/2024-10:23:01] [I] Inference Streams: 1
[08/09/2024-10:23:01] [I] ExposeDMA: Disabled
[08/09/2024-10:23:01] [I] Data transfers: Enabled
[08/09/2024-10:23:01] [I] Spin-wait: Disabled
[08/09/2024-10:23:01] [I] Multithreading: Disabled
[08/09/2024-10:23:01] [I] CUDA Graph: Disabled
[08/09/2024-10:23:01] [I] Separate profiling: Disabled
[08/09/2024-10:23:01] [I] Time Deserialize: Disabled
[08/09/2024-10:23:01] [I] Time Refit: Disabled
[08/09/2024-10:23:01] [I] NVTX verbosity: 0
[08/09/2024-10:23:01] [I] Persistent Cache Ratio: 0
[08/09/2024-10:23:01] [I] Inputs:
[08/09/2024-10:23:01] [I] === Reporting Options ===
[08/09/2024-10:23:01] [I] Verbose: Disabled
[08/09/2024-10:23:01] [I] Averages: 10 inferences
[08/09/2024-10:23:01] [I] Percentiles: 90,95,99
[08/09/2024-10:23:01] [I] Dump refittable layers:Disabled
[08/09/2024-10:23:01] [I] Dump output: Disabled
[08/09/2024-10:23:01] [I] Profile: Disabled
[08/09/2024-10:23:01] [I] Export timing to JSON file: 
[08/09/2024-10:23:01] [I] Export output to JSON file: 
[08/09/2024-10:23:01] [I] Export profile to JSON file: 
[08/09/2024-10:23:01] [I] 
[08/09/2024-10:23:02] [I] === Device Information ===
[08/09/2024-10:23:02] [I] Selected Device: Orin
[08/09/2024-10:23:02] [I] Compute Capability: 8.7
[08/09/2024-10:23:02] [I] SMs: 16
[08/09/2024-10:23:02] [I] Device Global Memory: 62841 MiB
[08/09/2024-10:23:02] [I] Shared Memory per SM: 164 KiB
[08/09/2024-10:23:02] [I] Memory Bus Width: 256 bits (ECC disabled)
[08/09/2024-10:23:02] [I] Application Compute Clock Rate: 1.3 GHz
[08/09/2024-10:23:02] [I] Application Memory Clock Rate: 1.3 GHz
[08/09/2024-10:23:02] [I] 
[08/09/2024-10:23:02] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[08/09/2024-10:23:02] [I] 
[08/09/2024-10:23:02] [I] TensorRT version: 8.6.2
[08/09/2024-10:23:02] [I] Loading standard plugins
[08/09/2024-10:23:02] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 18736 (MiB)
[08/09/2024-10:23:06] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1424, now: CPU 1223, GPU 20196 (MiB)
[08/09/2024-10:23:06] [I] Start parsing network model.
[08/09/2024-10:23:06] [I] [TRT] ----------------------------------------------------------------
[08/09/2024-10:23:06] [I] [TRT] Input filename:   /opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2.onnx
[08/09/2024-10:23:06] [I] [TRT] ONNX IR version:  0.0.8
[08/09/2024-10:23:06] [I] [TRT] Opset version:    16
[08/09/2024-10:23:06] [I] [TRT] Producer name:    pytorch
[08/09/2024-10:23:06] [I] [TRT] Producer version: 2.3.1
[08/09/2024-10:23:06] [I] [TRT] Domain:           
[08/09/2024-10:23:06] [I] [TRT] Model version:    0
[08/09/2024-10:23:06] [I] [TRT] Doc string:       
[08/09/2024-10:23:06] [I] [TRT] ----------------------------------------------------------------
[08/09/2024-10:23:06] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/09/2024-10:23:06] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[08/09/2024-10:23:06] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[08/09/2024-10:23:06] [I] Finished parsing network model. Parse time: 0.323625
[08/09/2024-10:23:06] [W] [TRT] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[08/09/2024-10:23:06] [I] [TRT] Graph optimization time: 0.0629787 seconds.
[08/09/2024-10:23:06] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/09/2024-10:34:51] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[08/09/2024-10:34:51] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[08/09/2024-10:34:52] [I] [TRT] Total Host Persistent Memory: 607600
[08/09/2024-10:34:52] [I] [TRT] Total Device Persistent Memory: 0
[08/09/2024-10:34:52] [I] [TRT] Total Scratch Memory: 806400
[08/09/2024-10:34:52] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 36 MiB, GPU 768 MiB
[08/09/2024-10:34:52] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 333 steps to complete.
[08/09/2024-10:34:52] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 76.788ms to assign 19 blocks to 333 nodes requiring 362500608 bytes.
[08/09/2024-10:34:52] [I] [TRT] Total Activation Memory: 362497536
[08/09/2024-10:34:52] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +22, GPU +512, now: CPU 22, GPU 512 (MiB)
[08/09/2024-10:34:53] [I] Engine built in 711.369 sec.
[08/09/2024-10:34:53] [I] [TRT] Loaded engine size: 263 MiB
[08/09/2024-10:34:53] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +259, now: CPU 0, GPU 259 (MiB)
[08/09/2024-10:34:53] [I] Engine deserialized in 0.154215 sec.
[08/09/2024-10:34:54] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +346, now: CPU 0, GPU 605 (MiB)
[08/09/2024-10:34:54] [I] Setting persistentCacheLimit to 0 bytes.
[08/09/2024-10:34:54] [I] Using random values for input input
[08/09/2024-10:34:54] [I] Input binding for input with dimensions 3x3x640x640 is created.
[08/09/2024-10:34:54] [I] Output binding for boxes with dimensions 3x8400x4 is created.
[08/09/2024-10:34:54] [I] Output binding for scores with dimensions 3x8400x1 is created.
[08/09/2024-10:34:54] [I] Output binding for classes with dimensions 3x8400x1 is created.
[08/09/2024-10:34:54] [I] Starting inference
[08/09/2024-10:34:57] [I] Warmup completed 3 queries over 200 ms
[08/09/2024-10:34:57] [I] Timing trace has 39 queries over 3.2459 s
[08/09/2024-10:34:57] [I] 
[08/09/2024-10:34:57] [I] === Trace details ===
[08/09/2024-10:34:57] [I] Trace averages of 10 runs:
[08/09/2024-10:34:57] [I] Average on 10 runs - GPU latency: 81.1062 ms - Host latency: 81.9306 ms (enqueue 2.32066 ms)
[08/09/2024-10:34:57] [I] Average on 10 runs - GPU latency: 81.0964 ms - Host latency: 81.9153 ms (enqueue 2.20279 ms)
[08/09/2024-10:34:57] [I] Average on 10 runs - GPU latency: 81.1296 ms - Host latency: 81.9545 ms (enqueue 2.30354 ms)
[08/09/2024-10:34:57] [I] 
[08/09/2024-10:34:57] [I] === Performance summary ===
[08/09/2024-10:34:57] [I] Throughput: 12.0152 qps
[08/09/2024-10:34:57] [I] Latency: min = 81.7834 ms, max = 82.0637 ms, mean = 81.9264 ms, median = 81.9178 ms, percentile(90%) = 82.0334 ms, percentile(95%) = 82.045 ms, percentile(99%) = 82.0637 ms
[08/09/2024-10:34:57] [I] Enqueue Time: min = 1.87122 ms, max = 2.79944 ms, mean = 2.2895 ms, median = 2.23511 ms, percentile(90%) = 2.76587 ms, percentile(95%) = 2.79175 ms, percentile(99%) = 2.79944 ms
[08/09/2024-10:34:57] [I] H2D Latency: min = 0.748779 ms, max = 0.87616 ms, mean = 0.785958 ms, median = 0.775757 ms, percentile(90%) = 0.82666 ms, percentile(95%) = 0.831055 ms, percentile(99%) = 0.87616 ms
[08/09/2024-10:34:57] [I] GPU Compute Time: min = 80.9329 ms, max = 81.2552 ms, mean = 81.1044 ms, median = 81.1044 ms, percentile(90%) = 81.2124 ms, percentile(95%) = 81.2463 ms, percentile(99%) = 81.2552 ms
[08/09/2024-10:34:57] [I] D2H Latency: min = 0.0239258 ms, max = 0.0450439 ms, mean = 0.0360483 ms, median = 0.0366211 ms, percentile(90%) = 0.041626 ms, percentile(95%) = 0.0422363 ms, percentile(99%) = 0.0450439 ms
[08/09/2024-10:34:57] [I] Total Host Walltime: 3.2459 s
[08/09/2024-10:34:57] [I] Total GPU Compute Time: 3.16307 s
[08/09/2024-10:34:57] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/09/2024-10:34:57] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8602] # trtexec --onnx=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2.onnx --saveEngine=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b010303.engine --minShapes=input:1x3x640x640 --optShapes=input:3x3x640x640 --maxShapes=input:3x3x640x640
^C
[1]+  Done                    trtexec --onnx=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2.onnx --saveEngine=/opt/nvidia/deepstream/deepstream-7.0/samples/models/smart_warehouse/wfpv2_b010303.engine --minShapes="input":1x3x640x640 --optShapes="input":3x3x640x640 --maxShapes="input":3x3x640x640
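As a sanity check on the summary above, the reported throughput follows directly from the trace: 39 timed queries over 3.2459 s of walltime, each query a batch of 3 frames. A small Python sketch with the numbers taken from the log:

```python
# Reproduce the trtexec performance summary from the trace numbers
queries = 39          # timed inference queries from the log
walltime_s = 3.2459   # "Total Host Walltime" from the log
batch = 3             # explicit batch size of this engine build

qps = queries / walltime_s   # batches per second (log reports 12.0152 qps)
fps = qps * batch            # frames per second across all batched inputs

# At ~36 fps total, roughly one 30-fps stream can be served in real time,
# which is why adding concurrent streams stalls this engine.
streams_30fps = fps / 30.0

print(round(qps, 4), round(fps, 1), round(streams_30fps, 2))
```

This makes the concurrency limit concrete: the ~81 ms batched latency caps the whole pipeline at about 36 frames per second, independent of how many sources are attached.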

Thank you for this insight. I have been checking, and the throttling is caused by a high instantaneous current draw when inference starts. I think it's likely due to the model size: we are using yolov8x (~68M parameters) right now, and we are now training a yolov8m, which has far fewer parameters (~25M). That should most likely improve the overcurrent situation, right?
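To put rough numbers on that, the FP16 weight footprint alone scales with the parameter count. A back-of-the-envelope sketch, using only the approximate counts mentioned above (the exact sizes depend on the layer mix and TensorRT's precision choices):

```python
# Back-of-the-envelope FP16 weight sizes for the two models
# (parameter counts are the approximate figures quoted above)
params = {"yolov8x": 68e6, "yolov8m": 25e6}
bytes_per_param_fp16 = 2

for name, n in params.items():
    mb = n * bytes_per_param_fp16 / 1e6
    print(f"{name}: ~{mb:.0f} MB of FP16 weights")

# yolov8m carries ~2.7x fewer parameters, so weight traffic and compute
# per frame (and hence the instantaneous current spike) should drop noticeably.
```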

The model is likely too heavy for your device if you try to run it with the deepstream-imagedata-multistream sample.

We are not sure whether this is the root cause of the ‘System Throttled Due to Overcurrent’ warning, so more information is needed to identify it.

  1. Did you see an "OC ALARM" log when you got ‘System throttled due to overcurrent’? What was the value?
  2. Locking the clocks of the Jetson device may help. Please apply this script when you run your case.
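On JetPack, clock locking is typically done with the `nvpmodel` and `jetson_clocks` tools. A minimal sketch, guarded so it is a no-op on hosts where the tools are absent (mode 0 / MAXN is an assumption; check `nvpmodel -q` on your unit):

```shell
# Sketch: pin Jetson clocks before starting the pipeline
# (assumes JetPack tools are on PATH; harmless no-op elsewhere)
if command -v nvpmodel >/dev/null 2>&1; then
    sudo nvpmodel -m 0        # select the MAXN power mode
fi
if command -v jetson_clocks >/dev/null 2>&1; then
    sudo jetson_clocks        # lock CPU/GPU/EMC clocks at their maximum
fi
```

`tegrastats` can then be left running in another terminal to watch the power rails and throttling flags while the case reproduces.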

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.