TrafficCamNet is very laggy in deepstream-nvdsanalytics-test / Jetson Nano

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson Nano
• DeepStream Version 6.0
• JetPack Version (valid for Jetson only) 4.6
• TensorRT Version 8.0.1
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs) questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello, I am studying the DeepStream SDK on a Jetson Nano.
Let me explain my environment.

I used the deepstream-nvdsanalytics-test sample provided in DeepStream 6.0.
I downloaded TrafficCamNet and converted it to an engine file in FP16 mode using tao-converter, with the command below:
(./tao-converter -k tlt_encode -d 3,544,960 /opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt -t fp16 -c /opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/trafficcamnet_int8.txt -b 4 -m 8)

I also modified config_nvdsanalytics.txt and nvdsanalytics_pgie_config.txt.
They are shown below.
################################################################################
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

# The values in the config file are overridden by values set through GObject
# properties.

[property]
enable=1
#Width height used for configuration to which below configs are configured
config-width=1920
config-height=1080
#osd-mode 0: Dont display any lines, rois and text
#         1: Display only lines, rois and static text i.e. labels
#         2: Display all info from 1 plus information about counts
osd-mode=2
#Set OSD font size that has to be displayed
display-font-size=12

# Per stream configuration

[roi-filtering-stream-0]
#enable or disable following feature
enable=0
#ROI to filter select objects, and remove from meta data
roi-RF=295;643;579;634;642;913;56;828
#remove objects in the ROI
inverse-roi=0
class-id=-1

# Per stream configuration

[roi-filtering-stream-2]
#enable or disable following feature
enable=0
#ROI to filter select objects, and remove from meta data
roi-RF=295;643;579;634;642;913;56;828
#remove objects in the ROI
inverse-roi=1
class-id=0

[overcrowding-stream-1]
enable=0
roi-OC=295;643;579;634;642;913;56;828
#no of objects that will trigger OC
object-threshold=3
class-id=-1

[line-crossing-stream-0]
enable=1
#Label;direction;lc
#line-crossing-Entry=1072;911;1143;1058;944;1020;1297;1020;
#line-crossing-Exit=789;672;1084;900;851;773;1203;732
line-crossing-Exit=870;462;1048;929;637;577;1172;504
class-id=0
#extended when 0 - only counts crossing on the configured Line
#              1 - assumes extended Line crossing counts all the crossing
extended=0
#LC modes supported:
#loose : counts all crossing without strong adherence to direction
#balanced: Strict direction adherence expected compared to mode=loose
#strict : Strict direction adherence expected compared to mode=balanced
mode=loose

[direction-detection-stream-0]
enable=0
#Label;direction;
direction-South=284;840;360;662;
direction-North=1106;622;1312;701;
class-id=0

################################################################################
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
# (MIT-style license header, identical to the one in config_nvdsanalytics.txt above)
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8)
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Optional properties for detectors:
#   enable-dbscan(Default=false), interval(Primary mode only, Default=0)
#   custom-lib-path
#   parse-bbox-func-name
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
#tlt-encoded-model=/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt
labelfile-path=labels_trafficnet.txt
int8-calib-file=/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/trafficcamnet_int8.txt
model-engine-file=/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine
input-dims=3;544;960;0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0

## 0=FP32, 1=INT8, 2=FP16 mode

network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
cluster-mode=2

[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=1

I also modified a probe in the source so that it prints the FPS, roughly as sketched below.
(The symptom was the same before I modified the probe.)
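For reference, this is the kind of FPS-counting pad probe I mean (a minimal sketch only; the function and variable names are illustrative, not the exact code from my app):

#include <gst/gst.h>

/* Count buffers passing the pad and print the measured FPS about once a second. */
static GstPadProbeReturn
fps_probe_cb (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  static guint frame_count = 0;
  static gint64 last_ts = 0;
  gint64 now = g_get_monotonic_time ();  /* monotonic clock, microseconds */

  frame_count++;
  if (last_ts == 0)
    last_ts = now;
  if (now - last_ts >= G_USEC_PER_SEC) {
    g_print ("FPS: %.2f\n",
             frame_count * (gdouble) G_USEC_PER_SEC / (gdouble) (now - last_ts));
    frame_count = 0;
    last_ts = now;
  }
  return GST_PAD_PROBE_OK;
}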
When I run
./deepstream-nvdsanalytics-test file:///opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_720p.h264, the terminal output shows about 16 FPS.
(I saw in a document provided by NVIDIA that TrafficCamNet on the Nano runs at 17 FPS. Is this difference normal?)
However, the video shown on the monitor looks as if it were running at 1 FPS.
It is the same even if I change the input to something else (e.g. mkv or rtsp...).
How can I watch this video smoothly?

Here is a video I recorded that shows the symptom described above.

Can you run "trtexec" with the TrafficCamNet model to check the performance on the Nano?

nvidia@tegra-ubuntu:/usr/src/tensorrt/bin$ ./trtexec --loadEngine==/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # ./trtexec --loadEngine==/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine --fp16
[01/30/2023-15:13:17] [I] === Model Options ===
[01/30/2023-15:13:17] [I] Format: *
[01/30/2023-15:13:17] [I] Model:
[01/30/2023-15:13:17] [I] Output:
[01/30/2023-15:13:17] [I] === Build Options ===
[01/30/2023-15:13:17] [I] Max batch: 1
[01/30/2023-15:13:17] [I] Workspace: 16 MiB
[01/30/2023-15:13:17] [I] minTiming: 1
[01/30/2023-15:13:17] [I] avgTiming: 8
[01/30/2023-15:13:17] [I] Precision: FP32+FP16
[01/30/2023-15:13:17] [I] Calibration:
[01/30/2023-15:13:17] [I] Refit: Disabled
[01/30/2023-15:13:17] [I] Sparsity: Disabled
[01/30/2023-15:13:17] [I] Safe mode: Disabled
[01/30/2023-15:13:17] [I] Restricted mode: Disabled
[01/30/2023-15:13:17] [I] Save engine:
[01/30/2023-15:13:17] [I] Load engine: =/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine
[01/30/2023-15:13:17] [I] NVTX verbosity: 0
[01/30/2023-15:13:17] [I] Tactic sources: Using default tactic sources
[01/30/2023-15:13:17] [I] timingCacheMode: local
[01/30/2023-15:13:17] [I] timingCacheFile:
[01/30/2023-15:13:17] [I] Input(s)s format: fp32:CHW
[01/30/2023-15:13:17] [I] Output(s)s format: fp32:CHW
[01/30/2023-15:13:17] [I] Input build shapes: model
[01/30/2023-15:13:17] [I] Input calibration shapes: model
[01/30/2023-15:13:17] [I] === System Options ===
[01/30/2023-15:13:17] [I] Device: 0
[01/30/2023-15:13:17] [I] DLACore:
[01/30/2023-15:13:17] [I] Plugins:
[01/30/2023-15:13:17] [I] === Inference Options ===
[01/30/2023-15:13:17] [I] Batch: 1
[01/30/2023-15:13:17] [I] Input inference shapes: model
[01/30/2023-15:13:17] [I] Iterations: 10
[01/30/2023-15:13:17] [I] Duration: 3s (+ 200ms warm up)
[01/30/2023-15:13:17] [I] Sleep time: 0ms
[01/30/2023-15:13:17] [I] Streams: 1
[01/30/2023-15:13:17] [I] ExposeDMA: Disabled
[01/30/2023-15:13:17] [I] Data transfers: Enabled
[01/30/2023-15:13:17] [I] Spin-wait: Disabled
[01/30/2023-15:13:17] [I] Multithreading: Disabled
[01/30/2023-15:13:17] [I] CUDA Graph: Disabled
[01/30/2023-15:13:17] [I] Separate profiling: Disabled
[01/30/2023-15:13:17] [I] Time Deserialize: Disabled
[01/30/2023-15:13:17] [I] Time Refit: Disabled
[01/30/2023-15:13:17] [I] Skip inference: Disabled
[01/30/2023-15:13:17] [I] Inputs:
[01/30/2023-15:13:17] [I] === Reporting Options ===
[01/30/2023-15:13:17] [I] Verbose: Disabled
[01/30/2023-15:13:17] [I] Averages: 10 inferences
[01/30/2023-15:13:17] [I] Percentile: 99
[01/30/2023-15:13:17] [I] Dump refittable layers:Disabled
[01/30/2023-15:13:17] [I] Dump output: Disabled
[01/30/2023-15:13:17] [I] Profile: Disabled
[01/30/2023-15:13:17] [I] Export timing to JSON file:
[01/30/2023-15:13:17] [I] Export output to JSON file:
[01/30/2023-15:13:17] [I] Export profile to JSON file:
[01/30/2023-15:13:17] [I]
[01/30/2023-15:13:17] [I] === Device Information ===
[01/30/2023-15:13:17] [I] Selected Device: NVIDIA Tegra X1
[01/30/2023-15:13:17] [I] Compute Capability: 5.3
[01/30/2023-15:13:17] [I] SMs: 1
[01/30/2023-15:13:17] [I] Compute Clock Rate: 0.9216 GHz
[01/30/2023-15:13:17] [I] Device Global Memory: 3964 MiB
[01/30/2023-15:13:17] [I] Shared Memory per SM: 64 KiB
[01/30/2023-15:13:17] [I] Memory Bus Width: 64 bits (ECC disabled)
[01/30/2023-15:13:17] [I] Memory Clock Rate: 0.01275 GHz
[01/30/2023-15:13:17] [I]
[01/30/2023-15:13:17] [I] TensorRT version: 8001
[01/30/2023-15:13:17] [E] Error opening engine file: =/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine
[01/30/2023-15:13:17] [E] Engine creation failed
[01/30/2023-15:13:17] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8001] # ./trtexec --loadEngine==/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine --fp16

I think I don't know how to use trtexec correctly.
Or was there a problem in the process of converting the .etlt file into an .engine file? (I described the conversion command above.)

The command could be trtexec --maxBatch=1 --loadEngine=/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine

nvidia@tegra-ubuntu:/usr/src/tensorrt/bin$ ./trtexec --maxBatch=1 --loadEngine=/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # ./trtexec --maxBatch=1 --loadEngine=/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine
[01/30/2023-22:01:04] [I] === Model Options ===
[01/30/2023-22:01:04] [I] Format: *
[01/30/2023-22:01:04] [I] Model:
[01/30/2023-22:01:04] [I] Output:
[01/30/2023-22:01:04] [I] === Build Options ===
[01/30/2023-22:01:04] [I] Max batch: 1
[01/30/2023-22:01:04] [I] Workspace: 16 MiB
[01/30/2023-22:01:04] [I] minTiming: 1
[01/30/2023-22:01:04] [I] avgTiming: 8
[01/30/2023-22:01:04] [I] Precision: FP32
[01/30/2023-22:01:04] [I] Calibration:
[01/30/2023-22:01:04] [I] Refit: Disabled
[01/30/2023-22:01:04] [I] Sparsity: Disabled
[01/30/2023-22:01:04] [I] Safe mode: Disabled
[01/30/2023-22:01:04] [I] Restricted mode: Disabled
[01/30/2023-22:01:04] [I] Save engine:
[01/30/2023-22:01:04] [I] Load engine: /opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine
[01/30/2023-22:01:04] [I] NVTX verbosity: 0
[01/30/2023-22:01:04] [I] Tactic sources: Using default tactic sources
[01/30/2023-22:01:04] [I] timingCacheMode: local
[01/30/2023-22:01:04] [I] timingCacheFile:
[01/30/2023-22:01:04] [I] Input(s)s format: fp32:CHW
[01/30/2023-22:01:04] [I] Output(s)s format: fp32:CHW
[01/30/2023-22:01:04] [I] Input build shapes: model
[01/30/2023-22:01:04] [I] Input calibration shapes: model
[01/30/2023-22:01:04] [I] === System Options ===
[01/30/2023-22:01:04] [I] Device: 0
[01/30/2023-22:01:04] [I] DLACore:
[01/30/2023-22:01:04] [I] Plugins:
[01/30/2023-22:01:04] [I] === Inference Options ===
[01/30/2023-22:01:04] [I] Batch: 1
[01/30/2023-22:01:04] [I] Input inference shapes: model
[01/30/2023-22:01:04] [I] Iterations: 10
[01/30/2023-22:01:04] [I] Duration: 3s (+ 200ms warm up)
[01/30/2023-22:01:04] [I] Sleep time: 0ms
[01/30/2023-22:01:04] [I] Streams: 1
[01/30/2023-22:01:04] [I] ExposeDMA: Disabled
[01/30/2023-22:01:04] [I] Data transfers: Enabled
[01/30/2023-22:01:04] [I] Spin-wait: Disabled
[01/30/2023-22:01:04] [I] Multithreading: Disabled
[01/30/2023-22:01:04] [I] CUDA Graph: Disabled
[01/30/2023-22:01:04] [I] Separate profiling: Disabled
[01/30/2023-22:01:04] [I] Time Deserialize: Disabled
[01/30/2023-22:01:04] [I] Time Refit: Disabled
[01/30/2023-22:01:04] [I] Skip inference: Disabled
[01/30/2023-22:01:04] [I] Inputs:
[01/30/2023-22:01:04] [I] === Reporting Options ===
[01/30/2023-22:01:04] [I] Verbose: Disabled
[01/30/2023-22:01:04] [I] Averages: 10 inferences
[01/30/2023-22:01:04] [I] Percentile: 99
[01/30/2023-22:01:04] [I] Dump refittable layers:Disabled
[01/30/2023-22:01:04] [I] Dump output: Disabled
[01/30/2023-22:01:04] [I] Profile: Disabled
[01/30/2023-22:01:04] [I] Export timing to JSON file:
[01/30/2023-22:01:04] [I] Export output to JSON file:
[01/30/2023-22:01:04] [I] Export profile to JSON file:
[01/30/2023-22:01:04] [I]
[01/30/2023-22:01:04] [I] === Device Information ===
[01/30/2023-22:01:04] [I] Selected Device: NVIDIA Tegra X1
[01/30/2023-22:01:04] [I] Compute Capability: 5.3
[01/30/2023-22:01:04] [I] SMs: 1
[01/30/2023-22:01:04] [I] Compute Clock Rate: 0.9216 GHz
[01/30/2023-22:01:04] [I] Device Global Memory: 3964 MiB
[01/30/2023-22:01:04] [I] Shared Memory per SM: 64 KiB
[01/30/2023-22:01:04] [I] Memory Bus Width: 64 bits (ECC disabled)
[01/30/2023-22:01:04] [I] Memory Clock Rate: 0.01275 GHz
[01/30/2023-22:01:04] [I]
[01/30/2023-22:01:04] [I] TensorRT version: 8001
[01/30/2023-22:01:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 229, GPU 3359 (MiB)
[01/30/2023-22:01:06] [I] [TRT] Loaded engine size: 8 MB
[01/30/2023-22:01:06] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 229 MiB, GPU 3359 MiB
[01/30/2023-22:01:06] [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[01/30/2023-22:01:07] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +152, now: CPU 387, GPU 3518 (MiB)
[01/30/2023-22:01:09] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +241, GPU +243, now: CPU 628, GPU 3761 (MiB)
[01/30/2023-22:01:09] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 628, GPU 3761 (MiB)
[01/30/2023-22:01:09] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 628 MiB, GPU 3761 MiB
[01/30/2023-22:01:09] [I] Engine loaded in 4.28236 sec.
[01/30/2023-22:01:09] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 620 MiB, GPU 3753 MiB
[01/30/2023-22:01:09] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +1, now: CPU 620, GPU 3754 (MiB)
[01/30/2023-22:01:09] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 620, GPU 3754 (MiB)
[01/30/2023-22:01:09] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 620 MiB, GPU 3753 MiB
[01/30/2023-22:01:09] [I] Created input binding for input_1 with dimensions 3x544x960
[01/30/2023-22:01:09] [I] Created output binding for output_bbox/BiasAdd with dimensions 16x34x60
[01/30/2023-22:01:09] [I] Created output binding for output_cov/Sigmoid with dimensions 4x34x60
[01/30/2023-22:01:09] [I] Starting inference
[01/30/2023-22:01:12] [I] Warmup completed 2 queries over 200 ms
[01/30/2023-22:01:12] [I] Timing trace has 59 queries over 3.10934 s
[01/30/2023-22:01:12] [I]
[01/30/2023-22:01:12] [I] === Trace details ===
[01/30/2023-22:01:12] [I] Trace averages of 10 runs:
[01/30/2023-22:01:12] [I] Average on 10 runs - GPU latency: 52.3593 ms - Host latency: 53.0201 ms (end to end 53.0421 ms, enqueue 1.17602 ms)
[01/30/2023-22:01:12] [I] Average on 10 runs - GPU latency: 51.9704 ms - Host latency: 52.6315 ms (end to end 52.6562 ms, enqueue 1.29115 ms)
[01/30/2023-22:01:12] [I] Average on 10 runs - GPU latency: 51.8144 ms - Host latency: 52.4726 ms (end to end 52.493 ms, enqueue 1.16016 ms)
[01/30/2023-22:01:12] [I] Average on 10 runs - GPU latency: 51.9756 ms - Host latency: 52.6548 ms (end to end 52.6738 ms, enqueue 1.99845 ms)
[01/30/2023-22:01:12] [I] Average on 10 runs - GPU latency: 52.0052 ms - Host latency: 52.6693 ms (end to end 52.6911 ms, enqueue 1.31099 ms)
[01/30/2023-22:01:12] [I]
[01/30/2023-22:01:12] [I] === Performance summary ===
[01/30/2023-22:01:12] [I] Throughput: 18.9751 qps
[01/30/2023-22:01:12] [I] Latency: min = 52.1882 ms, max = 55.5599 ms, mean = 52.6779 ms, median = 52.4993 ms, percentile(99%) = 55.5599 ms
[01/30/2023-22:01:12] [I] End-to-End Host Latency: min = 52.2009 ms, max = 55.5867 ms, mean = 52.7 ms, median = 52.5259 ms, percentile(99%) = 55.5867 ms
[01/30/2023-22:01:12] [I] Enqueue Time: min = 0.983154 ms, max = 4.7627 ms, mean = 1.37751 ms, median = 1.08105 ms, percentile(99%) = 4.7627 ms
[01/30/2023-22:01:12] [I] H2D Latency: min = 0.622925 ms, max = 0.809204 ms, mean = 0.641511 ms, median = 0.63623 ms, percentile(99%) = 0.809204 ms
[01/30/2023-22:01:12] [I] GPU Compute Time: min = 51.5288 ms, max = 54.8996 ms, mean = 52.0139 ms, median = 51.8355 ms, percentile(99%) = 54.8996 ms
[01/30/2023-22:01:12] [I] D2H Latency: min = 0.020752 ms, max = 0.0268555 ms, mean = 0.022463 ms, median = 0.0224609 ms, percentile(99%) = 0.0268555 ms
[01/30/2023-22:01:12] [I] Total Host Walltime: 3.10934 s
[01/30/2023-22:01:12] [I] Total GPU Compute Time: 3.06882 s
[01/30/2023-22:01:12] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/30/2023-22:01:12] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # ./trtexec --maxBatch=1 --loadEngine=/opt/nvidia/deepstream/deepstream-6.0/samples/models/tao_pretrained_models/trafficcamnet/resnet18_trafficcamnet_pruned.etlt_b1_gpu0_fp16.engine
[01/30/2023-22:01:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 620, GPU 3779 (MiB)
nvidia@tegra-ubuntu:/usr/src/tensorrt/bin$

The problem was that there were two "="s after --loadEngine. Here are the performance test results.

Please set the "sync" property of the nveglglessink to FALSE in the deepstream-nvdsanalytics-test source code.
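For example, right after the sink element is created in the sample source (a sketch; the variable name follows the sample app and may differ in your version):

/* Render frames as soon as they are ready instead of
 * synchronizing rendering to the buffer timestamps. */
sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");
g_object_set (G_OBJECT (sink), "sync", FALSE, NULL);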

I can try it tomorrow or Friday because I am on a business trip.
Is there a document that explains how the sync property of nveglglessink affects playback?
I wonder why you think the "sync" property should be FALSE.

Please refer to Troubleshooting — DeepStream 6.1.1 Release documentation

Thank you for your reply.
The output on the monitor has indeed become smoother. However, it now takes about 2 seconds to process each second of the RTSP stream arriving at 30 fps.

(It appears to be processing every frame sequentially. As a result, the video looks like slow motion, roughly 0.5x speed, which is consistent with processing a 30 fps stream at 17 fps: 17/30 ≈ 0.57.)

I would like to discard some of the received frames and process only the most recently received frame.
(So that one second of video looks like one second...)

Are there any properties related to this?
Or should I open a new topic?

Have you done everything mentioned in Troubleshooting — DeepStream 6.2 Release documentation (nvidia.com)?

The actual performance is 17 FPS; dropping frames will not improve the performance or the visual smoothness. It is a hardware limitation.

Thank you for your kind answer.
No, I haven't tried everything yet.

I already know that 17 FPS is the hardware performance limit.

To be exact:
I have put a camera on the road, and we plan to measure the speed of passing vehicles.

If a vehicle appears to be passing slowly, I cannot calculate its exact speed.

Suppose I receive frame A1.
It takes about 58 ms (17 fps) to process A1.
At least one or two more frames arrive (30 fps RTSP) during those 58 ms (call them A2 and A3).

I want to discard the frames received during those 58 ms (while A1 is being processed).

And when the processing of A1 is finished, I want to process only the most recent frame. (For example, if A4 has not been received yet, process A3...)

There is no such function in DeepStream. A possible way is to use the videorate (gstreamer.freedesktop.org) element to adapt the video rate to what you want.
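As an illustration only (not code from the sample app; the 15 fps cap is an arbitrary example value), videorate could be created and configured like this before linking it into the pipeline ahead of the sink:

/* Create a videorate element that only drops frames (never duplicates them)
 * and caps the output at 15 fps. */
GstElement *rate = gst_element_factory_make ("videorate", "rate");
g_object_set (G_OBJECT (rate), "drop-only", TRUE, "max-rate", 15, NULL);

With drop-only=TRUE, videorate discards late frames instead of duplicating them, which is close to the "process only the most recent frame" behavior described above.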

Thank you very much.
All my questions have been answered.
Have a nice day!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.