Hello,
I have a performance problem on my Jetson Xavier: the inference time of my YOLOv3 model fluctuates enormously. All times mentioned below are for INT8 precision.
The best inference time I get is around 33 ms per image, but it fluctuates a lot. Most of the time, inference in the GStreamer pipeline using DeepStream takes 150 ms per image. Then the inference time drops to 33 ms for about 2 seconds before going back up to 150 ms. I suspected that the C++ program I use to build my pipeline was not getting all the resources it needs, but as far as I know, it gets everything it needs. I am using Qt Creator (Qt version 4.8.7).
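For context on these two figures: the pipeline further down requests framerate=30/1 from the camera, so a new frame arrives roughly every 1000 / 30 ≈ 33.3 ms. The small sketch below just does that arithmetic; the function names are hypothetical, and it assumes nvinfer handles one frame at a time (batch-size=1 and interval=0, as in the config). Under that assumption, 33 ms per frame barely keeps pace with the camera, while 150 ms per frame caps the stage at about 6.7 frames per second, so the remaining frames would have to wait or be dropped somewhere upstream.

```cpp
// Hypothetical helpers: frame-budget arithmetic for a live source.
// Assumes the inference stage processes one frame at a time, so the
// per-frame latency bounds the throughput it can sustain.

// Time between two consecutive frames of a source running at `fps`.
double frame_interval_ms(double fps) { return 1000.0 / fps; }

// Highest frame rate a stage can sustain if each frame occupies it for `latency_ms`.
double max_fps_for_latency(double latency_ms) { return 1000.0 / latency_ms; }

// True if the stage finishes each frame before the next one arrives.
bool keeps_up(double latency_ms, double source_fps) {
    return latency_ms <= frame_interval_ms(source_fps);
}
```

With the numbers from this post, frame_interval_ms(30) is about 33.3 ms and max_fps_for_latency(150) is about 6.7 fps.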
When I started working with the Jetson, I saw this behavior the other way around: the inference time was low most of the time and only went up for a few seconds. Now the high inference time has become the norm. I even flashed the latest JetPack release onto the Jetson to make sure I had not changed any setting I was not aware of.
For the time measurement, I record the interval from the moment the nvinfer plugin receives a buffer to the moment the buffer leaves the element. The timestamps are attached to the buffer as metadata.
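The measurement described above could be structured roughly as follows. This is a minimal sketch under stated assumptions, not the exact code of the program: it assumes one callback fires when a buffer reaches nvinfer's sink pad and another when it leaves the src pad (for example from buffer pad probes installed with gst_pad_add_probe and GST_PAD_PROBE_TYPE_BUFFER), and that each frame can be identified by a key such as its PTS. The class name StageLatencyTracker is hypothetical, and the GStreamer wiring is left out so the bookkeeping compiles on its own.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <unordered_map>

// Hypothetical helper that pairs "buffer entered nvinfer" with "buffer left nvinfer".
// Frames are matched by a caller-supplied key (assumed here to be the buffer PTS).
// Pad probes run on streaming threads, so the map is guarded by a mutex.
class StageLatencyTracker {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    // Call when a buffer arrives on the element's sink pad.
    void on_enter(std::uint64_t frame_key, Clock::time_point t = Clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);
        entered_[frame_key] = t;
    }

    // Call when the same buffer leaves on the src pad. Returns the time spent
    // inside the element in milliseconds, or std::nullopt if the key is unknown.
    std::optional<double> on_exit(std::uint64_t frame_key, Clock::time_point t = Clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = entered_.find(frame_key);
        if (it == entered_.end()) return std::nullopt;
        const std::chrono::duration<double, std::milli> elapsed = t - it->second;
        entered_.erase(it);
        return elapsed.count();
    }

    // Number of frames that entered but have not left yet.
    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entered_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::uint64_t, Clock::time_point> entered_;
};
```

In a real program, on_enter would be called from the sink-pad probe and on_exit from the src-pad probe, with the returned value logged or attached to the frame's metadata. Using steady_clock avoids jumps if the system time is adjusted while the pipeline runs.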
To make sure my C++ program is not at fault, I also launched the pipeline from the console, and it shows the same behavior. Of course, I have no time measurement there, but the high latency and the image artifacts are clearly visible whenever the inference time is high.
I have seen similar behavior when running DeepStream in a Docker container on my desktop PC, but there it happens much less often: most of the time the inference time is about 20 ms, then it rises to a much higher value for a few seconds and immediately drops back to 20 ms.
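To put numbers on these fast and slow phases (for example, whether the roughly 2-second fast windows recur at a regular interval), the per-frame latencies can be split into consecutive runs above and below a cut-off. The sketch below assumes the latencies are logged together with a timestamp; the types, the function name and the threshold value are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Sample {
    double t_ms;        // time at which the frame left nvinfer
    double latency_ms;  // time the frame spent inside nvinfer
};

struct Phase {
    bool slow;           // true if every frame in the phase exceeded the threshold
    double start_ms;     // timestamp of the first frame in the phase
    double end_ms;       // timestamp of the last frame in the phase
    std::size_t frames;  // number of frames in the phase
};

// Splits a latency trace into consecutive runs of "fast" and "slow" frames.
// `threshold_ms` is a hypothetical cut-off, e.g. somewhere between 33 ms and 150 ms.
std::vector<Phase> split_phases(const std::vector<Sample>& samples, double threshold_ms) {
    std::vector<Phase> phases;
    for (const Sample& s : samples) {
        const bool slow = s.latency_ms > threshold_ms;
        if (phases.empty() || phases.back().slow != slow) {
            phases.push_back(Phase{slow, s.t_ms, s.t_ms, 1});  // start a new run
        } else {
            phases.back().end_ms = s.t_ms;  // extend the current run
            ++phases.back().frames;
        }
    }
    return phases;
}
```

The duration of each phase is then end_ms - start_ms, which can be compared across runs on the Jetson and on the desktop PC.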
Jetson specifications:
Jetson Xavier Developer Kit
JetPack 4.4.1 (Linux 4.9.140-tegra)
DeepStream 5.0
GStreamer version 1.14.5
CUDA version 10.2
Power mode MAXN
Maximum clock frequencies (sudo jetson_clocks)
The pipeline I use:
gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-h264, width=1280, height=720, framerate=30/1 ! h264parse ! avdec_h264 ! nvvideoconvert ! 'video/x-raw(memory:NVMM), width=1280, height=720, format=NV12, interlace-mode=progressive' ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 live-source=1 nvbuf-memory-type=0 ! nvinfer config-file-path=config_infer_primary_yoloV3.txt unique-id=1 ! nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink sync=0
I am using the avdec_h264 decoder because it is faster than the nvvideo4linux2 decoder (nvv4l2decoder).
config_infer_primary_yoloV3.txt:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
custom-network-config=yolov3.cfg
model-file=yolov3.weights
model-engine-file=model_b1_gpu0_int8.engine
labelfile-path=labels.txt
int8-calib-file=yolov3-calibration.table.trt7.0
##0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
interval=0
##0=Group Rectangles, 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV3
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.3
threshold=0.7
Tegrastats output when running the pipeline:
RAM 4982/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [11%@2265,9%@2265,6%@2265,9%@2265,10%@2265,12%@2265,21%@2265,31%@2265] EMC_FREQ 0% GR3D_FREQ 9% AO@36.5C GPU@40.5C Tdiode@39.75C PMIC@100C AUX@36C CPU@38C thermal@38C Tboard@36C GPU 10141/9971 CPU 1997/1949 SOC 3687/3688 CV 0/0 VDDRQ 1229/1108 SYS5V 2817/2815
RAM 4982/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [10%@2265,4%@2265,8%@2265,3%@2265,6%@2265,6%@2265,29%@2265,21%@2265] EMC_FREQ 0% GR3D_FREQ 86% AO@36.5C GPU@40.5C Tdiode@39.5C PMIC@100C AUX@36.5C CPU@38C thermal@38.15C Tboard@37C GPU 10146/9973 CPU 1843/1948 SOC 3689/3688 CV 0/0 VDDRQ 1076/1107 SYS5V 2817/2815
RAM 4982/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [7%@2265,8%@2265,7%@2265,5%@2265,8%@2265,5%@2265,22%@2265,30%@2265] EMC_FREQ 0% GR3D_FREQ 91% AO@36.5C GPU@40.5C Tdiode@39.5C PMIC@100C AUX@36.5C CPU@38C thermal@38.15C Tboard@37C GPU 9838/9971 CPU 1844/1946 SOC 3689/3688 CV 0/0 VDDRQ 1076/1107 SYS5V 2817/2815
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [8%@2265,9%@2265,5%@2265,5%@2265,6%@2265,5%@2265,22%@2265,25%@2265] EMC_FREQ 0% GR3D_FREQ 5% AO@36.5C GPU@40.5C Tdiode@39.5C PMIC@100C AUX@36.5C CPU@38C thermal@38.15C Tboard@37C GPU 9838/9969 CPU 1844/1945 SOC 3689/3688 CV 0/0 VDDRQ 1076/1107 SYS5V 2817/2815
RAM 4982/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [6%@2265,8%@2265,4%@2265,5%@2265,11%@2265,4%@2265,15%@2265,27%@2265] EMC_FREQ 0% GR3D_FREQ 61% AO@36.5C GPU@40.5C Tdiode@39.5C PMIC@100C AUX@36.5C CPU@38C thermal@38.15C Tboard@37C GPU 10146/9972 CPU 1843/1944 SOC 3689/3688 CV 0/0 VDDRQ 1076/1106 SYS5V 2817/2815
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [13%@2265,8%@2265,2%@2265,5%@2265,6%@2265,5%@2265,13%@2265,26%@2265] EMC_FREQ 0% GR3D_FREQ 29% AO@36.5C GPU@41C Tdiode@39.5C PMIC@100C AUX@36.5C CPU@38C thermal@38.3C Tboard@37C GPU 10141/9974 CPU 1843/1942 SOC 3689/3688 CV 0/0 VDDRQ 1076/1106 SYS5V 2817/2815
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [18%@2265,9%@2265,8%@2265,4%@2265,8%@2265,7%@2265,22%@2265,20%@2265] EMC_FREQ 0% GR3D_FREQ 72% AO@37C GPU@41.5C Tdiode@39.75C PMIC@100C AUX@36.5C CPU@38.5C thermal@38.3C Tboard@37C GPU 11668/9996 CPU 2303/1947 SOC 3838/3690 CV 0/0 VDDRQ 1228/1107 SYS5V 2898/2816
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [22%@2265,6%@2265,6%@2265,11%@2265,15%@2265,11%@2265,12%@2265,34%@2265] EMC_FREQ 0% GR3D_FREQ 70% AO@37C GPU@41C Tdiode@40C PMIC@100C AUX@36.5C CPU@38.5C thermal@38.45C Tboard@37C GPU 11668/10017 CPU 2303/1952 SOC 3838/3692 CV 0/0 VDDRQ 1382/1111 SYS5V 2898/2817
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [20%@2265,9%@2265,7%@2265,11%@2265,9%@2265,5%@2265,17%@2265,26%@2265] EMC_FREQ 0% GR3D_FREQ 31% AO@37C GPU@41.5C Tdiode@40C PMIC@100C AUX@36.5C CPU@38C thermal@38.6C Tboard@37C GPU 11822/10040 CPU 2149/1954 SOC 3838/3694 CV 0/0 VDDRQ 1382/1114 SYS5V 2898/2818
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [14%@2265,6%@2265,6%@2265,9%@2265,8%@2265,6%@2265,25%@2265,20%@2265] EMC_FREQ 0% GR3D_FREQ 35% AO@37C GPU@41.5C Tdiode@40C PMIC@100C AUX@36.5C CPU@38.5C thermal@38.6C Tboard@37C GPU 11822/10063 CPU 1995/1955 SOC 3838/3695 CV 0/0 VDDRQ 1382/1118 SYS5V 2898/2819
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [17%@2265,10%@2265,6%@2265,6%@2265,7%@2265,6%@2265,25%@2265,23%@2265] EMC_FREQ 0% GR3D_FREQ 66% AO@37C GPU@41.5C Tdiode@40C PMIC@100C AUX@36.5C CPU@38.5C thermal@38.6C Tboard@37C GPU 11668/10082 CPU 1995/1955 SOC 3838/3697 CV 0/0 VDDRQ 1382/1121 SYS5V 2898/2820
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [16%@2265,8%@2265,5%@2265,4%@2265,7%@2265,5%@2265,36%@2265,16%@2265] EMC_FREQ 0% GR3D_FREQ 42% AO@37C GPU@41.5C Tdiode@40C PMIC@100C AUX@37C CPU@38.5C thermal@38.75C Tboard@37C GPU 11668/10102 CPU 1995/1956 SOC 3838/3699 CV 0/0 VDDRQ 1382/1124 SYS5V 2898/2821
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [21%@2265,7%@2265,5%@2265,10%@2265,12%@2265,5%@2265,32%@2265,11%@2265] EMC_FREQ 0% GR3D_FREQ 44% AO@37C GPU@42C Tdiode@40C PMIC@100C AUX@37C CPU@38.5C thermal@38.8C Tboard@37C GPU 11975/10124 CPU 1995/1956 SOC 3838/3701 CV 0/0 VDDRQ 1382/1127 SYS5V 2898/2822
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [15%@2265,7%@2265,4%@2265,5%@2265,5%@2265,5%@2265,36%@2265,14%@2265] EMC_FREQ 0% GR3D_FREQ 60% AO@37C GPU@41.5C Tdiode@40.25C PMIC@100C AUX@37C CPU@39C thermal@38.8C Tboard@37C GPU 11668/10143 CPU 2149/1958 SOC 3838/3702 CV 0/0 VDDRQ 1228/1128 SYS5V 2898/2823
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [20%@2265,7%@2265,10%@2265,6%@2265,4%@2265,2%@2265,34%@2265,12%@2265] EMC_FREQ 0% GR3D_FREQ 53% AO@37C GPU@41.5C Tdiode@40.25C PMIC@100C AUX@37C CPU@38.5C thermal@38.95C Tboard@37C GPU 11668/10161 CPU 1995/1959 SOC 3838/3704 CV 0/0 VDDRQ 1382/1131 SYS5V 2898/2824
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [18%@2265,12%@2265,5%@2265,7%@2265,6%@2265,7%@2265,23%@2265,23%@2265] EMC_FREQ 0% GR3D_FREQ 12% AO@37C GPU@41.5C Tdiode@40.25C PMIC@100C AUX@37C CPU@38.5C thermal@39.1C Tboard@37C GPU 11361/10175 CPU 1996/1959 SOC 3838/3705 CV 0/0 VDDRQ 1228/1132 SYS5V 2898/2825
RAM 4982/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [15%@2265,6%@2265,8%@2265,5%@2265,9%@2265,4%@2265,32%@2265,16%@2265] EMC_FREQ 0% GR3D_FREQ 66% AO@37.5C GPU@41.5C Tdiode@40C PMIC@100C AUX@37C CPU@38.5C thermal@39.1C Tboard@37C GPU 10295/10176 CPU 1843/1958 SOC 3687/3705 CV 0/0 VDDRQ 1229/1134 SYS5V 2817/2824
RAM 4981/15823MB (lfb 2049x4MB) SWAP 0/7911MB (cached 0MB) CPU [7%@2265,5%@2265,2%@2265,6%@2265,9%@2265,9%@2265,30%@2265,18%@2265] EMC_FREQ 0% GR3D_FREQ 81% AO@37C GPU@41C Tdiode@40.25C PMIC@100C AUX@37C CPU@38.5C thermal@38.65C Tboard@37C GPU 9992/10174 CPU 1998/1958 SOC 3689/3705 CV 0/0 VDDRQ 1076/1133 SYS5V 2817/2824
RAM 4988/15823MB (lfb 2048x4MB) SWAP 0/7911MB (cached 0MB) CPU [6%@2265,6%@2265,4%@2265,7%@2265,9%@2265,2%@2265,29%@2265,20%@2265] EMC_FREQ 0% GR3D_FREQ 13% AO@37C GPU@41C Tdiode@40.25C PMIC@100C AUX@37C CPU@38.5C thermal@38.65C Tboard@37C GPU 9838/10170 CPU 1844/1957 SOC 3689/3705 CV 0/0 VDDRQ 1076/1132 SYS5V 2777/2824
I find it interesting that the GPU utilization (GR3D_FREQ) jumps so much: it goes from 9% to 91% and then immediately back down to 5%.
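Since the GR3D_FREQ readings jump around so much, it can help to summarize them over a longer capture instead of reading individual lines. The sketch below pulls the percentage out of each tegrastats line with std::regex and reports the minimum, maximum and mean; it assumes the "GR3D_FREQ <n>%" format seen in the output above, and the function and struct names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Extracts the GPU load percentage from one tegrastats line, e.g. "... GR3D_FREQ 86% ...".
std::optional<int> gr3d_percent(const std::string& line) {
    static const std::regex re(R"(GR3D_FREQ (\d+)%)");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return std::nullopt;
    return std::stoi(m[1].str());
}

struct LoadSummary {
    int min_pct;
    int max_pct;
    double mean_pct;
    std::size_t samples;  // number of lines that contained a GR3D_FREQ field
};

// Summarizes GR3D_FREQ over many tegrastats lines; lines without the field are skipped.
std::optional<LoadSummary> summarize(const std::vector<std::string>& lines) {
    std::vector<int> values;
    for (const std::string& line : lines) {
        if (std::optional<int> p = gr3d_percent(line)) values.push_back(*p);
    }
    if (values.empty()) return std::nullopt;
    const auto [min_it, max_it] = std::minmax_element(values.begin(), values.end());
    double sum = 0.0;
    for (int v : values) sum += v;
    return LoadSummary{*min_it, *max_it, sum / static_cast<double>(values.size()), values.size()};
}
```

Keep in mind that tegrastats prints one snapshot per sampling interval (adjustable with --interval, in milliseconds), so a single reading of 5% or 91% says little on its own; a summary over many lines, or a shorter interval, gives a clearer picture.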
Thanks in advance!