TensorRT + YOLOv3 performance issue

sungwonida · May 31, 2019, 1:48am

Hi, I’m working on object detection models, currently YOLOv3 in particular, and I’d like to get a reasonably fast object detection system running on embedded platforms such as the TX2 or Xavier.

To that end, I benchmarked both a TensorFlow version and a TensorRT version of YOLOv3.

The TensorFlow model (.pb) runs under tensorflow==1.13.1 (NVIDIA's official build) on JetPack 4.2.
The TensorRT engine was generated via 'Darknet checkpoint -> ONNX model -> TensorRT engine' and runs under tensorrt==5.0.6.3 on JetPack 4.2.

Timing covers only the network forward pass of each model.
Everything else, such as input preprocessing and the postprocessing that extracts the bounding boxes, is excluded from the profiling.
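A minimal sketch of the "forward pass only" measurement described above, in Python. The callables `preprocess`, `forward` and `postprocess` are hypothetical placeholders, not part of any TensorRT or TensorFlow API; in a real setup `forward` would wrap the engine or session call. One assumption worth stating: if `forward` only launches GPU work asynchronously, it must block until that work finishes (for example by synchronizing the CUDA stream) before the timer stops, otherwise the measurement reflects launch overhead rather than inference time.

```python
import time
from typing import Any, Callable, List, Tuple


def profile_forward_only(
    preprocess: Callable[[Any], Any],
    forward: Callable[[Any], Any],
    postprocess: Callable[[Any], Any],
    inputs: List[Any],
) -> Tuple[List[Any], float]:
    """Run the whole pipeline, but time only the forward pass.

    Returns the postprocessed result for every input and the mean
    forward-pass latency in milliseconds.
    """
    if not inputs:
        raise ValueError("inputs must not be empty")
    forward_ms: List[float] = []
    results: List[Any] = []
    for raw in inputs:
        x = preprocess(raw)  # not timed: e.g. resizing and normalization
        start = time.perf_counter()
        y = forward(x)  # timed: network forwarding only
        forward_ms.append((time.perf_counter() - start) * 1000.0)
        results.append(postprocess(y))  # not timed: e.g. box decoding and NMS
    return results, sum(forward_ms) / len(forward_ms)


if __name__ == "__main__":
    # Toy stand-ins for the real stages, only to show the call pattern.
    dets, mean_ms = profile_forward_only(
        lambda r: r * 2, lambda x: x + 1, lambda y: y * 10, [1, 2, 3]
    )
    print(dets, f"{mean_ms:.4f} ms")
```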

I tested both setups on the TX2 and the Xavier and got the results in the table below.
https://docs.google.com/spreadsheets/d/1IcSnF9a3SdczWmvvHNcuPGXMDanu8q7axFJSiuAQFLo/edit#gid=0&range=B2:E6
The numbers in the table are in milliseconds; each one is the average of two test runs.

So, I have two questions.
First, what might explain the counter-intuitive results for TX2 (MAXP_CORE_ALL) + TensorRT and Xavier (MAXN) + TensorRT, highlighted in red?
Second, how could I get further performance improvements on TX2 (MAXN) + TensorRT?

Any comments would be appreciated.

sungwonida · June 3, 2019, 1:40am

There was an option I had missed: jetson_clocks.
Running it brought every result into the expected, intuitive range. Please refer to the sheet below.
https://docs.google.com/spreadsheets/d/1IcSnF9a3SdczWmvvHNcuPGXMDanu8q7axFJSiuAQFLo/edit?pli=1#gid=1026068223&range=B2

Still, I wonder why setting the power mode (nvpmodel) to MAXN without running jetson_clocks produces the odd results shown in the first sheet of the link above.
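A plausible explanation, consistent with the observations in this thread but not verified here: nvpmodel selects the power budget, the online CPU cores and the maximum allowed clock frequencies, while the frequency governors may still run the CPU, GPU and memory clocks below that maximum depending on load; jetson_clocks pins them at the maximum for the current mode. The sketch below only prints (or, when run with `--apply`, executes) the corresponding commands so that both configurations can be reproduced consistently before each benchmark. The helper names are hypothetical, and the mode ID is an assumption to confirm on the board with `nvpmodel -q`.

```python
import subprocess
import sys
from typing import List

# Assumption: nvpmodel mode 0 is MAXN on Jetson TX2 and AGX Xavier.
# Verify on your board with `sudo nvpmodel -q` before relying on it.
MAXN_MODE_ID = 0


def clock_setup_commands(mode_id: int = MAXN_MODE_ID,
                         pin_clocks: bool = True) -> List[List[str]]:
    """Build the shell commands for one benchmark configuration."""
    cmds = [
        ["sudo", "nvpmodel", "-m", str(mode_id)],  # select the power mode
        ["sudo", "nvpmodel", "-q"],                # print the active mode
    ]
    if pin_clocks:
        # Pin CPU/GPU/EMC clocks at the maximum allowed by the current mode.
        cmds.append(["sudo", "jetson_clocks"])
    return cmds


def apply(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Pass "--apply" to actually run the commands (requires a Jetson and sudo).
    apply(clock_setup_commands(), dry_run="--apply" not in sys.argv)
```

Benchmarking once with `pin_clocks=False` and once with `pin_clocks=True` in the same power mode would separate the effect of jetson_clocks from that of the power mode itself.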

r7vme · June 13, 2019, 8:30pm

Hi @sungwonida, I’m using TensorRT-Yolov3 from lewes6369 (Caffe-based) and was able to get 21 ms on Xavier in MAXN mode (even without jetson_clocks) with FP16 precision at 416 px resolution.

However, I also noticed that with jetson_clocks the first image runs faster, but when I run inference continuously in the same session I get 21 ms even without jetson_clocks.
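The warm-up effect described above can be separated from steady-state latency by timing the first call on its own, discarding a few warm-up iterations, and then reporting statistics over many runs rather than an average of two. A minimal sketch, assuming `forward` is a hypothetical blocking callable that runs one inference; the default iteration counts are arbitrary choices, not recommendations from this thread.

```python
import statistics
import time
from typing import Any, Callable, Dict


def benchmark(forward: Callable[[Any], Any], x: Any,
              warmup: int = 10, iters: int = 100) -> Dict[str, float]:
    """Time one cold call, then steady-state calls after a warm-up phase."""
    # The very first call often includes one-time costs (lazy initialization,
    # cold caches, clocks still ramping up), so it is reported separately.
    t0 = time.perf_counter()
    forward(x)
    first_ms = (time.perf_counter() - t0) * 1000.0

    # Further warm-up calls are executed but not recorded.
    for _ in range(warmup):
        forward(x)

    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        forward(x)
        samples.append((time.perf_counter() - t0) * 1000.0)

    median_ms = statistics.median(samples)
    return {
        "first_ms": first_ms,
        "mean_ms": statistics.mean(samples),
        "median_ms": median_ms,
        # Throughput implied by the median latency (one image per call).
        "fps": 1000.0 / median_ms if median_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload: sleeps about 5 ms per "inference".
    print(benchmark(lambda _: time.sleep(0.005), None, warmup=3, iters=20))
```

For reference, the 21 ms reported above corresponds to 1000 / 21 ≈ 47.6 frames per second.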
