Hi, I’m working on object detection models, especially YOLOv3 at the moment, and I’d like to build a reasonably well-performing object detection system on embedded platforms like the TX2 or Xavier.
To that end, I benchmarked a TensorFlow version of YOLOv3 and a TensorRT version of YOLOv3.
- The TensorFlow model (.pb) runs under tensorflow==1.13.1 (the official NVIDIA build) with JetPack 4.2
- The TensorRT engine was generated through the pipeline 'Darknet checkpoint → ONNX model → TensorRT engine' and runs under the TensorRT version bundled with JetPack 4.2
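For reference, the ONNX → TensorRT step of the pipeline above can be sketched with the TensorRT Python API roughly as follows. This is a minimal sketch under assumptions: the file names `yolov3.onnx` and `yolov3.trt` are placeholders, the workspace size is an arbitrary choice to tune for the TX2/Xavier, and the exact API surface depends on the TensorRT version your JetPack ships.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, max_batch_size=1):
    """Parse an ONNX model and build a TensorRT engine (sketch only).

    `onnx_path` is assumed to point at the ONNX model exported from the
    Darknet checkpoint in the previous pipeline step.
    """
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            raise RuntimeError('failed to parse ONNX model')

    builder.max_batch_size = max_batch_size
    builder.max_workspace_size = 1 << 28  # 256 MiB; placeholder, tune per board
    return builder.build_cuda_engine(network)

# Serialize the engine so it can be deserialized at inference time.
engine = build_engine('yolov3.onnx')
with open('yolov3.trt', 'wb') as f:
    f.write(engine.serialize())
```

The serialized engine is device- and version-specific, so it has to be rebuilt separately on the TX2 and the Xavier rather than copied between them.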
Time profiling covers only the network forward pass of each model; everything else, such as preprocessing the input and the postprocessing that extracts the bounding boxes, is excluded from the measurement.
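The forward-pass-only measurement described above can be isolated roughly like this. A minimal sketch: `run_inference` is a hypothetical placeholder for the framework-specific forward call (e.g. a TensorFlow `sess.run` or a TensorRT execution-context call), and on a GPU it must block until the result is ready or the timing is meaningless.

```python
import time

def profile_forward(run_inference, n_runs=2, warmup=1):
    """Average wall-clock time of the forward pass only, in milliseconds.

    Pre- and postprocessing happen outside the timed region, matching the
    measurement protocol described above. `run_inference` is a placeholder
    for the actual (blocking) forward call.
    """
    for _ in range(warmup):  # discard one-time initialization overhead
        run_inference()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        run_inference()
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    return elapsed_ms / n_runs

# Example with a dummy forward call that takes ~10 ms:
avg_ms = profile_forward(lambda: time.sleep(0.01), n_runs=2)
```

Averaging over only two runs (as in the table) keeps warm-up effects in play, so more repetitions would give steadier numbers.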
The settings above were tested on both the TX2 and the Xavier, producing the table below.
The numbers in the table are in milliseconds, obtained by running each test twice and averaging.
So, my questions are twofold.
The first is how to interpret the counter-intuitive results for TX2 (MAXP_CORE_ALL) + TensorRT and Xavier (MAXN) + TensorRT, the ones colored red.
The second is how to get further performance improvements on TX2 (MAXN) + TensorRT.
Any comments would be appreciated.