Difference between running the inference with trtexec and tensorrt python API

chakibdace · May 25, 2021, 1:53pm

Hi all,
To reproduce the NVIDIA Jetson AGX Xavier benchmarks of Tiny YOLO V3, i’ve used the sample /usr/src/tensorrt/samples/python/yolov3_onnx but i’ve got perfs between 600 and 700 FPS with this parameters link of my recent post which contains thr details of the implemntation:

Power Mode : MAXN
Input resolution : 416x416
Precision Mode : INT8 (Calibration with 1000 images and IInt8EntropyCalibrator2 interface)
batch = 8
JetPack Version : 4.5.1
TensorRT version : 7.1.3

I’ve also run this command to maximize the perfs

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Then i tried GitHub - NVIDIA-AI-IOT/jetson_benchmarks: Jetson Benchmark to reproduce the FPS of Tiny YOLO V3 and that worked properly (i got the 1000 FPS), but i noticed that this repos use the shell trtexec and not the python API tensorrt that i’am used to use by (import tensorrt as trt)

My question is what’s the difference between this two implementations ( trtexec VS tensorrt python API) when we generate/build the TensorRT engine and when we run the inference ?

Thanks

AastaLLL · May 26, 2021, 3:44am

Hi,

Based on the data below, please set workspace to 2048Mib and inference it with batchsize=16:

github.com

NVIDIA-AI-IOT/jetson_benchmarks/blob/master/benchmark_csv/xavier-benchmarks.csv

ModelName,FrameWork,Devices,BatchSizeGPU,BatchSizeDLA,WS_GPU,WS_DLA,input,output,URL
inception_v4,caffe,3,4,1,2048,2048,NA,prob,https://www.dropbox.com/s/b7masj8xdoycv2w/inception_v4.prototxt
vgg19_N2,caffe,1,4,0,2048,0,NA,prob,https://www.dropbox.com/s/t4qq079g5q4jibx/vgg19_N2.prototxt
super_resolution_bsd500,onnx,1,4,0,2048,2048,NA,NA,https://www.dropbox.com/s/hdhxndo23cm9i5y/super_resolution_bsd500.zip
unet-segmentation,tensorrt,1,2,0,2048,None,"input_1,1,512,512",conv2d_19/Sigmoid,https://www.dropbox.com/s/85lttamnbjeig0e/unet-segmentation.uff
pose_estimation,caffe,1,4,0,2048,None,NA,Mconv7_stage2_L2,https://www.dropbox.com/s/hwa5i14v67u57ij/pose_estimation.prototxt
yolov3-tiny-416,onnx,1,16,0,2048,2048,NA,NA,https://www.dropbox.com/s/ck9e40b57rd5o14/yolov3-tiny-416.zip
ResNet50_224x224,caffe,3,16,4,2048,2048,NA,prob,https://www.dropbox.com/s/9ohk387v0ki56wx/ResNet50_224x224.prototxt
ssd-mobilenet-v1,onnx,3,16,2,2048,2048,NA,NA,https://www.dropbox.com/s/gx5zayt76vszhpo/ssd-mobilenet-v1.zip

We are modifying onnx_to_tensorrt.py for YOLOv3 Tiny as an example.
Will share with you later.

Thanks

AastaLLL · May 27, 2021, 6:07am

Hi,

We reuse the calibration cache from here and the model used in jetson-benchmark.
Be able to get 1088 fps on INT8+Batchsize16+Workspace2048.

Please check following for the instructions:

$ wget https://www.dropbox.com/s/ck9e40b57rd5o14/yolov3-tiny-416.zip
$ unzip yolov3-tiny-416.zip
$ wget https://raw.githubusercontent.com/jkjung-avt/tensorrt_demos/master/yolo/calib_cache/calib_yolov3-tiny-int8-416.bin

YOLOv3_Tiny_benchmark.patch (8.7 KB)

$ /usr/src/tensorrt/bin/trtexec --onnx=yolov3-tiny-416-bs16.onnx --best --workspace=2048 --saveEngine=yolov3-tiny-416-bs16.trt --calib=calib_yolov3-tiny-int8-416.bin
$ git apply YOLOv3_Tiny_benchmark.patch
$ python3 onnx_to_tensorrt.py

$ python3 onnx_to_tensorrt.py
Reading engine from file yolov3-tiny-416-bs16.trt
Running inference on image dog.jpg…
FPS: 1088.407443846851
[[125.69800719 217.86413197 254.43353728 296.92521182]
[475.38425396 79.1671842 194.08492385 86.92553251]] [0.80192175 0.70910078] [16 2]
…

Thanks.

chakibdace · May 27, 2021, 12:43pm

Hi @AastaLLL,

Thank you for your help, i understand now what i was doing wrong !

Topic		Replies	Views
Low FPS on Jetson Nano using TensorRT Jetson Nano tensorrt , tensorflow	7	1189	August 27, 2020
TensorRT Inconsistent Inference Performance with Python and Trtexec TensorRT tensorrt , cuda , jetson-inference , python , cudnn	0	288	April 2, 2024
Inference is so slow with torch1.6 Jetson Xavier NX nvbugs , pytorch	12	3528	October 23, 2020
Huge speed difference between engines built from scratch and engines built from onnx Jetson AGX Xavier tensorrt , nvbugs	11	846	August 3, 2021
Convert YOLOv7 QAT model to TensorRT engine failure Jetson AGX Xavier yolo	9	1027	June 21, 2023
Python sample yolov3 app on tensorrt Jetson Xavier NX tensorrt , yolo , python	9	1685	October 18, 2021
How to calculate TOPS (INT8) or TFLOPS (FP16) of each layer of a CNN using TensorRT Jetson AGX Xavier tensorrt	7	10711	September 12, 2021
About trtexec Jetson Nano tensorrt	2	3355	October 15, 2021
TensorRT Batching Speed scales poorly TensorRT tensorrt , cuda	6	1691	September 30, 2021
The same model produces different results in TensorRT8 and TensorRT7 Jetson Xavier NX tensorrt	6	407	November 22, 2022

Difference between running the inference with trtexec and tensorrt python API

My question is what’s the difference between this two implementations ( trtexec VS tensorrt python API) when we generate/build the TensorRT engine and when we run the inference ?

Related topics