I am using a Jetson Xavier (32 GB) to run inference on a CRNN model. The model can be found here.
I am using FP32 mode for inference. The results are accurate when compared to PyTorch inference, but the time taken to process the same batch varies drastically across runs.
Setup -
The folder demo2 contains 500 images, on which inference is run in batches of 32. crnn.trt is the TensorRT engine file generated from the ONNX model (crnn.onnx) that was exported from the above-mentioned PyTorch repository.
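For completeness, get_engine presumably either deserializes crnn.trt or builds it from crnn.onnx when the engine file does not exist yet. A rough sketch of that FP32 conversion with the TensorRT Python API (assuming TensorRT 7.x as shipped with JetPack and an ONNX export with a fixed batch size; build_engine and the 1 GiB workspace value are only illustrative, not the exact code I used):

# Rough sketch of the ONNX -> TensorRT FP32 conversion (assumptions noted above).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path='ocr_recognition/crnn.onnx',
                 engine_path='ocr_recognition/crnn.trt'):
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30  # 1 GiB of builder scratch space
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        engine = builder.build_cuda_engine(network)  # FP32 is the default precision
        with open(engine_path, 'wb') as f:
            f.write(engine.serialize())
        return engine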
Code -
import sys
from torch.utils.data import DataLoader
import time
# standard inference module shipped with Jetson TensorRT samples
from inference import allocate_buffers, get_engine, do_inference
import pickle
import cv2
import numpy as np
import dataset_trt
from PIL import Image
from dataset_trt import LoadImages, Pad

data = {}

# loads images after resizing, in batches of 32, without shuffling
detectloader = DataLoader(
    LoadImages(transform=Pad(100, 32, 'whole'),
               image_files_dir='ocr_recognition/data/demo2/'),
    batch_size=32, shuffle=False)

engine = get_engine('ocr_recognition/crnn.onnx', 'ocr_recognition/crnn.trt')

# inference is run 3 times over the 500 images (i.e. 3 epochs)
for i in range(3):
    print(i)
    for filename, image in detectloader:
        image = image.numpy()
        with engine.create_execution_context() as context:
            inputs, outputs, bindings, stream = allocate_buffers(engine)
            inputs[0].host = image
            t1 = time.time()
            output = do_inference(context, bindings, inputs, outputs, stream)
            print(time.time() - t1)
Output of the above print statement (time in seconds):
0
0.903141975402832
0.11624932289123535
0.1136171817779541
0.09180784225463867
0.09296345710754395
0.09299278259277344
0.08098387718200684
0.05849766731262207
0.0542445182800293
0.04758453369140625
0.047617435455322266
0.04759478569030762
0.04703259468078613
0.04756903648376465
0.047551631927490234
0.04757833480834961
0.04760384559631348
0.04701590538024902
0.04694771766662598
1
0.047575950622558594
0.04764819145202637
0.04700660705566406
0.047077178955078125
0.04758882522583008
0.04157447814941406
0.041478633880615234
0.04138970375061035
0.04059720039367676
0.04153728485107422
0.04144930839538574
0.04150533676147461
0.041487932205200195
0.04096221923828125
0.04104185104370117
0.041502952575683594
0.040570974349975586
0.041085004806518555
0.04155445098876953
2
0.04109358787536621
0.04118943214416504
0.04118227958679199
0.03726077079772949
0.03625345230102539
0.035823822021484375
0.03543353080749512
0.036386966705322266
0.03538227081298828
0.03543853759765625
0.03527665138244629
0.03669238090515137
0.03589630126953125
0.036310672760009766
0.03545975685119629
0.03621697425842285
0.03635263442993164
0.03638172149658203
0.03630399703979492
As you can see, times in the first epoch are drastically higher, and the same batch takes different times across epochs. Moreover, the timings depend heavily on the run: for the same batch I get different times on different runs, ranging from 150 ms down to 35 ms. Isn't this pretty odd? Am I screwing up somewhere?
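For reference, the numbers above are wall-clock times around do_inference only, but the loop also re-creates the execution context and re-allocates buffers for every batch. A minimal sketch of how I could separate warm-up from steady state, creating the context and buffers once and discarding the first few batches before averaging (allocate_buffers and do_inference are the same helpers imported above; this is only a measurement sketch, not necessarily the fix):

# Sketch only: same loader and engine as above, but the execution context and
# device buffers are created once, and the first few batches are treated as
# warm-up (CUDA init, lazy allocations) and excluded from the statistics.
with engine.create_execution_context() as context:
    inputs, outputs, bindings, stream = allocate_buffers(engine)

    timings = []
    for epoch in range(3):
        for batch_idx, (filename, image) in enumerate(detectloader):
            inputs[0].host = image.numpy()
            t1 = time.time()
            output = do_inference(context, bindings, inputs, outputs, stream)
            dt = time.time() - t1
            if epoch == 0 and batch_idx < 2:
                continue  # discard warm-up batches from the statistics
            timings.append(dt)

print('mean %.4f s  min %.4f s  max %.4f s'
      % (np.mean(timings), np.min(timings), np.max(timings)))

The per-batch work inside the timed region is unchanged; only the context/buffer setup is hoisted out of the loop and aggregate statistics are reported, which makes run-to-run comparisons easier.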