TensorRT Inference in Real Time

I converted a CNN Keras model (.h5) to TensorFlow (.pb) and then to ONNX (.onnx).
After that, on my Jetson Nano running JetPack 4.6, I ran the following command:

$ /usr/src/tensorrt/bin/trtexec --onnx= --saveEngine=createdEngine.engine

I also have the following code in a Python script:

import cv2
from cvzone.HandTrackingModule import HandDetector
from cvzone.ClassificationModule import Classifier
import numpy as np
import math

final_output = ""
letters = []
count_frames = 20

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
classifier = Classifier("fall_keras_model.h5", "fall_labels.txt")

offset = 50
imgSize = 300
counter = 0

labels = ["A", "B", "back", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
          "N", "O", "P", "Q", "R", "S", "space", "T", "U", "V", "W", "X", "Y", "Z"]  # back, space, j, z

while True:
    success, img = cap.read()
    hands = detector.findHands(img, draw=False)

    # Binarize the frame before cropping the hand region
    filtered = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    filtered = cv2.GaussianBlur(filtered, (5, 5), 2)
    filtered = cv2.adaptiveThreshold(filtered, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
    ret, filtered = cv2.threshold(filtered, 170, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    cv2.imshow("Original", img)

    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']

        imgWhite = np.ones((imgSize, imgSize), np.uint8) * 255
        imgCrop = filtered[y-offset : y+h+offset, x-offset : x+w+offset]
        imgCropShape = imgCrop.shape

        aspectRatio = h / w

        if aspectRatio > 1:
            # Tall hand: fit the height, centre horizontally
            k = imgSize / h
            wCal = math.ceil(k * w)
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            imgResizeShape = imgResize.shape

            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
            gray2rgb = cv2.cvtColor(imgWhite, cv2.COLOR_GRAY2RGB)
        else:
            # Wide hand: fit the width, centre vertically
            k = imgSize / w
            hCal = math.ceil(k * h)
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            imgResizeShape = imgResize.shape

            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize
            gray2rgb = cv2.cvtColor(imgWhite, cv2.COLOR_GRAY2RGB)

        prediction, index = classifier.getPrediction(gray2rgb)
        # Assumed: the post as shown never fills `letters`, so record the predicted label here
        letters.append(labels[index])
        count_frames -= 1
        lett = ""  # reset the voted letter on every frame

        if count_frames == 0:
            # Every 20 frames, take the most frequent prediction
            count_frames = 20
            lett = max(letters, key=letters.count)
            if lett == "space":
                final_output += " "
            elif lett == "back":
                final_output = final_output[:-1]
            else:
                final_output += lett

        if (x-offset > 0 and x+offset < img.shape[1] and y-offset > 0 and y+offset < img.shape[0]):
            #cv2.imshow("Filtered", filtered)
            #cv2.imshow("Cropped", imgCrop)
            imgWhite = cv2.putText(imgWhite, final_output, (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 255), 2, cv2.LINE_AA)
            cv2.imshow("Final", imgWhite)
        else:
            print("ERROR: Hand out of frame")

    key = cv2.waitKey(1)
    if key == ord('q'):
        break

Issue: how do I use the TensorRT engine I just built and saved in the code above, and display the output?

@WayneWWW @AastaLLL @DaneLLL @dusty_nv
Please help.


Some examples can be found in the wiki below:
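
Separately from the wiki examples, here is a rough sketch of how an engine could slot into the webcam loop from the first post. The loop only needs an object with a `getPrediction(img)` method that returns the scores and the winning index, so a small adapter around any inference function may be enough. The names `EngineClassifier`, `preprocess`, and `infer_fn`, as well as the 224x224 channels-last input and the [-1, 1] scaling, are assumptions for illustration and not part of cvzone or TensorRT. The scaling in particular has to match whatever the Keras model was trained with.

```python
import numpy as np


def preprocess(img, size=224):
    """Resize an HxWx3 uint8 image and scale it to [-1, 1] (assumed training setup)."""
    h, w = img.shape[:2]
    # Nearest-neighbour resize in plain NumPy; cv2.resize(img, (size, size)) would also work.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[rows][:, cols]
    scaled = resized.astype(np.float32) / 127.5 - 1.0
    # Add the batch dimension: (1, size, size, 3), channels last (assumed model layout).
    return np.expand_dims(scaled, axis=0)


class EngineClassifier:
    """Hypothetical drop-in for the classifier object used in the loop."""

    def __init__(self, infer_fn, labels):
        self.infer_fn = infer_fn  # callable: float32 batch -> array of class scores
        self.labels = labels

    def getPrediction(self, img):
        scores = np.asarray(self.infer_fn(preprocess(img))).ravel()
        index = int(np.argmax(scores))
        return scores.tolist(), index
```

In the loop, `classifier = EngineClassifier(run_engine, labels)` would then replace the Keras `Classifier`, where `run_engine` is a hypothetical function that copies the batch to the GPU, executes the engine, and returns the output buffer.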


Hi @AastaLLL,
Thanks for the reply and the examples. Sadly, I couldn't check whether they work because I cannot build PyCUDA on the Jetson Nano with JetPack 4.6.
I have started a new topic on the TensorRT page.

Thank you,
Dhairya Sachdeva

Okay, I built PyCUDA 2022.1 for JetPack 4.6 on the Jetson Nano. Then I tried running the following script with the .engine file I had previously created from the .onnx model via trtexec.

I ran the script with some changes:

import cv2
import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

batch = 1
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []

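def Inference(engine):
    image = cv2.imread("/home/isl/a.jpg")
    # Resize while the array is still HxWxC, and keep the result in `image`
    image = cv2.resize(image, (224, 224))
    image = (2.0 / 255.0) * image.transpose((2, 0, 1)) - 1.0

    np.copyto(host_inputs[0], image.ravel())
    stream = cuda.Stream()
    context = engine.create_execution_context()

    start_time = time.time()
    cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
    # Assumed: the execute and synchronize calls are missing from the post as shown
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
    stream.synchronize()
    print("execute times " + str(time.time() - start_time))

    output = host_outputs[0]
    print(np.argmax(output))  # assumed: report the predicted class index

def PrepareEngine():
    with open('sample.engine', 'rb') as f:
        serialized_engine = f.read()

    runtime = trt.Runtime(TRT_LOGGER)
    engine = runtime.deserialize_cuda_engine(serialized_engine)

    # create buffer
    for binding in engine:
        size = trt.volume(engine.get_tensor_shape(binding)) * batch
        host_mem = cuda.pagelocked_empty(shape=[size], dtype=np.float32)
        cuda_mem = cuda.mem_alloc(host_mem.nbytes)

        # Assumed: the bookkeeping below is missing from the post as shown
        bindings.append(int(cuda_mem))
        if engine.get_tensor_mode(binding) == trt.TensorIOMode.INPUT:
            host_inputs.append(host_mem)
            cuda_inputs.append(cuda_mem)
        else:
            host_outputs.append(host_mem)
            cuda_outputs.append(cuda_mem)

    return engine

if __name__ == "__main__":
    engine = PrepareEngine()
    Inference(engine)  # assumed: the call is missing from the post as shown

    engine = []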

  1. First of all, the expected output was "0", but the output I got was "6" or "5".
  2. It took a long time to load the engine after I ran the command "python3 test.py" (test.py is the script given above).
  3. I tried various inputs, but the outputs always varied and were never close to the expected output.

Context: the model is an image-based sign recognition model, and I'm giving it an input image (.jpg) of size 300,300,3.

Thank you,
Dhairya Sachdeva


1. Please check if any change is required for the preprocessing.

image = (2.0 / 255.0) * image.transpose((2, 0, 1)) - 1.0

2. Have you maximized the device performance?

sudo nvpmodel -m 0
sudo jetson_clocks

3. Could you check whether you get the expected output with ONNX Runtime?
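
For point 3, one way to make the check concrete is to run the same preprocessed image through Keras, ONNX Runtime, and TensorRT, and then compare the raw score vectors rather than only the final label. The helper below is a minimal sketch; the function name `compare_outputs` and the tolerance are assumptions.

```python
import numpy as np


def compare_outputs(reference, candidate, atol=1e-3):
    """Summarise how far `candidate` scores drift from `reference` scores."""
    ref = np.asarray(reference, dtype=np.float32).ravel()
    cand = np.asarray(candidate, dtype=np.float32).ravel()
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return {
        "same_class": int(np.argmax(ref)) == int(np.argmax(cand)),
        "max_abs_diff": float(np.max(np.abs(ref - cand))),
        "close": bool(np.allclose(ref, cand, atol=atol)),
    }
```

If the Keras and ONNX Runtime scores already disagree, the difference comes from the conversion or the preprocessing rather than from TensorRT.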


Hi @AastaLLL,

  1. I removed this line of code, since the image I'm running through the TRT engine is already preprocessed, and the model was trained on part of this same data.

  2. The device performance has been maximised, but the code I shared still takes around 8-11 seconds just to load the engine. It then takes a further 8-9 seconds to get the result for a single image.

  3. Yes, I tried ONNX Runtime as well, and the outputs were not very good there either. I'm guessing it has something to do with the Python APIs tf2onnx and keras2onnx. It would help a lot if you could suggest a few changes to the conversion process.

  4. Please suggest any changes needed to convert to TRT using the trtexec built into the Jetson Nano.

  5. Also, the script gives a runtime error: invalid argument passed runtime. It still outputs something, and the error is reported after it.
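
To see where the seconds in point 2 go, it can help to time engine loading, the first inference, and later inferences separately, since one-off setup cost can land on the first call. A minimal sketch, where `load_fn` and `infer_fn` are hypothetical stand-ins for the deserialization and execution steps of the script:

```python
import time


def time_stages(load_fn, infer_fn, runs=10):
    """Return seconds spent loading, on the first inference, and on average afterwards."""
    t0 = time.perf_counter()
    engine = load_fn()
    load_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    infer_fn(engine)  # first call: may include one-off setup cost
    first_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(runs):
        infer_fn(engine)
    steady_s = (time.perf_counter() - t0) / runs

    return {"load_s": load_s, "first_s": first_s, "steady_s": steady_s}
```

For GPU work launched asynchronously, `infer_fn` needs to synchronize the stream before returning; otherwise the numbers only reflect the time to queue the work.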


1. Please check Q3.

2. Have you tried to infer the model on a dGPU?
Could you share the inference time?

3. TensorRT is expected to produce the same result as ONNX Runtime.
If you don't get the correct results there either, it indicates a problem in converting the model to ONNX.
In that case, please check with the tf2onnx team directly.

4. Usually, the TensorRT engine can be generated with trtexec.

$ /usr/src/tensorrt/bin/trtexec --onnx=[file]

5. The script shared above is for TensorRT 8.4.
Please check the change below to make it work with TensorRT 8.2:
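
The linked change itself is not reproduced here. As an assumption based on the TensorRT Python API, the name-based calls `get_tensor_shape` and `get_tensor_mode` used in the script exist only in newer releases, while older releases expose `get_binding_shape` and `binding_is_input` instead. A hedged sketch of helpers that work with either (the helper names are hypothetical):

```python
def binding_shape(engine, name):
    """Shape of a binding, using whichever engine API is available."""
    if hasattr(engine, "get_tensor_shape"):
        return tuple(engine.get_tensor_shape(name))
    return tuple(engine.get_binding_shape(name))


def binding_is_input(engine, name):
    """True if the named binding is an input, using whichever API is available."""
    if hasattr(engine, "get_tensor_mode"):
        import tensorrt as trt  # imported lazily; only needed for the newer API
        return engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT
    return engine.binding_is_input(name)
```

In the buffer loop, `trt.volume(binding_shape(engine, binding))` and `binding_is_input(engine, binding)` would then stand in for the direct engine calls.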

