TensorRT Inference in Real Time

dhairyasachdeva11 · March 11, 2023, 4:14pm

Hi,
I converted a Keras CNN model (.h5) to TensorFlow (.pb) and then to ONNX (.onnx).
After this, on my Jetson Nano running JetPack 4.6, I ran the following command:

$ /usr/src/tensorrt/bin/trtexec --onnx= --saveEngine=createdEngine.engine

I also have the following code in a Python script:

import cv2
from cvzone.HandTrackingModule import HandDetector
from cvzone.ClassificationModule import Classifier
import numpy as np
import math

final_output = ""
letters = []
count_frames = 20

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
classifier = Classifier("fall_keras_model.h5", "fall_labels.txt")

offset = 50
imgSize = 300
counter = 0

labels = ["A", "B", "back", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
          "N", "O", "P", "Q", "R", "S", "space", "T", "U", "V", "W", "X", "Y", "Z"]  # back, space, j, z

while True:
    success, img = cap.read()
    if not success:
        # No frame available from the camera
        break
    hands = detector.findHands(img, draw=False)

    # Binarize the frame: grayscale -> blur -> adaptive threshold -> Otsu threshold
    filtered = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    filtered = cv2.GaussianBlur(filtered, (5, 5), 2)
    filtered = cv2.adaptiveThreshold(filtered, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
    ret, filtered = cv2.threshold(filtered, 170, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    cv2.imshow("Original", img)

    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']

        imgWhite = np.ones((imgSize, imgSize), np.uint8) * 255
        imgCrop = filtered[y-offset : y+h+offset, x-offset : x+w+offset]
        imgCropShape = imgCrop.shape

        aspectRatio = h / w

        try:
            # Fit the crop into a white imgSize x imgSize canvas, keeping the aspect ratio
            if aspectRatio > 1:
                k = imgSize / h
                wCal = math.ceil(k * w)
                imgResize = cv2.resize(imgCrop, (wCal, imgSize))
                imgResizeShape = imgResize.shape

                wGap = math.ceil((imgSize - wCal) / 2)
                imgWhite[:, wGap:wCal + wGap] = imgResize
                gray2rgb = cv2.cvtColor(imgWhite, cv2.COLOR_GRAY2RGB)
            else:
                k = imgSize / w
                hCal = math.ceil(k * h)
                imgResize = cv2.resize(imgCrop, (imgSize, hCal))
                imgResizeShape = imgResize.shape

                hGap = math.ceil((imgSize - hCal) / 2)
                imgWhite[hGap:hCal + hGap, :] = imgResize
                gray2rgb = cv2.cvtColor(imgWhite, cv2.COLOR_GRAY2RGB)

            prediction, index = classifier.getPrediction(gray2rgb)
            # print(labels[index])

            count_frames -= 1
            lett = ""

            # Every 20 frames, commit the most frequent prediction to the output text
            if count_frames == 0:
                count_frames = 20
                lett = max(letters, key=letters.count)
                letters.clear()
                if lett == "space":
                    final_output += " "
                elif lett == "back":
                    final_output = final_output[0:len(final_output) - 1]
                else:
                    final_output += lett
            else:
                letters.append(labels[index])
            # print(prediction)

            if (x-offset > 0 and x+offset < img.shape[1] and y-offset > 0 and y+offset < img.shape[0]):
                # cv2.imshow("Filtered", filtered)
                # cv2.imshow("Cropped", imgCrop)
                imgWhite = cv2.putText(imgWhite, final_output, (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 255), 2, cv2.LINE_AA)
                cv2.imshow("Final", imgWhite)

        except Exception:
            print("ERROR: Hand out of frame")

    key = cv2.waitKey(1)
    if key == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Issue: How should I load the TensorRT engine I just built and saved, use it in the code above, and display the output?

dhairyasachdeva11 · March 12, 2023, 4:17pm

@WayneWWW @AastaLLL @DaneLLL @dusty_nv
@AakankshaS
Please help.

AastaLLL · March 13, 2023, 6:35am

Hi,

Some examples can be found in the wiki below:
https://elinux.org/Jetson/L4T/TRT_Customized_Example#OpenCV_with_PLAN_model

Thanks.

dhairyasachdeva11 · March 13, 2023, 1:46pm

Hi @AastaLLL,
Thanks for the reply and the examples. Sadly, I couldn’t check whether they work because I cannot build PyCUDA on the Jetson Nano with JetPack 4.6.
I have started a new topic on the TensorRT page.

Thank you,
Dhairya Sachdeva

dhairyasachdeva11 · March 13, 2023, 8:32pm

@AastaLLL,
Okay, I built PyCUDA 2022.1 for JetPack 4.6 on the Jetson Nano. I then tried running the following script with the .engine file I had created earlier by converting the .onnx model with trtexec.

I ran the script with some changes:

import cv2
import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

batch = 1
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []

def Inference(engine):
    image = cv2.imread("/home/isl/a.jpg")
    image = (2.0 / 255.0) * image.transpose((2, 0, 1)) - 1.0
    Image = cv2.resize(image, (224,224))

    np.copyto(host_inputs[0], image.ravel())
    stream = cuda.Stream()
    context = engine.create_execution_context()

    start_time = time.time()
    cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
    context.execute_v2(bindings)
    cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
    stream.synchronize()
    print("execute times "+str(time.time()-start_time))

    output = host_outputs[0]
    print(np.argmax(output))

def PrepareEngine():
    with open('sample.engine', 'rb') as f:
        serialized_engine = f.read()

    runtime = trt.Runtime(TRT_LOGGER)
    engine = runtime.deserialize_cuda_engine(serialized_engine)

    # create buffer
    for binding in engine:
        size = trt.volume(engine.get_tensor_shape(binding)) * batch
        host_mem = cuda.pagelocked_empty(shape=[size],dtype=np.float32)
        cuda_mem = cuda.mem_alloc(host_mem.nbytes)

        bindings.append(int(cuda_mem))
        if engine.get_tensor_mode(binding)==trt.TensorIOMode.INPUT:
            host_inputs.append(host_mem)
            cuda_inputs.append(cuda_mem)
        else:
            host_outputs.append(host_mem)
            cuda_outputs.append(cuda_mem)

    return engine

if __name__ == "__main__":
    engine = PrepareEngine()
    Inference(engine)

    engine = []

ISSUES:

1. The expected output was “0”, but the output I got was “6” or “5”.
2. It took a long time to load the engine after I ran the command “python3 test.py” (test.py is the script given above).
3. I tried various inputs, but the outputs kept varying and were never close to the expected output.

Context: the model is an image-based sign recognition model, and I’m giving it a .jpg input image of size 300×300×3.

Thank you,
Dhairya Sachdeva
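
One thing that can be checked before looking at the model itself is whether the preprocessed image has as many elements as the engine's input binding expects. In the script above, the result of `cv2.resize` is stored in `Image` and never used, so the 300×300×3 image is what gets flattened and copied. The sketch below only does the arithmetic; the 224×224 binding shape is a hypothetical value taken from the resize call, not a confirmed property of this model, and `volume` and `check_input_fits` are illustrative helper names.

```python
from functools import reduce
from operator import mul


def volume(shape):
    """Number of elements in a tensor with the given shape."""
    return reduce(mul, shape, 1)


def check_input_fits(image_shape, binding_shape):
    """Return (fits, image_elements, binding_elements)."""
    n_image = volume(image_shape)
    n_binding = volume(binding_shape)
    return n_image == n_binding, n_image, n_binding


# Hypothetical shapes: a 300x300x3 image against a 1x224x224x3 input binding.
fits, n_image, n_binding = check_input_fits((300, 300, 3), (1, 224, 224, 3))
print(fits, n_image, n_binding)  # False 270000 150528
```

If the counts differ, the image probably needs to be resized while it is still in height × width × channel order, before any transpose, and the resized array is the one that should be flattened into the input buffer.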

AastaLLL · March 14, 2023, 4:31am

Hi,

1. Please check if any change is required for the preprocessing.

image = (2.0 / 255.0) * image.transpose((2, 0, 1)) - 1.0

2. Have you maximized the device performance?

sudo nvpmodel -m 0
sudo jetson_clocks

3. Could you check whether you get the expected output with ONNXRuntime?

Thanks.
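
One way to act on point 3 is to feed the same preprocessed input to two backends (for example the original Keras model and ONNX Runtime) and compare the raw output vectors. The sketch below covers only the comparison step and works on plain Python lists; `compare_outputs`, the tolerance, and the sample vectors are illustrative assumptions, not values from this thread.

```python
def compare_outputs(reference, candidate, atol=1e-3):
    """Compare two class-score vectors produced for the same input."""
    if len(reference) != len(candidate):
        raise ValueError("output vectors differ in length")
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    same_top1 = reference.index(max(reference)) == candidate.index(max(candidate))
    return {
        "max_abs_diff": max_abs_diff,
        "same_top1": same_top1,
        "close": max_abs_diff <= atol,
    }


# Made-up scores for a three-class model, standing in for two backends.
keras_scores = [0.90, 0.05, 0.05]
onnx_scores = [0.10, 0.80, 0.10]
report = compare_outputs(keras_scores, onnx_scores)
print(report["same_top1"], report["close"])  # False False
```

If the two backends already disagree at this stage, the problem most likely sits in the conversion or the preprocessing rather than in TensorRT.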

dhairyasachdeva11 · March 14, 2023, 12:14pm

Hi @AastaLLL,

1. I removed this line of code, because the image I’m running through the TensorRT engine is already preprocessed, and the model was trained on part of this same data.
2. The device performance has been maximised, but the code I shared still takes around 8-11 seconds just to load the engine. It then takes a further 8-9 seconds to get the result for a single image.
3. Yes, I tried ONNX Runtime as well, and the outputs were not very good there either. I’m guessing it has something to do with the Python APIs tf2onnx and keras2onnx. It would help a lot if you could suggest a few changes to the conversion process.
4. Please suggest any changes needed to convert to TensorRT using the trtexec built into the Jetson Nano.
5. Also, the script gives a runtime error: invalid argument passed runtime. It still outputs something, and the error is reported after it.
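
When timing a single image, the first call often includes one-time setup costs, so it can help to run a few warm-up calls and then average over several runs. The sketch below is a generic timing helper; `time_inference` and `fake_infer` are hypothetical names, and `fake_infer` is a stand-in for whatever function wraps the copy-execute-copy steps of the script.

```python
import time


def time_inference(infer, warmup=3, runs=10):
    """Average seconds per call to infer(), measured after warm-up calls."""
    for _ in range(warmup):
        infer()  # not timed: absorbs one-time setup costs
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    return (time.perf_counter() - start) / runs


def fake_infer():
    """Stand-in workload so the sketch runs without TensorRT."""
    sum(range(10000))


average = time_inference(fake_infer)
print("average seconds per call: %.6f" % average)
```

Comparing the first call against this steady-state average shows whether the 8-9 seconds is a start-up cost or the actual per-image latency.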

AastaLLL · March 23, 2023, 4:44am

Hi,

1. Please check Q3.

2. Have you tried to infer the model on a dGPU?
Could you share the inference time?

3. TensorRT is expected to output the same result as ONNXRuntime.
If you didn’t get the correct results, it indicates an issue in converting the model to ONNX.
In that case, please check with the tf2onnx team directly.

4. Usually, the TensorRT engine can be generated with trtexec.

$ /usr/src/tensorrt/bin/trtexec --onnx=[file]

5. The script shared above is for TensorRT 8.4.
Please check the change below to make it work with TensorRT 8.2:
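
The change itself is not shown here. What follows is a minimal sketch of what such an adaptation might look like, not the snippet from this reply. It assumes the relevant difference is that the posted script calls `engine.get_tensor_shape` and `engine.get_tensor_mode`, while the TensorRT 8.2 Python API exposes `get_binding_shape` and `binding_is_input` instead. `describe_bindings` and `FakeEngine` are hypothetical names, and the shapes in `FakeEngine` are made up so the sketch runs without TensorRT installed.

```python
def describe_bindings(engine):
    """List (name, shape, is_input) for each binding via the older binding API."""
    info = []
    for binding in engine:  # iterating an engine yields binding names
        shape = tuple(engine.get_binding_shape(binding))
        is_input = engine.binding_is_input(binding)
        info.append((binding, shape, is_input))
    return info


class FakeEngine:
    """Stand-in that mimics the two engine calls used above (shapes are made up)."""

    _bindings = {"input": ((1, 300, 300, 3), True), "output": ((1, 28), False)}

    def __iter__(self):
        return iter(self._bindings)

    def get_binding_shape(self, name):
        return self._bindings[name][0]

    def binding_is_input(self, name):
        return self._bindings[name][1]


for name, shape, is_input in describe_bindings(FakeEngine()):
    print(name, shape, "input" if is_input else "output")
```

In the buffer-allocation loop of the script, this would mean sizing each buffer from `engine.get_binding_shape(binding)` and choosing between the input and output lists with `engine.binding_is_input(binding)`.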

Thanks.

system · April 12, 2023, 6:27am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
