Implementing a custom .engine with the Python API

I’ve been struggling for the past couple of weeks to train, export and convert a model from the Transfer Learning Toolkit tutorial.
I was finally able to do that, and I can run the .engine with the sample CSI .txt config in DeepStream.

What I really need help with is how to use my .engine on the Jetson Nano from Python/C++ code, so that I can do more with my detections.

What I want to do:
Detect an object in a live CSI/USB video feed.
Crop the part of the frame that contains the detected object and pass it somewhere else/save it as a
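The cropping step can be sketched with plain NumPy slicing. This assumes the detections have already been converted to pixel-space boxes (x1, y1, x2, y2) on the original frame; the helper name `crop_detections` and that box format are hypothetical, not something TensorRT or DeepStream hands back directly.

```python
import numpy as np


def crop_detections(frame, boxes):
    """Return one crop per (x1, y1, x2, y2) box, clamped to the frame.

    `frame` is an HxWxC array (e.g. a BGR frame from OpenCV); `boxes` is an
    iterable of pixel coordinates. The box format is an assumption.
    """
    height, width = frame.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        # clamp to the frame so boxes that spill over an edge stay valid
        x1, x2 = max(0, int(x1)), min(width, int(x2))
        y1, y2 = max(0, int(y1)), min(height, int(y2))
        if x2 <= x1 or y2 <= y1:
            continue  # skip empty or inverted boxes
        # rows are y, columns are x; copy so the crop outlives the frame buffer
        crops.append(frame[y1:y2, x1:x2].copy())
    return crops


if __name__ == "__main__":
    frame = np.zeros((720, 1280, 3), dtype=np.uint8)
    crops = crop_detections(frame, [(100, 50, 300, 250), (1200, 700, 1400, 800)])
    print([c.shape for c in crops])  # [(200, 200, 3), (20, 80, 3)]
```

Each crop is an ordinary image array, so it can be saved with cv2.imwrite("crop_0.jpg", crop) or passed on to other code.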

The platform I am currently on (and want to run this on):
Jetson Nano
JetPack 4.2.1
TensorRT 5
CSI/USB camera live feed

Any help/guidance would be greatly appreciated.
Thanks in advance!


Please refer to the sample at the link below:


I started implementing this file, but without the engine-creation part, since I had already created an engine with ./tlt-converter.

I’ve adapted it to look like this at the moment:
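import os
import sys
import time

import cv2
import numpy as np
import pycuda.autoinit  # creates and activates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt


TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')
runtime = trt.Runtime(TRT_LOGGER)

# create engine: deserialize the .engine produced by ./tlt-converter
with open('resnet10_fp16_0_5.engine', 'rb') as f:
    buf = f.read()
engine = runtime.deserialize_cuda_engine(buf)

# create buffers: one page-locked host buffer and one device buffer per binding
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []
stream = cuda.Stream()

for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    # use the binding's own dtype; an FP16 engine often still has FP32 inputs/outputs
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host_mem = cuda.pagelocked_empty(size, dtype)
    cuda_mem = cuda.mem_alloc(host_mem.nbytes)

    bindings.append(int(cuda_mem))
    if engine.binding_is_input(binding):
        host_inputs.append(host_mem)
        cuda_inputs.append(cuda_mem)
    else:
        host_outputs.append(host_mem)
        cuda_outputs.append(cuda_mem)

context = engine.create_execution_context()

# preprocess: BGR -> RGB, resize to the network input size, HWC -> CHW
# binding 0 is assumed to be the input, with a (C, H, W) shape
c, h, w = engine.get_binding_shape(0)
ori = cv2.imread(sys.argv[1])
image = cv2.cvtColor(ori, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (w, h))  # cv2.resize takes (width, height)
# scale to [-1, 1] as in the sample; this must match the model's training preprocessing
image = (2.0 / 255.0) * image - 1.0
image = image.transpose((2, 0, 1))
np.copyto(host_inputs[0], image.ravel())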

And when I run: python3 image.jpg

I get this:
[TensorRT] INFO: Glob Size is 2190768 bytes.
[TensorRT] INFO: Added linear block of size 176947200
[TensorRT] INFO: Added linear block of size 117964800
[TensorRT] INFO: Added linear block of size 44236800
[TensorRT] INFO: Deserialize required 2796280 microseconds.
Traceback (most recent call last):
  File "", line 49, in
    np.copyto(host_inputs[0], image.ravel())
  File "<array_function internals>", line 6, in copyto
ValueError: could not broadcast input array from shape (2764800) into shape (44236800)
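The two sizes in the error fit an input buffer allocated for a batch of 16: the preprocessed image has 720 × 1280 × 3 = 2,764,800 values, and 44,236,800 / 2,764,800 = 16. The allocation loop multiplies each binding's volume by engine.max_batch_size, so the host buffer holds that many images. Besides rebuilding the engine with a smaller max batch size, one alternative (a sketch, not taken from this thread) is to copy a single image into the first slot of the buffer and run inference with batch_size=1. The helper `fill_batch_slot` is hypothetical and uses only NumPy, so it can be checked without a GPU.

```python
import numpy as np


def fill_batch_slot(host_buffer, chw_image, slot=0):
    """Copy one CHW image into slot `slot` of a flat batch buffer.

    `host_buffer` is the flat array allocated for the input binding
    (volume * max_batch_size elements). Returns the number of slots, i.e.
    the max batch size the buffer was sized for.
    """
    per_image = chw_image.size
    if host_buffer.size % per_image != 0:
        raise ValueError("buffer size is not a multiple of one image")
    max_batch = host_buffer.size // per_image
    if not 0 <= slot < max_batch:
        raise IndexError("slot %d outside batch of %d" % (slot, max_batch))
    start = slot * per_image
    # np.copyto casts e.g. float64 -> float32 under its default 'same_kind' rule
    np.copyto(host_buffer[start:start + per_image], chw_image.ravel())
    return max_batch


if __name__ == "__main__":
    image = np.ones((3, 720, 1280), np.float32)          # stand-in preprocessed image
    buffer = np.zeros(3 * 720 * 1280 * 16, np.float32)   # what max_batch_size=16 allocates
    print(image.size, buffer.size, fill_batch_slot(buffer, image))
    # 2764800 44236800 16
```

Only the first slot then holds valid data, so the execute call should use batch_size=1 rather than the engine's max batch size.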

I was actually able to figure out how to fix this. When I exported the .engine with ./tlt-converter I had not set a max batch size, so the input buffer was allocated for a whole batch of images, far more memory than a single picture needs. After rebuilding the engine so that engine.max_batch_size is 1, I no longer get the shape error.
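With the image copied into host_inputs[0], the remaining step is to move it to the GPU, run the engine and copy the outputs back. A minimal sketch, assuming the buffer lists built by the allocation loop in the script and the implicit-batch TensorRT 5 Python API (context.execute_async with batch_size, bindings and stream_handle); the function name `do_inference` is hypothetical, and decoding the output tensors into boxes is model-specific and not shown.

```python
def do_inference(context, bindings, host_inputs, cuda_inputs,
                 host_outputs, cuda_outputs, stream, batch_size=1):
    """Run one asynchronous inference pass and return the host output arrays.

    Assumes the buffer lists come from the allocation loop and that
    pycuda.autoinit has already created a CUDA context.
    """
    import pycuda.driver as cuda  # imported here so this module loads without a GPU

    # host -> device for every input binding
    for host_mem, cuda_mem in zip(host_inputs, cuda_inputs):
        cuda.memcpy_htod_async(cuda_mem, host_mem, stream)
    # enqueue the engine; batch_size must not exceed engine.max_batch_size
    context.execute_async(batch_size=batch_size, bindings=bindings,
                          stream_handle=stream.handle)
    # device -> host for every output binding
    for host_mem, cuda_mem in zip(host_outputs, cuda_outputs):
        cuda.memcpy_dtoh_async(host_mem, cuda_mem, stream)
    # wait until the copies and kernels queued on this stream have finished
    stream.synchronize()
    return host_outputs
```

Called right after np.copyto as outputs = do_inference(context, bindings, host_inputs, cuda_inputs, host_outputs, cuda_outputs, stream), it returns the same page-locked arrays, now filled with the network's raw outputs.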

I’m still struggling to make it work with the CSI camera, but thanks for the examples.
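For the CSI camera, one common approach on the Nano is to open it through a GStreamer pipeline in OpenCV and push each frame through the same preprocessing as the still image. A minimal sketch, assuming an OpenCV build with GStreamer support (cv2.getBuildInformation() shows whether yours has it) and a camera handled by nvarguscamerasrc; the function names and the 1280x720 at 30 fps defaults are illustrative choices, not values from this thread. For a USB camera, cv2.VideoCapture(0) is usually enough.

```python
def gstreamer_csi_pipeline(width=1280, height=720, fps=30, flip_method=0):
    """Build a GStreamer pipeline string for a CSI camera on Jetson.

    nvarguscamerasrc captures NV12 frames into NVMM memory, nvvidconv converts
    them to BGRx in system memory, and videoconvert produces the BGR frames
    OpenCV expects at the appsink.
    """
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), width=(int){w}, height=(int){h}, "
        "format=(string)NV12, framerate=(fraction){fps}/1 ! "
        "nvvidconv flip-method={flip} ! "
        "video/x-raw, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
    ).format(w=width, h=height, fps=fps, flip=flip_method)


def open_csi_camera(**kwargs):
    """Open the CSI camera through OpenCV's GStreamer backend."""
    import cv2  # imported lazily so the pipeline builder works without OpenCV

    cap = cv2.VideoCapture(gstreamer_csi_pipeline(**kwargs), cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise RuntimeError("could not open CSI camera; is OpenCV built with GStreamer?")
    return cap
```

Each frame returned by ok, frame = cap.read() is a BGR array, so it can go through the same cvtColor/resize/transpose steps used for the image loaded with cv2.imread, then be copied into host_inputs[0] for inference.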