Not Getting Correct output while running inference using TensorRT on LPRnet fp16 Model


I am getting output like below while running inference on LPRnet instead of numbers.

input: shape:(-1, 3, 48, 96) dtype:DataType.FLOAT
output: shape:(-1, 24) dtype:DataType.INT32
output: shape:(-1, 24) dtype:DataType.FLOAT
[array([35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35,
       35, 35, 35, 35, 35, 35, 35], dtype=int32), array([0.9999987 , 0.99999976, 1.        , 1.        , 1.        ,
       0.9999999 , 0.9999999 , 0.9999999 , 0.9999999 , 0.99999976,
       0.99999976, 0.99999976, 0.9999999 , 0.9999999 , 0.99999976,
       0.99999976, 0.99999976, 0.99999976, 0.99999976, 0.99999976,
       0.99999964, 0.9999993 , 0.9999987 , 0.99999964], dtype=float32)]

My Code-base is below :

import os
import time

import cv2
#import matplotlib.pyplot as plt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
from PIL import Image
import pdb

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem): = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str( + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

def load_engine(trt_runtime, engine_path):
    with open(engine_path, "rb") as f:
        engine_data =
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine

# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
def allocate_buffers(engine, batch_size=-1):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        # pdb.set_trace()
        size = trt.volume(engine.get_binding_shape(binding)) * batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
            print(f"input: shape:{engine.get_binding_shape(binding)} dtype:{engine.get_binding_dtype(binding)}")
            outputs.append(HostDeviceMem(host_mem, device_mem))
            print(f"output: shape:{engine.get_binding_shape(binding)} dtype:{engine.get_binding_dtype(binding)}")
    return inputs, outputs, bindings, stream

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device,, stream) for inp in inputs]
    # Run inference.
        batch_size=batch_size, bindings=bindings, stream_handle=stream.handle
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(, out.device, stream) for out in outputs]
    # Synchronize the stream
    # Return only the host outputs.
    return [ for out in outputs]

# TensorRT logger singleton
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_engine_path = "number_plate_classification_b8_fp16.engine"

trt_runtime = trt.Runtime(TRT_LOGGER)
# pdb.set_trace()
trt_engine = load_engine(trt_runtime, trt_engine_path)
# Execution context is needed for inference
context = trt_engine.create_execution_context()
# This allocates memory for network inputs/outputs on both CPU and GPU
inputs, outputs, bindings, stream = allocate_buffers(trt_engine)

# pdb.set_trace()
image = cv2.imread("1626673361593_cropped_batch_code_image_imgGB3_BATOO69_.jpg")
image = cv2.resize(image, (96, 48))/255.0

image = image.T

np.copyto(inputs[0].host, image.ravel())

input_shape = (1,3,48,96)
context.set_binding_shape(0, input_shape)

output = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

Please help me what should I add to get exact inference result.

Sorry if the problem is too basic.


Can you try running your model with trtexec command, and share the “”–verbose"" log in case if the issue persist

You can refer below link for all the supported operators list, in case any operator is not supported you need to create a custom plugin to support that operation

Also, request you to share your model and script if not shared already so that we can help you better.

Meanwhile, for some common errors and queries please refer to below link:


Hi @NVES ,

Thanks for the response.

I had shared the script too. For the LPRnet FP16 engine I am getting result in DeepStream but when I run it with python-opencv code as mentioned above I am not getting expected labels.

output = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

How Should I process output further to get the result ?

also In the document of LPRnet mentioned that " (DeepStream post-process plugin is needed to get the final license plate)" so how should I add that plugin in my custom code.

Please help me out.
Note : I have already run the complete LPD and LPR in deepstream application but getting issues in custom implementation.



please help me in resolving this issue.



Looks like this is related to TLT. We recommend you to please post your concern on TLT forum to get better help.

Thank you.

Duplicated with Not Getting Correct output while running inference using TensorRT on LPRnet fp16 Model - Intelligent Video Analytics / Transfer Learning Toolkit - NVIDIA Developer Forums