Not Getting Correct Output While Running Inference Using TensorRT on LPRnet FP16 Model

Pritam · July 20, 2021, 3:05pm

Hi,

When I run inference on LPRnet, I get output like the following instead of the plate characters:

input: shape:(-1, 3, 48, 96) dtype:DataType.FLOAT
output: shape:(-1, 24) dtype:DataType.INT32
output: shape:(-1, 24) dtype:DataType.FLOAT
[array([35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35,
       35, 35, 35, 35, 35, 35, 35], dtype=int32), array([0.9999987 , 0.99999976, 1.        , 1.        , 1.        ,
       0.9999999 , 0.9999999 , 0.9999999 , 0.9999999 , 0.99999976,
       0.99999976, 0.99999976, 0.9999999 , 0.9999999 , 0.99999976,
       0.99999976, 0.99999976, 0.99999976, 0.99999976, 0.99999976,
       0.99999964, 0.9999993 , 0.9999987 , 0.99999964], dtype=float32)]

My code is below:

import os

import cv2
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
import pycuda.driver as cuda
import tensorrt as trt


class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


def load_engine(trt_runtime, engine_path):
    with open(engine_path, "rb") as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine

# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
# batch_size=-1 cancels the dynamic (-1) batch dimension, so each buffer holds one sample.
def allocate_buffers(engine, batch_size=-1):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
            print(f"input: shape:{engine.get_binding_shape(binding)} dtype:{engine.get_binding_dtype(binding)}")
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
            print(f"output: shape:{engine.get_binding_shape(binding)} dtype:{engine.get_binding_dtype(binding)}")
    return inputs, outputs, bindings, stream



def do_inference(context, bindings, inputs, outputs, stream):
    # Transfer input data to the GPU.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference. The engine has an explicit (dynamic) batch dimension, so use
    # execute_async_v2; the batch size comes from context.set_binding_shape().
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Synchronize the stream.
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]

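# NOTE: CUDA_VISIBLE_DEVICES is read when CUDA is initialized. pycuda.autoinit,
# imported at the top, has already done that, so this assignment has no effect
# here; set it before importing pycuda.autoinit for it to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
# TensorRT logger singleton
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_engine_path = "number_plate_classification_b8_fp16.engine"

trt_runtime = trt.Runtime(TRT_LOGGER)
trt_engine = load_engine(trt_runtime, trt_engine_path)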
# Execution context is needed for inference
context = trt_engine.create_execution_context()
# This allocates memory for network inputs/outputs on both CPU and GPU
inputs, outputs, bindings, stream = allocate_buffers(trt_engine)

image = cv2.imread("1626673361593_cropped_batch_code_image_imgGB3_BATOO69_.jpg")
image = cv2.resize(image, (96, 48))/255.0

image = image.T

np.copyto(inputs[0].host, image.ravel())

input_shape = (1,3,48,96)
context.set_binding_shape(0, input_shape)

output = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
print(output)
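Two details in the preprocessing above are worth checking. cv2.resize(image, (96, 48)) returns an array of shape (48, 96, 3), and image.T reverses all three axes, giving (3, 96, 48) instead of the (3, 48, 96) layout of the input binding, so ravel() writes the pixels in the wrong order. cv2.imread also returns channels in BGR order. A minimal sketch of NCHW preprocessing, assuming the engine expects RGB input scaled to [0, 1] (compare with model-color-format and net-scale-factor in the DeepStream config that gives correct results); the function name to_lpr_input is hypothetical:

```python
import numpy as np


def to_lpr_input(bgr_hwc: np.ndarray) -> np.ndarray:
    """Turn a resized OpenCV image (H, W, 3), BGR, uint8 into a (1, 3, H, W)
    float32 batch in RGB order, scaled to [0, 1].

    Assumption: the engine expects RGB input in [0, 1]; confirm this against
    the DeepStream config that works for you.
    """
    if bgr_hwc.ndim != 3 or bgr_hwc.shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) image, got {bgr_hwc.shape}")
    rgb = bgr_hwc[:, :, ::-1]                # BGR -> RGB channel order
    chw = rgb.transpose(2, 0, 1)             # (H, W, C) -> (C, H, W); image.T would give (C, W, H)
    scaled = chw.astype(np.float32) / 255.0  # scale to [0, 1] in float32
    return np.ascontiguousarray(scaled[np.newaxis])  # (1, C, H, W), C-contiguous for ravel()


# Hypothetical usage with the code above:
#   image = cv2.resize(cv2.imread(path), (96, 48))   # dsize is (width, height)
#   np.copyto(inputs[0].host, to_lpr_input(image).ravel())
```

Converting to float32 explicitly also matches the FLOAT input binding, instead of relying on np.copyto to cast float64 down.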

What do I need to add to get the correct inference result?

Sorry if this is a basic question.

Thanks.
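The buffer sizing in allocate_buffers only covers batch size 1: trt.volume() of (-1, 3, 48, 96) is -13,824, and multiplying by batch_size=-1 flips it to 13,824, one sample's worth. If you later want to feed more than one plate per call, resolving the dynamic dimension explicitly avoids that trick. A sketch, assuming only the leading dimension is dynamic; resolve_binding_shape, num_elements and max_batch are hypothetical names:

```python
from functools import reduce
from operator import mul
from typing import Sequence, Tuple


def resolve_binding_shape(engine_shape: Sequence[int], batch_size: int) -> Tuple[int, ...]:
    """Replace a dynamic (-1) leading batch dimension with a concrete batch size.

    Assumption: only dimension 0 may be dynamic, as in the shapes printed above.
    """
    if batch_size < 1:
        raise ValueError(f"batch_size must be >= 1, got {batch_size}")
    dims = tuple(int(d) for d in engine_shape)
    if any(d < 0 for d in dims[1:]):
        raise ValueError(f"only the batch dimension may be dynamic: {dims}")
    if dims and dims[0] == -1:
        return (batch_size,) + dims[1:]
    return dims


def num_elements(shape: Sequence[int]) -> int:
    """Element count of a fully resolved shape (the product trt.volume computes)."""
    return reduce(mul, shape, 1)


# Hypothetical use inside allocate_buffers, sizing for up to max_batch samples
# (max_batch must not exceed the engine's optimization profile maximum):
#   shape = resolve_binding_shape(engine.get_binding_shape(binding), max_batch)
#   host_mem = cuda.pagelocked_empty(num_elements(shape), dtype)
# and before each run with n <= max_batch images:
#   context.set_binding_shape(0, (n, 3, 48, 96))
```

With n smaller than max_batch, only the first n samples' worth of each output buffer is filled.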

NVES · July 20, 2021, 3:08pm

Hi,
Can you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Refer to the link below for the list of supported operators. If any operator is not supported, you need to create a custom plugin for that operation.

github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md

Also, please share your model and script, if you have not already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!

Pritam · July 21, 2021, 5:09am

Hi @NVES ,

Thanks for the response.

I have already shared the script. The LPRnet FP16 engine gives correct results in DeepStream, but when I run it with the Python/OpenCV code above, I do not get the expected labels.

output = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
print(output)

How should I post-process the output to get the result?

Also, the LPRnet documentation says "(DeepStream post-process plugin is needed to get the final license plate)". How do I add that plugin to my custom code?
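LPRnet's raw output is a sequence of per-step class indices (the first, INT32 output of shape (-1, 24)) plus per-step scores (the second, FLOAT output); turning the indices into a plate string is the job the DeepStream custom parser does. A minimal Python sketch of the same idea, under these assumptions: greedy CTC decoding (merge consecutive repeated indices, then drop the blank), a blank index equal to the number of characters, and a character file with one character per line, like the dict file the DeepStream LPR sample app reads. Check all three against the parser source shipped with your DeepStream app. Under these assumptions, if your character list has 35 entries, index 35 is the blank, so the all-35 output in the first post would decode to an empty plate, which suggests a problem with the input rather than with the decoding. The function names are hypothetical:

```python
from typing import List, Optional, Sequence

import numpy as np


def load_characters(path: str) -> List[str]:
    """Read a character list with one character per line (format assumed to
    match the dict file used by the DeepStream LPR sample app)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def ctc_greedy_decode(indices: Sequence[int], characters: Sequence[str]) -> str:
    """Greedy CTC decoding: merge consecutive repeats, then drop the blank.

    Assumption: the blank class index equals len(characters).
    """
    blank = len(characters)
    plate: List[str] = []
    prev: Optional[int] = None
    for raw in indices:
        idx = int(raw)
        if idx != prev and idx != blank:
            if not 0 <= idx < blank:
                raise ValueError(f"class index {idx} outside [0, {blank}]")
            plate.append(characters[idx])
        prev = idx  # a blank between two equal characters keeps both
    return "".join(plate)


def decode_batch(flat_indices: np.ndarray, batch_size: int,
                 characters: Sequence[str], seq_len: int = 24) -> List[str]:
    """Reshape the flat INT32 host buffer to (batch_size, seq_len) and decode
    each row; seq_len=24 matches the (-1, 24) output shape printed above."""
    rows = np.asarray(flat_indices)[: batch_size * seq_len].reshape(batch_size, seq_len)
    return [ctc_greedy_decode(row, characters) for row in rows]


# Hypothetical usage with the question's code ("dict.txt" is a placeholder path):
#   output = do_inference(context, bindings=bindings, inputs=inputs,
#                         outputs=outputs, stream=stream)
#   print(decode_batch(output[0], 1, load_characters("dict.txt")))
```

The second output (the scores) is not used here; how DeepStream turns it into a plate confidence is not covered by this sketch.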

Please help me out.
Note: I have already run the complete LPD and LPR pipeline in the DeepStream application; the problem appears only in my custom implementation.

Thanks.

Pritam · July 21, 2021, 2:32pm

Hi,

Please help me resolve this issue.

Thanks.

spolisetty · July 21, 2021, 2:48pm

@Pritam,

This looks related to TLT. We recommend posting your question on the TLT forum to get better help.

Thank you.

kayccc · July 22, 2021, 5:33am

Duplicate of: Not Getting Correct output while running inference using TensorRT on LPRnet fp16 Model - Intelligent Video Analytics / Transfer Learning Toolkit - NVIDIA Developer Forums