Not Getting Correct output while running inference using TensorRT on LPRnet fp16 Model

Pritam · July 22, 2021, 5:08am

Hi,

When I run inference on LPRNet, I get the output below instead of the plate characters.

input: shape:(-1, 3, 48, 96) dtype:DataType.FLOAT
output: shape:(-1, 24) dtype:DataType.INT32
output: shape:(-1, 24) dtype:DataType.FLOAT
[array([35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35,
       35, 35, 35, 35, 35, 35, 35], dtype=int32), array([0.9999987 , 0.99999976, 1.        , 1.        , 1.        ,
       0.9999999 , 0.9999999 , 0.9999999 , 0.9999999 , 0.99999976,
       0.99999976, 0.99999976, 0.9999999 , 0.9999999 , 0.99999976,
       0.99999976, 0.99999976, 0.99999976, 0.99999976, 0.99999976,
       0.99999964, 0.9999993 , 0.9999987 , 0.99999964], dtype=float32)]

My code is below:

import os
import time

import cv2
#import matplotlib.pyplot as plt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
from PIL import Image
import pdb


class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


def load_engine(trt_runtime, engine_path):
    with open(engine_path, "rb") as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine

# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
def allocate_buffers(engine, batch_size=-1):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        # pdb.set_trace()
        size = trt.volume(engine.get_binding_shape(binding)) * batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
            print(f"input: shape:{engine.get_binding_shape(binding)} dtype:{engine.get_binding_dtype(binding)}")
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
            print(f"output: shape:{engine.get_binding_shape(binding)} dtype:{engine.get_binding_dtype(binding)}")
    return inputs, outputs, bindings, stream



def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(
        batch_size=batch_size, bindings=bindings, stream_handle=stream.handle
    )
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]

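# Note: CUDA_VISIBLE_DEVICES is read when CUDA initializes; pycuda.autoinit has
# already created a context at import time, so this assignment has no effect here.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
# TensorRT logger singleton
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)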
trt_engine_path = "number_plate_classification_b8_fp16.engine"

trt_runtime = trt.Runtime(TRT_LOGGER)
# pdb.set_trace()
trt_engine = load_engine(trt_runtime, trt_engine_path)
# Execution context is needed for inference
context = trt_engine.create_execution_context()
# This allocates memory for network inputs/outputs on both CPU and GPU
inputs, outputs, bindings, stream = allocate_buffers(trt_engine)

# pdb.set_trace()
image = cv2.imread("1626673361593_cropped_batch_code_image_imgGB3_BATOO69_.jpg")
image = cv2.resize(image, (96, 48))/255.0

image = image.T

np.copyto(inputs[0].host, image.ravel())

input_shape = (1,3,48,96)
context.set_binding_shape(0, input_shape)

output = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
print(output)

What should I add or change to get the correct inference result?

Thanks.

Morganh · July 22, 2021, 7:52am

Can you run the default inference method and get the correct result?
See:

  tlt lprnet inference -m <model>
                   -i <in_image_path>
                   -e <experiment_spec>
                   [-k <key>]
                   [--gpu_index <gpu_index>]
                   [--log_file <log_file>]
                   [--trt]

Pritam · July 22, 2021, 7:55am

Thanks @Morganh for the response.

Actually, I have already run this model successfully in a DeepStream application and get results there. But now I want to use the NPR model with custom code (Python, OpenCV and PyCUDA), and with that custom code I get the issue mentioned above.

How can I get the correct result out of do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)? I still get the same output as shown above.

Morganh · July 22, 2021, 8:05am

Can you modify

image = cv2.resize(image, (96, 48))/255.0
image = image.T

to

image = np.array([cv2.resize(img, (96, 48)) / 255.0 for img in image], dtype=np.float32)
image = image.transpose(0, 3, 1, 2)
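Why the axis order matters: cv2 returns images as height x width x channels (HWC), and .T on a 3-D NumPy array reverses all three axes, giving channels x width x height rather than the channels x height x width (CHW) that the (-1, 3, 48, 96) input binding expects. Because both layouts hold the same 13824 values, np.copyto into the host buffer raises no error; the pixels are simply in the wrong order. A minimal NumPy sketch, using a dummy array in place of a real image:

```python
import numpy as np

# Dummy image in OpenCV layout after cv2.resize(image, (96, 48)): H=48, W=96, C=3.
hwc = np.arange(48 * 96 * 3, dtype=np.float32).reshape(48, 96, 3)

# .T reverses every axis: (H, W, C) -> (C, W, H), which is wrong for a (3, 48, 96) input.
wrong = hwc.T
# transpose(2, 0, 1) moves only the channel axis to the front: (H, W, C) -> (C, H, W).
chw = hwc.transpose(2, 0, 1)

print(wrong.shape)  # (3, 96, 48)
print(chw.shape)    # (3, 48, 96)

# Both flatten to the same number of values, so copying into the host buffer
# "works", but the element order differs. In CHW order the second value is
# channel 0 of pixel (row 0, col 1); in the .T order it is not.
flat_ok = chw.ravel()
flat_bad = wrong.ravel()
print(flat_ok[1] == hwc[0, 1, 0])   # True
print(flat_bad[1] == hwc[0, 1, 0])  # False
```

The suggested transpose(0, 3, 1, 2) is the same CHW reordering applied to a batch with a leading N axis.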

Pritam · July 22, 2021, 8:22am

Yes, I have made these changes, but I get this error:

image= image.transpose( 0 , 3 , 1 , 2 )
ValueError: axes don't match array
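This error is consistent with the suggested list comprehension being applied to a single image. The comprehension expects image to be a list of images, but cv2.imread returns one H x W x 3 array, and iterating over an ndarray walks its first axis (the rows). The result of the comprehension then has three axes, while transpose(0, 3, 1, 2) needs four. A minimal NumPy sketch with dummy arrays standing in for the real image:

```python
import numpy as np

# Stand-in for cv2.imread output: a single H x W x 3 image (dummy size).
image = np.zeros((60, 150, 3), dtype=np.uint8)

# Iterating over one ndarray walks its first axis, so "for img in image"
# visits 60 rows of shape (150, 3), not one whole image.
rows = [img for img in image]
print(len(rows), rows[0].shape)  # 60 (150, 3)

# A 3-D array has too few axes for a 4-axis permutation,
# which is the reported ValueError.
three_d = np.zeros((48, 96, 3), dtype=np.float32)
try:
    three_d.transpose(0, 3, 1, 2)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```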

Morganh · July 22, 2021, 8:28am

I still recommend running tlt lprnet inference --trt xxx against the engine number_plate_classification_b8_fp16.engine to check whether it works.

Pritam · July 22, 2021, 8:31am

Okay, but the same engine file works in the DeepStream application.

Morganh · July 22, 2021, 8:32am

It seems this engine is for batch size 8.

Pritam · July 22, 2021, 8:35am

Yes, it is for batch size 8. So you mean I should generate an engine for batch size 1 and then retry?

tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etltunpruned.etlt -t fp16 -e /opt/nvidia/deepstream/deepstream-5.0/samples/models/LP/LPR/lpr_us_onnx_b16.engine

Morganh · July 22, 2021, 8:35am

Please follow the NVIDIA-AI-IOT/deepstream_lpr_app repository on GitHub (sample app code for LPR deployment on DeepStream):

./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 \
           models/LP/LPR/us_lprnet_baseline18_deployable.etlt -t fp16 -e models/LP/LPR/lpr_us_onnx_b16.engine
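The -p argument gives the optimization profile for image_input as min, opt and max shapes. With a dynamic-batch engine, the shape passed to context.set_binding_shape at runtime has to lie within that profile's min and max, dimension by dimension. With the TensorRT 7/8 Python API, engine.get_profile_shape(0, binding) should return the same three shapes for a built engine. A small pure-Python sketch of the check; parse_profile and shape_in_profile are hypothetical helpers, not part of tlt-converter or TensorRT:

```python
def parse_profile(arg):
    """Hypothetical helper: parse a '-p' value 'name,min,opt,max' with NxCxHxW shapes."""
    name, *shapes = arg.split(",")
    if len(shapes) != 3:
        raise ValueError("expected exactly three shapes: min, opt, max")
    mn, opt, mx = (tuple(int(d) for d in s.split("x")) for s in shapes)
    return name, mn, opt, mx


def shape_in_profile(shape, mn, mx):
    """Hypothetical helper: True if every dimension lies within [min, max]."""
    return len(shape) == len(mn) == len(mx) and all(
        lo <= d <= hi for d, lo, hi in zip(shape, mn, mx)
    )


name, mn, opt, mx = parse_profile("image_input,1x3x48x96,4x3x48x96,16x3x48x96")
print(name, mn, opt, mx)
print(shape_in_profile((1, 3, 48, 96), mn, mx))   # True: batch 1 is allowed
print(shape_in_profile((32, 3, 48, 96), mn, mx))  # False: above the max batch of 16
```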

Pritam · July 22, 2021, 8:51am

With batch size 1:

./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,1x3x48x96,1x3x48x96 \
           models/LP/LPR/us_lprnet_baseline18_deployable.etlt -t fp16 -e models/LP/LPR/lpr_us_onnx_b16.engine

I get this error:

Traceback (most recent call last):
  File "inference_trt_npr.py", line 84, in <module>
    inputs, outputs, bindings, stream = allocate_buffers(trt_engine)
  File "inference_trt_npr.py", line 44, in allocate_buffers
    host_mem = cuda.pagelocked_empty(size, dtype)
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory
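One thing worth checking in allocate_buffers: the element count is computed as trt.volume(shape) * batch_size with batch_size=-1. That only gives a positive count when the binding shape contains exactly one -1 (the dynamic batch dimension). If the engine reports a fully static shape, the count comes out negative. It is not confirmed that this is what makes cuMemHostAlloc fail here, but sizing each buffer explicitly avoids relying on the sign trick. A pure-Python sketch; binding_size is a hypothetical helper, and math.prod stands in for trt.volume:

```python
from math import prod


def binding_size(shape, batch_size):
    """Hypothetical helper: element count for a binding, with a dynamic (-1)
    leading batch dimension replaced by batch_size."""
    dims = list(shape)
    if dims and dims[0] == -1:
        dims[0] = batch_size
    if any(d < 0 for d in dims):
        raise ValueError(f"unexpected dynamic dimension in {tuple(shape)}")
    return prod(dims)


# The sign trick from the original script: volume(shape) * -1.
print(prod((-1, 3, 48, 96)) * -1)  # 13824: works only because the batch dim is -1
print(prod((1, 3, 48, 96)) * -1)   # -13824: negative size for a static shape

print(binding_size((-1, 3, 48, 96), 1))  # 13824
print(binding_size((-1, 24), 16))        # 384
print(binding_size((1, 3, 48, 96), 1))   # 13824
```

Calling it with the batch size you later pass to context.set_binding_shape keeps the host buffer the same size as the flattened input, so np.copyto(inputs[0].host, ...) does not hit a size mismatch.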

With batch size 16:

./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 \
           models/LP/LPR/us_lprnet_baseline18_deployable.etlt -t fp16 -e models/LP/LPR/lpr_us_onnx_b16.engine

I get this error:

Traceback (most recent call last):
  File "inference_trt_npr.py", line 92, in <module>
    image= image.transpose( 0 , 3 , 1 , 2 )
ValueError: axes don't match array

I have read that NPR needs an extra plugin: "... characters id sequence. (DeepStream post-process plugin is needed to get the final license plate)"

So how can we use this in a custom code base? And is the issue caused by not adding the plugin, or by something else?
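On the plugin question: the two output bindings printed above look like per-timestep character indices (INT32) and their confidences (FLOAT), 24 timesteps each. The post-processing only turns these indices into text, so it cannot change the raw indices themselves. In deepstream_lpr_app this step is, as far as I can tell, done in a C++ custom output parser as a greedy (best-path) CTC decode: merge consecutive repeated indices, then drop the blank index. The same can be done in Python. A minimal sketch, assuming, as in that sample's dictionary files, that the blank index equals the number of dictionary characters; CHARS below is a placeholder and should be replaced by the dictionary actually used with the model:

```python
def ctc_greedy_decode(indices, chars):
    """Greedy CTC decode: skip repeats of the previous index, then drop the blank.

    Assumes the blank index equals len(chars).
    """
    blank = len(chars)
    out = []
    prev = None
    for idx in indices:
        idx = int(idx)  # works for Python ints and NumPy int32 values
        if idx != prev and 0 <= idx < blank:
            out.append(chars[idx])
        prev = idx
    return "".join(out)


# Placeholder 35-character dictionary (digits plus letters without "O").
CHARS = "0123456789ABCDEFGHIJKLMNPQRSTUVWXYZ"

print(repr(ctc_greedy_decode([35] * 24, CHARS)))  # '' : every timestep is blank
seq = [35, 1, 1, 35, 2, 35, 35, 2, 10, 10, 35]
print(ctc_greedy_decode(seq, CHARS))  # 122A
```

If the dictionary for this model has 35 characters, index 35 is the blank, so the all-35 output above decodes to an empty plate. That would point at the input preprocessing rather than at missing post-processing.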

Thanks.

Morganh · July 22, 2021, 8:56am

Where did you run the command below?

./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,1x3x48x96,1x3x48x96 \
           models/LP/LPR/us_lprnet_baseline18_deployable.etlt -t fp16 -e models/LP/LPR/lpr_us_onnx_b16.engine

And where did you run your standalone Python inference script?

Pritam · July 22, 2021, 8:59am

I ran it from the directory containing my tlt-converter file, and in the Python script I gave the full path of the engine generated by this command:

./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,1x3x48x96,1x3x48x96 \
           models/LP/LPR/us_lprnet_baseline18_deployable.etlt -t fp16 -e models/LP/LPR/lpr_us_onnx_b16.engine

Morganh · July 22, 2021, 9:00am

Sorry, I mean: on which device did you run the command below?

./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,1x3x48x96,1x3x48x96 \
           models/LP/LPR/us_lprnet_baseline18_deployable.etlt -t fp16 -e models/LP/LPR/lpr_us_onnx_b16.engine

Pritam · July 22, 2021, 9:01am

I am using a Jetson Xavier NX and ran it on the Xavier NX itself.

Morganh · July 22, 2021, 9:02am

Thanks. And you then ran the Python inference script on the Xavier too, right?

Pritam · July 22, 2021, 9:02am

Yes. I am doing this whole process on the Xavier NX.

Morganh · July 22, 2021, 9:11am

How did you download “tlt-converter”? Did you download the correct version?

Pritam · July 22, 2021, 9:14am

Yes, I downloaded the correct one.

I get inference results in the DeepStream application with the same engine files (batch sizes 8 and 16), but the same engine files give the result mentioned above in my custom Python code.

Is the problem with the engine file, or what should I correct in my custom code to get the right inference result?

Morganh · July 22, 2021, 9:46am

Please modify

image = cv2.imread("1626673361593_cropped_batch_code_image_imgGB3_BATOO69_.jpg")

to

image = [cv2.imread("1626673361593_cropped_batch_code_image_imgGB3_BATOO69_.jpg")]
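With the image wrapped in a list, the comprehension runs once per image and the transpose receives a 4-D array. Putting the preprocessing together, a minimal sketch assuming the images have already been loaded and passed through cv2.resize(img, (96, 48)) as in the thread; make_batch is a hypothetical helper name, and host_buffer stands in for inputs[0].host from allocate_buffers:

```python
import numpy as np


def make_batch(resized_images):
    """Hypothetical helper: stack HWC uint8 images (48 x 96 x 3) into a
    contiguous NCHW float32 batch scaled to [0, 1].

    Assumes each image was already resized with cv2.resize(img, (96, 48)).
    """
    batch = np.array([img / 255.0 for img in resized_images], dtype=np.float32)
    batch = batch.transpose(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
    return np.ascontiguousarray(batch)   # contiguous copy in NCHW order


# Dummy stand-ins for one resized image and the page-locked host buffer.
rng = np.random.default_rng(0)
resized = rng.integers(0, 256, size=(48, 96, 3), dtype=np.uint8)
host_buffer = np.empty(1 * 3 * 48 * 96, dtype=np.float32)

batch = make_batch([resized])          # note the list: a batch of one image
np.copyto(host_buffer, batch.ravel())  # same call as in the original script
print(batch.shape)  # (1, 3, 48, 96)
```

The shape passed to context.set_binding_shape(0, ...) should then equal batch.shape, and the host buffers should be sized for that same batch.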
