I modified a few things based on your code, and it works now. Please check:
#include <cassert>
#include <chrono>
#include <fstream>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <vector>
#include <cuda.h>
#include "cuda_runtime_api.h"
#include "NvInfer.h"
#include <opencv2/opencv.hpp>
#define DEVICE 0 // GPU id
#define BATCH_SIZE 1
static const int INPUT_H = 48;
static const int INPUT_W = 96;
static const int OUTPUT_SIZE = 24;
const char *INPUT_BLOB_NAME = "image_input";
const char *OUTPUT_BLOB_NAME_1 = "tf_op_layer_ArgMax";
const char *OUTPUT_BLOB_NAME_2 = "tf_op_layer_Max";
const std::string alphabet[] = {
"0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
"K", "L", "M", "N", "P", "Q", "R", "S", "T", "U",
"V", "W", "X", "Y", "Z"
};
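// 35 classes: digits 0-9 plus letters A-Z without O (10 + 25). Index 35, one past the
// last class, is the CTC blank label checked for in the decode loop below.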
#define CHECK(status) \
    do \
    { \
        auto ret = (status); \
        if (ret != 0) \
        { \
            std::cerr << "Cuda failure: " << ret << std::endl; \
            abort(); \
        } \
    } while (0)
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char *msg) override {
        if (severity <= Severity::kWARNING) {
            std::cout << msg << std::endl;
        }
    }
} logger;
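// TensorRT severities are ordered most-severe-first, so the "severity <= kWARNING" filter
// above prints internal errors, errors, and warnings while suppressing INFO/VERBOSE output.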
void doInference(nvinfer1::IExecutionContext &context, float *input, int *output_1, float *output_2, int batchSize) {
    const nvinfer1::ICudaEngine &engine = context.getEngine();
    // Pointers to input and output device buffers to pass to engine.
    // Engine requires exactly IEngine::getNbBindings() number of buffers.
    assert(engine.getNbBindings() == 3);
    void *buffers[3];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings().
    const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex_1 = engine.getBindingIndex(OUTPUT_BLOB_NAME_1);
    const int outputIndex_2 = engine.getBindingIndex(OUTPUT_BLOB_NAME_2);
    // Create GPU buffers on device. The ArgMax output holds class indices and the Max
    // output holds confidences; both are 4 bytes per element here.
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CHECK(cudaMalloc(&buffers[outputIndex_1], batchSize * OUTPUT_SIZE * sizeof(int)));
    CHECK(cudaMalloc(&buffers[outputIndex_2], batchSize * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output_1, buffers[outputIndex_1], batchSize * OUTPUT_SIZE * sizeof(int), cudaMemcpyDeviceToHost, stream));
    CHECK(cudaMemcpyAsync(output_2, buffers[outputIndex_2], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));
    // Release stream and buffers
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex_1]));
    CHECK(cudaFree(buffers[outputIndex_2]));
}
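// Note: doInference allocates and frees the device buffers and the stream on every call,
// which is fine for a one-shot demo; for repeated inference you would allocate them once
// and reuse them across calls.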
int main(int argc, char *argv[]) {
    cudaSetDevice(DEVICE);
    // Read the serialized engine from disk
    char *trtModelStream{nullptr};
    size_t size{0};
    std::ifstream file("/workspace/demo_2.0/lprnet/lpr_us_onnx_b16.engine", std::ios::binary);
    if (file.good()) {
        file.seekg(0, file.end);
        size = file.tellg();
        file.seekg(0, file.beg);
        trtModelStream = new char[size];
        assert(trtModelStream);
        file.read(trtModelStream, size);
        file.close();
    } else {
        std::cerr << "failed to open engine file" << std::endl;
        return -1;
    }
    std::cout << "size:" << size << "\n";
    nvinfer1::IRuntime *runtime = nvinfer1::createInferRuntime(logger);
    assert(runtime != nullptr);
    nvinfer1::ICudaEngine *engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    delete[] trtModelStream; // the engine keeps its own copy, so the host buffer is no longer needed
    nvinfer1::IExecutionContext *context = engine->createExecutionContext();
    assert(context != nullptr);
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    cv::Mat img = cv::imread("/workspace/demo_2.0/lprnet/data/openalpr/train/image/wts-lg-000158.jpg");
    if (img.empty()) {
        std::cerr << "failed to read input image" << std::endl;
        return -1;
    }
    cv::Mat pr_img;
    cv::resize(img, pr_img, cv::Size(INPUT_W, INPUT_H), 0, 0, cv::INTER_CUBIC);
    int i = 0;
    for (int row = 0; row < INPUT_H; ++row) {
        uchar *uc_pixel = pr_img.data + row * pr_img.step;
        for (int col = 0; col < INPUT_W; ++col) {
            data[i + 2 * INPUT_H * INPUT_W] = ((float)uc_pixel[2] - 127.5) * 0.003921568627451;
            data[i + INPUT_H * INPUT_W] = ((float)uc_pixel[1] - 127.5) * 0.003921568627451;
            data[i] = ((float)uc_pixel[0] - 127.5) * 0.003921568627451;
            uc_pixel += 3;
            ++i;
        }
    }
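    // The loop above converts OpenCV's interleaved BGR (HWC) layout to planar CHW and
    // normalizes each channel to roughly [-0.5, 0.5] via (pixel - 127.5) / 255.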
    // Run inference
    static int tf_op_layer_ArgMax[BATCH_SIZE * OUTPUT_SIZE];
    static float tf_op_layer_Max[BATCH_SIZE * OUTPUT_SIZE];
    auto start = std::chrono::system_clock::now();
    printf("running inference\n");
    doInference(*context, data, tf_op_layer_ArgMax, tf_op_layer_Max, BATCH_SIZE);
    auto end = std::chrono::system_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;
    std::cout << std::endl;
    std::vector<int> preds;
    for (int i = 0; i < OUTPUT_SIZE; ++i) {
        preds.push_back(tf_op_layer_ArgMax[i]);
    }
    // Greedy CTC decode: drop the blank label (index 35) and collapse consecutive repeats
    int pre_c = preds[0];
    std::vector<int> no_repeat_blank_label;
    if (pre_c != 35) no_repeat_blank_label.push_back(pre_c); // keep the first symbol if it is not blank
    for (auto c : preds) {
        if (c == pre_c || c == 35) {
            if (c == 35) pre_c = c;
            continue;
        }
        no_repeat_blank_label.push_back(c);
        pre_c = c;
    }
    // Print the decoded character list
    std::string str;
    for (auto v : no_repeat_blank_label) {
        str += alphabet[v];
    }
    std::cout << "result:" << str << std::endl;
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}
/*
time g++ trt_lprnet.cpp -lnvinfer -pthread $(pkg-config --cflags --libs opencv4) -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcuda -lcudart -O0 -p -g && time ./a.out
*/
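One caveat: this is written against the TensorRT 7.x implicit-batch API (enqueue()/destroy()); newer TensorRT releases deprecate those calls in favor of enqueueV2() and plain delete.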