Wrong inference result with TensorRT using ONNX parser (C++)


I’ve created a topic here
(it was about failing to run inference with TensorRT using the UFF parser).
As @AakankshaS suggested, I switched to the ONNX parser, but I still couldn’t get the correct result…
I described this situation in that post but didn’t get a response, so I think I should create another topic here.
Here is the situation:
the result of the inference after post-processing should be a 1920 x 1920 image (let’s call it ‘A’),
but it turned out to be a 1920 x 1920 image consisting of a 4 x 4 grid of small copies of ‘A’
(i.e., there are 16 small ‘A’s in one image).
Furthermore, each small ‘A’ seems to get lighter from top left to bottom right(?).
I’m not sure whether the problem is in the model or in the post-processing…
Could anyone help me out with this issue?

I did the following during the model conversion:

  1. Added a permute layer after my output layer while converting .h5 to .onnx
    (since ONNX works with NCHW tensor dimension order, while my model’s is NHWC).
  2. Set the batch dimension to 1 in my ONNX model.
    (Otherwise I’d have to set an optimization profile, which I’ve tried and still failed to get working Q_Q;
    since there’s a batch dimension with “?” in my original model, it’s recognized as a dynamic input.)
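For reference, the permute in step 1 is a plain tensor transpose. A minimal CPU sketch of reordering a single image from NHWC to NCHW (the function name is illustrative, not from the poster’s code) — getting this stride arithmetic wrong is a classic cause of tiled/repeated output images like the one described above:

```cpp
#include <vector>
#include <cstddef>

// Reorder a single-image tensor from NHWC (H, W, C) to NCHW (C, H, W).
std::vector<float> nhwcToNchw(const std::vector<float>& src,
                              std::size_t h, std::size_t w, std::size_t c)
{
    std::vector<float> dst(src.size());
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t ch = 0; ch < c; ++ch)
                // NHWC index: y*w*c + x*c + ch  ->  NCHW index: ch*h*w + y*w + x
                dst[ch * h * w + y * w + x] = src[y * w * c + x * c + ch];
    return dst;
}
```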

Here’s the information from my ONNX-to-engine conversion:

[07/17/2020-11:51:10] [I] Building and running a GPU inference engine for Onnx MNIST
Input filename: trial_multi_batch1.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: keras2onnx
Producer version: 1.6.0
Domain: onnx
Model version: 0
Doc string:

Any help or advice would be appreciated!


TensorRT Version:
GPU Type: RTX 2080 Ti
Nvidia Driver Version: 432.00
CUDA Version: 10.0
CUDNN Version: 7.4.2
Operating System + Version: Windows10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): 1.13.1
PyTorch Version (if applicable): -
Baremetal or Container (if container which image + tag): -

Relevant Files


Steps To Reproduce

Run the .cpp file with the same settings as SampleOnnxMNIST.cpp, and you can compare against the result I provided in the zip file.

Hi @cocoyen1995,
Can you please try validating the ONNX model output?

Also, the permute is not mandatory here, as TRT supports the NHWC format.


Hi @AakankshaS,

Thanks for your reply!
I tried running my ONNX model in NHWC order in TensorRT, and the result looks similar to the .h5 model’s result!
(Running with FP16 causes slight differences.)

Two more questions here:

  1. The performance of TensorRT’s inference doesn’t improve over the pipeline I used before
    (.h5 model converted to .pb, inference with TensorFlow’s C++ API; the whole process took about 55 ms/pic including reading the image, inference, and post-processing).
    I measured the time spent in the inference section and the post-processing section (doing only the softmax):
    the former took about 52~55 ms/pic, and the latter about 55~60 ms/pic.
    (And I have to do an argmax after the softmax to get the final result, which is even more time-consuming…)
    Is there anything wrong with my inference code?
    Or is it possible to have the softmax and argmax done during the inference section?

  2. If I want to use an INT8 engine for the inference, apart from the settings when creating the engine,
    is there anything else I have to do to make the process correct?
    If I only change the parameters in the code from

mParams.fp16 = 1, mParams.int8 = 0
(which makes a difference at config->setFlag(BuilderFlag::kFP16);)

to

mParams.fp16 = 0, mParams.int8 = 1
(which makes a difference at config->setFlag(BuilderFlag::kINT8);)

the result is quite different from the FP16 one…
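On question 1: softmax is strictly increasing per pixel, so the argmax of the raw logits equals the argmax of the softmax output — when only the final class map is needed, the softmax pass can be dropped entirely. (TensorRT can also fold the reduction into the engine itself, e.g. by appending an ITopKLayer with k = 1 to the network.) A minimal CPU sketch, assuming a CHW (class-major) output layout; names are illustrative:

```cpp
#include <vector>
#include <cstddef>

// Per-pixel argmax over class scores laid out class-major
// (scores[c * numPixels + p]). Because softmax is strictly
// monotonic, taking argmax over the raw logits yields the same
// class map as softmax followed by argmax.
std::vector<int> argmaxChw(const std::vector<float>& scores,
                           std::size_t numClasses, std::size_t numPixels)
{
    std::vector<int> cls(numPixels, 0);
    for (std::size_t p = 0; p < numPixels; ++p)
    {
        float best = scores[p];  // class 0
        for (std::size_t c = 1; c < numClasses; ++c)
        {
            float v = scores[c * numPixels + p];
            if (v > best) { best = v; cls[p] = static_cast<int>(c); }
        }
    }
    return cls;
}
```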

Thanks again for your help; I hope to hear from you soon!

Hi @cocoyen1995,
Looks like there is some issue with the processing part.
Can you try comparing the output before post-processing with the h5 model output?

You can refer to the link below.


Hi @AakankshaS,

Sorry for the late reply.
I’ve finally added the calibration part to build the INT8 engine, but the result is a bit weird…
The original output values range from 0 to 1,
but with the INT8 engine, the output contains only 0.5 and 1 (not values spread over 0 ~ 1).
What may be the reason for this difference?
Could it be a pre-processing error in the calibration part, or something else?
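One way such a collapse to a few coarse values can happen: if calibration assigns a tensor a dynamic range much larger than the values it actually produces, the 256 INT8 levels become too coarse for outputs in [0, 1]. A toy sketch of symmetric INT8 quantize/dequantize (an illustration of the effect, not TensorRT’s actual calibration code; `amax` is the assumed calibrated range):

```cpp
#include <algorithm>
#include <cmath>

// Symmetric INT8 quantization: map [-amax, amax] onto [-127, 127],
// then map back. The return value is the level the engine
// effectively represents x with.
float quantizeDequantize(float x, float amax)
{
    float scale = amax / 127.0f;
    int q = static_cast<int>(std::lround(x / scale));
    q = std::max(-127, std::min(127, q));
    return static_cast<float>(q) * scale;
}
```

With a well-matched range (amax ≈ 1) the levels are ~0.008 apart, but with a badly inflated range (say amax = 64) every value between roughly 0.25 and 0.75 snaps to the same single level near 0.5 — qualitatively the behavior described above.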

Here’s the code I’ve written so far:

Thanks again for your help; I hope to hear from you soon!