Why does the inference process terminate without error?

I’m working on a project with TensorRT to address some speed issues.
As far as I know, the whole process should be like this…

(keras).h5/hdf5 -> (tensorflow).pb -> .uff -> .engine

So far, I’ve run through the whole process with the same model architecture (U-Net) in this project as I used before.
The only difference between the two models is the input/output size:

old model - input: 1664 x 288 x 1, output: 1664 x 288 x 6
current model - input: 1920 x 1920 x 1, output: 1920 x 1920 x 2

However, when I run the current model through the same process I used for the old one,
it terminates without any error before running inference with the engine file.
It stops at createMnistCudaBuffer(), leaving only the output printed before that function on the terminal…
I thought it might be a memory issue, so I tried adjusting the declaration of MAX_WORKSPACE, but it didn’t work.
Did I miss something important that may be causing this?


TensorRT Version:
I installed the TensorRT Python wheel under Anaconda with TRT 5.0.2.6, and installed TRT 5.1.5 under my OS
GPU Type: RTX 2080 Ti (11G)
Nvidia Driver Version: (Sorry I forgot to check, maybe 417.xx?)
CUDA Version: 10.0
CUDNN Version: 7.4.2
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): 1.13.1
PyTorch Version (if applicable): -
Baremetal or Container (if container which image + tag): -

Any help or advice would be appreciated!


It is recommended to use the latest TRT release (7.1), as it has improved features and performance.
You can download the latest release from the link below.

If you are using TRT >= 7, UFF conversion has been deprecated, so you can convert as follows:
Keras (.h5) -> ONNX -> TRT
TensorFlow (.pb) -> ONNX -> TRT
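For the .pb path, one way to do the two conversion steps is tf2onnx plus trtexec. This is only a sketch: the filenames and the input/output tensor names below are placeholders you’d replace with your own, and both tools are assumed to be installed.

```shell
# Step 1: frozen TensorFlow graph -> ONNX (tensor names are placeholders).
python -m tf2onnx.convert \
    --input model.pb --inputs input_1:0 --outputs conv2d_out:0 \
    --output model.onnx

# Step 2: ONNX -> serialized TensorRT engine.
trtexec --onnx=model.onnx --saveEngine=model.engine
```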

Once you get your ONNX model, you can follow the steps in the link below.

If the issue persists, please share your .pb files for the old and new models so that we can assist better.



Thanks for your quick reply! I’ll try that out later. :D


I’ve tried to run inference with an ONNX model created this way:

Keras (.h5) -> ONNX -> TRT

But the result is different from what I expected…
(I don’t know how to explain it clearly…)
For example,
the result of the inference after post-processing should be a 1920 x 1920 image (let’s call it ‘A’),
but it turned out to be an image consisting of 4 copies of ‘A’ in each row and column, with the overall size still 1920 x 1920
(i.e., there are 16 small copies of ‘A’ in one image).
Furthermore, each small ‘A’ seems to get lighter from top left to bottom right(?).
I’m not sure whether it’s a problem with the model or with the post-processing;
should I keep this issue here or ask in another topic?
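For reference, here’s a small NumPy sketch of the kind of layout mix-up I suspect on the post-processing side: if the engine writes its output buffer in CHW (channels-first) order but the post-processing reshapes that flat buffer directly as HWC, the pixels get scrambled even though the shape looks right. The array names and sizes are just placeholders.

```python
import numpy as np

# Pretend this is the flat output buffer of a 2-channel, 4x4 prediction,
# written by the engine in CHW (channels-first) order.
flat = np.arange(2 * 4 * 4, dtype=np.float32)

# Correct: reshape as CHW first, then transpose to HWC for post-processing.
correct = flat.reshape(2, 4, 4).transpose(1, 2, 0)   # shape (4, 4, 2)

# Wrong: reshaping the CHW buffer directly as HWC scrambles the pixels,
# which can show up as tiling/ghosting artifacts in the decoded image.
wrong = flat.reshape(4, 4, 2)                        # shape (4, 4, 2)

print(np.array_equal(correct, wrong))  # False: same shape, different pixels
```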

By the way, here’s what I did in the process of converting the model:

  1. Added a permute layer after my output layer while converting .h5 to .onnx
    (since ONNX works with the NCHW tensor dimension order, while my model’s is NHWC)
  2. Set the batch dimension to 1 in my ONNX model
    (otherwise I’d have to set an optimization profile, which I’ve tried and still failed to get working;
    since the batch dimension in my original model is “?”, it is recognized as a dynamic input)
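As I understand it, point 1 also needs a matching treatment on the input side: the NHWC batch from Keras-style preprocessing has to be permuted to NCHW before it is copied into the device buffer. A NumPy sketch of that step (the array names are mine):

```python
import numpy as np

# A single-image batch in the Keras NHWC layout: (N, H, W, C).
batch_nhwc = np.random.rand(1, 1920, 1920, 1).astype(np.float32)

# The ONNX/TensorRT input expects NCHW, so permute the axes before inference
# and make the buffer contiguous for the host-to-device copy.
batch_nchw = np.ascontiguousarray(batch_nhwc.transpose(0, 3, 1, 2))

print(batch_nchw.shape)  # (1, 1, 1920, 1920)
```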

And here’s the information of my onnx conversion:

[07/17/2020-11:51:10] [I] Building and running a GPU inference engine for Onnx MNIST
Input filename: trial_multi_batch1.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: keras2onnx
Producer version: 1.6.0
Domain: onnx
Model version: 0
Doc string:

(I’m using TensorRT 7.0.0.11)
Thanks in advance for all the help!