[TensorRT] ERROR: input: dynamic input is missing dimensions in profile 0

Description

Hey guys,
I am converting PSENet from PyTorch to ONNX and finally to TensorRT, but I hit the error shown in the topic title. Has anyone met this and have any idea?
Here is my code for building engine.

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def build_engine(model_path):
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(EXPLICIT_BATCH) as network, \
         builder.create_builder_config() as config, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:

        builder.max_workspace_size = 1 << 20
        builder.max_batch_size = 1

        with open(model_path, "rb") as f:
            parser.parse(f.read())
        # network.add_input("data", trt.float32, (1, 3, -1, -1))
        profile = builder.create_optimization_profile()
        profile.set_shape("data", (1, 3, 100, 100), (1, 3, 896, 1312), (1, 3, 2000, 3000))
        config.add_optimization_profile(profile)
        last_layer = network.get_layer(network.num_layers - 1)
        print(config)
        network.mark_output(last_layer.get_output(0))
        engine = builder.build_engine(network, config)
        return engine

And the code for the ONNX conversion:

def export_onnx_model(model, onnx_path, input_image, input_names=None,
                      output_names=None, dynamic_axes=None):
    inputs = input_image
    model(inputs)
    torch.onnx.export(model,
                      inputs,
                      onnx_path,
                      input_names=input_names,
                      output_names=output_names,
                      dynamic_axes=dynamic_axes)

export_onnx_model(model, onnx_path, image_preprocessed, ["input"], ["output"], {
    "input": [2, 3],
    "output": [2, 3]
})

Environment

TensorRT Version:7.0
GPU Type: V100
Nvidia Driver Version: 440.100
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: ubuntu 18.04
Python Version (if applicable): 3.7.5
PyTorch Version (if applicable): 1.5.1

Hi @lyzs1225,

Please refer to the below link for reference.


Also, request you to share your model and script so that we can try reproducing the issue and assist you better.
Thanks!

Hi @AakankshaS,
I have seen that link but still do not understand how to solve it.
I uploaded the .onnx model and scripts here: Model and Script
Thanks!

Hi @lyzs1225,

From glancing at your code and model, it looks pretty good to me. The only thing that stood out was that, when viewing the ONNX model in Netron, the input node’s name is “input”:

But your script uses the input name “data”, which matches up with the error saying you didn’t specify dimensions for the input named “input”.

I would try to change this line from “data”:

    profile.set_shape("data", (1, 3, 100, 100), (1, 3, 896, 1312), (1, 3, 2000, 3000))

to “input” (or whatever the actual input node’s name is for an arbitrary model):

    profile.set_shape("input", (1, 3, 100, 100), (1, 3, 896, 1312), (1, 3, 2000, 3000))

You can get the input names programmatically like so:

    # Query input names and shapes from parsed TensorRT network
    network_inputs = [network.get_input(i) for i in range(network.num_inputs)]
    input_names = [_input.name for _input in network_inputs]   # ex: ["actual_input1"]
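As a quick sanity check before building, you can also verify that the shapes you pass to `set_shape` satisfy the per-dimension kMIN <= kOPT <= kMAX rule that TensorRT enforces for an optimization profile. A minimal sketch (`valid_profile` is a hypothetical helper, not part of the TensorRT API):

```python
def valid_profile(min_shape, opt_shape, max_shape):
    # TensorRT requires kMIN <= kOPT <= kMAX in every dimension,
    # and all three shapes must have the same rank.
    return (len(min_shape) == len(opt_shape) == len(max_shape)
            and all(lo <= mid <= hi
                    for lo, mid, hi in zip(min_shape, opt_shape, max_shape)))

# The shapes used in this thread pass the check:
assert valid_profile((1, 3, 100, 100), (1, 3, 896, 1312), (1, 3, 2000, 3000))

# Then set the profile using the queried names (sketch, requires tensorrt):
# profile = builder.create_optimization_profile()
# for name in input_names:
#     profile.set_shape(name, (1, 3, 100, 100), (1, 3, 896, 1312), (1, 3, 2000, 3000))
# config.add_optimization_profile(profile)
```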

Thank you very much!

Hi @NVES_R,
Do you have any examples of dynamic-shape TensorRT inference code?
Thanks!

Hi @NVES_R ,
I met another problem when looking at the model output size:

last_layer = network.get_layer(network.num_layers - 1)
print("name:{}".format(last_layer.name))
print(last_layer.get_output(0).shape)

And the output is:

name:Relu_517
(1, 256, -1, -1)

However, the graph in Netron shows Relu_517 is in the middle of the network


How can I get the real output (which has a shape of (1, 7, -1, -1))?
Or do I have to locate the outputs coming from the upper part and then complete the full network with the TensorRT API?

Hi @lyzs1225,

I’m not sure why you’re getting an intermediate node as the output. Just using trtexec, I was able to convert the model successfully, and it seems like the correct output layer was picked up.

  1. Start a TensorRT 7.1 container:
nvidia-docker run -it -v `pwd`:/mnt -w /mnt nvcr.io/nvidia/tensorrt:20.06-py3
  2. Convert the dynamic-shape ONNX model to a TensorRT engine with an optimization profile:
root@255b25dad42f:/mnt# trtexec --explicitBatch --onnx=pse_sim.onnx \
  --minShapes=input:1x3x100x100 \
  --optShapes=input:1x3x896x1312 \
  --maxShapes=input:1x3x2000x3000 \
  --saveEngine=pse_sim.engine
...
----------------------------------------------------------------
Input filename:   pse_sim.onnx
ONNX IR version:  0.0.6
Opset version:    9
Producer name:    pytorch
Producer version: 1.5
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[06/30/2020-08:17:04] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[06/30/2020-08:17:32] [I] [TRT] Detected 1 inputs and 1 output network tensors.
...
&&&& PASSED TensorRT.trtexec # trtexec --explicitBatch --onnx=pse_sim.onnx --minShapes=input:1x3x100x100 --optShapes=input:1x3x896x1312 --maxShapes=input:1x3x2000x3000 --saveEngine=pse_sim.engine

Do you have any examples of dynamic-shape TensorRT inference code?

  3. Test inference on random inputs with the dynamic-shape engine:
root@255b25dad42f:/mnt# python3 infer.py -e pse_sim.engine 
Loaded engine: pse_sim.engine
Active Optimization Profile: 0
Engine/Binding Metadata
        Number of optimization profiles: 1
        Number of bindings per profile: 2
        First binding for profile 0: 0
        Last binding for profile 0: 1
Generating Random Inputs 
        Input [input] shape: (1, 3, -1, -1)        <----------------------
        Profile Shapes for [input]: [kMIN (1, 3, 100, 100) | kOPT (1, 3, 896, 1312) | kMAX (1, 3, 2000, 3000)]
        Input [input] shape was dynamic, setting inference shape to (1, 3, 896, 1312)      <-------------------
Input Metadata
        Number of Inputs: 1
        Input Bindings for Profile 0: [0]
        Input names: ['input']
        Input shapes: [(1, 3, 896, 1312)]
Output Metadata
        Number of Outputs: 1
        Output names: ['output']
        Output shapes: [(1, 7, 896, 1312)]      <-------------------
        Output Bindings for Profile 0: [1]
...

Note the lines with <-------- above pointing out the expected output shapes. My script sets the output shape to the profile’s kOPT shape by default if it was dynamic (contains -1’s).

Inference script can be found here: https://github.com/rmccorm4/tensorrt-utils/blob/493aa3827ff2c9886436ee4cbe60fed79d5bd263/inference/infer.py. Note this script is just meant for debugging, not necessarily for optimal performance/deployment.
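The usual pattern for a dynamic-shape engine is to allocate each binding’s device buffer once for the profile’s kMAX volume, then call `context.set_binding_shape` before every inference so the output shapes become concrete. A rough sketch of the sizing arithmetic (`max_buffer_bytes` is a hypothetical helper, not from the linked script):

```python
from functools import reduce

def max_buffer_bytes(kmax_shape, itemsize=4):
    # Allocate device memory once for the kMAX volume (float32 by default);
    # any runtime shape up to kMAX then fits in the same buffer.
    return reduce(lambda a, b: a * b, kmax_shape, 1) * itemsize

# Output binding sized for the kMAX input (1, 3, 2000, 3000) -> (1, 7, 2000, 3000):
print(max_buffer_bytes((1, 7, 2000, 3000)))  # 168000000 bytes, ~160 MiB

# Per-inference steps (sketch, requires tensorrt and a CUDA context):
# context.set_binding_shape(0, (1, 3, 896, 1312))   # pick a concrete input shape
# out_shape = tuple(context.get_binding_shape(1))   # now fully specified
```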

Hi @NVES_R ,

Thank you so much for your really helpful response. I am pulling the Docker image and will try it soon. Do you think my Python code for building the engine is wrong?

I tried trtexec and got the error:

Cuda failure: CUDA driver version is insufficient for CUDA runtime version
Aborted (core dumped)

Should I just reinstall a new CUDA?

Update:
I updated CUDA 10.2 to CUDA 11 but still get the same error when running the Docker image

nvcr.io/nvidia/tensorrt:20.06-py3

Hi @lyzs1225,

Should I just reinstall a new CUDA?

I would look at the “Driver Requirements” section of this page from the container release notes: https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_20-06.html#rel_20-06

Specifically:

Release 20.06 is based on NVIDIA CUDA 11.0.167, which requires NVIDIA Driver release 450.36. However, if you are running on Tesla (for example, T4 or any other Tesla board), you may use NVIDIA driver release 418.xx or 440.30. The CUDA driver’s compatibility package only supports particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.

I would recommend sticking with CUDA 10.2 and switching to Driver 440.30, as it should support compatibility mode per the description above.

Do you think my Python code for building the engine is wrong?

If I were to guess, this line might be a problem:

    network.mark_output(last_layer.get_output(0))

This line is usually used when the parser failed to find the output on its own. But if the parser was able to find the output successfully, then adding this line might do something extra that isn’t desired - just a guess.
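One way to make that guard explicit is to mark an output only when the parser found none. A sketch (`mark_output_if_missing` is a hypothetical helper written against the `tensorrt.INetworkDefinition` attributes already used in the build script above):

```python
def mark_output_if_missing(network):
    # Only fall back to marking the last layer's output when the ONNX
    # parser did not register any outputs on its own.
    if network.num_outputs == 0:
        last_layer = network.get_layer(network.num_layers - 1)
        network.mark_output(last_layer.get_output(0))
    return [network.get_output(i).name for i in range(network.num_outputs)]

# In build_engine(), replace the unconditional mark_output call with:
# print("outputs:", mark_output_if_missing(network))
```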

Hi @NVES_R,
Thanks a lot!