TensorRT engine produces incorrect results

Description

Hi everyone,

I am using the ENet model for image segmentation.
I have integrated TensorRT 7.1.3.4 and am running into the problem that my segmentation result is offset by about 6 pixels in both the x- and y-dimension. (The input size is 256, 256, 3.)

I have done the following steps:
1.) Convert the PB file to ONNX using tf2onnx with opset 11 (earlier opsets do not work in TRT because of the ResizeNearestNeighbor op)
2.) Build the engine in C++ following the ONNXMNIST sample code
3.) Run inference in C++ following the ONNXMNIST sample code

I already did some analysis and found that the issue lies somewhere in the building of the TensorRT engine in C++:
4.) trtexec.exe --explicitBatch --onnx=enet.onnx --saveEngine=enetTRT.engine --workspace=1000
Taking the engine from here and using it for inference in the product code results in the same issue.
5.) I built onnxruntime with the TensorRT execution provider, created a session with the same ONNX model and cached the resulting engine. Using this engine in the product code works correctly.
6.) I took the engines created by trtexec and by my own implementation and ran them through onnxruntime → same offset issue.
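
For anyone repeating the comparison in steps 4.) to 6.), the offset between a known-good mask and a suspect one can be measured rather than judged by eye. The following is only a minimal sketch, assuming both outputs have already been reduced to row-major class-ID masks of the same size. The names `Shift` and `estimateShift` are hypothetical helpers made up for this example; they are not part of TensorRT or onnxruntime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical result type: the best integer shift and how well it fits.
struct Shift {
    int dx;               // shift in x (columns)
    int dy;               // shift in y (rows)
    double mismatchRate;  // fraction of differing pixels in the overlap
};

// Brute-force search for the (dx, dy) in [-maxShift, maxShift] for which
// candidate(x + dx, y + dy) best matches reference(x, y).
// Only pixels inside the overlap of both masks are compared, and the
// mismatch count is normalized by the overlap size so that large shifts
// (small overlaps) are not favored.
Shift estimateShift(const std::vector<std::uint8_t>& reference,
                    const std::vector<std::uint8_t>& candidate,
                    int width, int height, int maxShift)
{
    assert(width > 0 && height > 0 && maxShift >= 0);
    assert(reference.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    assert(candidate.size() == reference.size());

    Shift best{0, 0, std::numeric_limits<double>::infinity()};
    for (int dy = -maxShift; dy <= maxShift; ++dy) {
        for (int dx = -maxShift; dx <= maxShift; ++dx) {
            std::size_t compared = 0;
            std::size_t mismatches = 0;
            for (int y = 0; y < height; ++y) {
                const int cy = y + dy;
                if (cy < 0 || cy >= height) continue;
                for (int x = 0; x < width; ++x) {
                    const int cx = x + dx;
                    if (cx < 0 || cx >= width) continue;
                    const std::size_t refIdx =
                        static_cast<std::size_t>(y) * width + x;
                    const std::size_t candIdx =
                        static_cast<std::size_t>(cy) * width + cx;
                    ++compared;
                    if (reference[refIdx] != candidate[candIdx]) ++mismatches;
                }
            }
            if (compared == 0) continue;  // no overlap for this shift
            const double rate = static_cast<double>(mismatches) /
                                static_cast<double>(compared);
            if (rate < best.mismatchRate) best = Shift{dx, dy, rate};
        }
    }
    return best;
}
```

Called with the mask from the working engine as `reference`, the mask from the failing engine as `candidate`, and a `maxShift` somewhat larger than the suspected offset (for example 10 for a shift of about 6 pixels), a result close to (6, 6) with a low mismatch rate would suggest a pure translation rather than a more general numerical difference.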

→ So the issue is definitely caused by the engine. My question is: What is onnxruntime doing correctly that TensorRT isn't? I have been trying different parameterizations of TensorRT to no avail.
I have seen the following parameters in the onnxruntime description:

“By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, min subgraph size = 1, FP16 mode is disabled and TensorRT engine caching is disabled.”

However, I didn't find anything in the C++ API to set the subgraph size. And setting setMinTimingIterations to 1000 also results in a much longer build time (several minutes compared to <1 min), which also seems wrong.

Environment

TensorRT Version: 7.1.3.4
GPU Type: Quadro M2200
Nvidia Driver Version: 452.39
CUDA Version: 11.0
CUDNN Version: 8.0
Operating System + Version: Win10

Relevant Files

I can’t share my exact model; however, I think the issue should be reproducible with the standard ENet model.
These are the ops used in the model, in case it is related to that:

    tensorflow ops: Counter({'Const': 383, 'Add': 176, 'Relu': 147, 'Mul': 101, 'Conv2D': 86, 'Neg': 66, 'BiasAdd': 29, 'Identity': 23, 'StridedSlice': 12, 'SpaceToBatchND': 8, 'BatchToSpaceND': 8, 'Shape': 5, 'Transpose': 4, 'MaxPool': 3, 'Pack': 3, 'Conv2DBackpropInput': 3, 'Pad': 2, 'ResizeNearestNeighbor': 2, 'Placeholder': 1, 'ConcatV2': 1, 'Reshape': 1, 'Max': 1, 'Sub': 1, 'Exp': 1, 'Sum': 1, 'RealDiv': 1})
    tensorflow attr: Counter({'T': 685, 'dtype': 384, 'value': 383, 'data_format': 121, 'padding': 92, 'strides': 92, 'dilations': 89, 'explicit_paddings': 89, 'use_cudnn_on_gpu': 89, 'Tblock_shape': 16, 'ellipsis_mask': 12, 'shrink_axis_mask': 12, 'Index': 12, 'begin_mask': 12, 'new_axis_mask': 12, 'end_mask': 12, 'Tpaddings': 10, 'Tcrops': 8, 'out_type': 5, 'N': 4, 'Tperm': 4, 'ksize': 3, 'Tidx': 3, 'axis': 3, 'half_pixel_centers': 2, 'align_corners': 2, 'keep_dims': 2, 'shape': 1, 'Tshape': 1})
    onnx mapped: Counter({'Const': 383, 'Add': 176, 'Relu': 147, 'Mul': 101, 'Conv2D': 86, 'Neg': 66, 'Identity': 23, 'StridedSlice': 12, 'BiasAdd': 11, 'SpaceToBatchND': 8, 'BatchToSpaceND': 8, 'Shape': 5, 'Transpose': 4, 'MaxPool': 3, 'Pack': 3, 'Conv2DBackpropInput': 3, 'Pad': 2, 'ResizeNearestNeighbor': 2, 'Placeholder': 1, 'ConcatV2': 1, 'Reshape': 1, 'Max': 1, 'Sub': 1, 'Exp': 1, 'Sum': 1, 'RealDiv': 1})

Could you please share the output log of the trtexec command with “--verbose” mode?

Thanks

The forum doesn’t let me upload any files. I always get the error:
undefined method `hostname' for nil:NilClass

Is this a known issue?
If I copy and paste the log, I have to split it into 3 replies due to the character limit.

Hi @m.b.89,

Can you please tell me the size and file type that you are trying to upload?

Thanks,
Tom

I tried a 3.8 MB txt file and a 200 KB zip file; neither worked.

Thanks for the information. I was able to replicate the issue and have the team looking into it now.
Please watch this thread for updates.

Thanks,
Tom

Hi @m.b.89,

Good news, the upload issue has been resolved. Please try now.

Thanks,
Tom

Happy to hear.

Cheers,
Tom

Hi everyone,

I tried to further narrow down my issue.
→ I used the ONNXRuntime subgraph dump to rule out that any further ONNX optimization is doing something differently → used the model.onnx as input → output is still incorrect
→ I checked the implementation of the TensorRT execution provider: onnxruntime/tensorrt_execution_provider.cc (main branch of microsoft/onnxruntime on GitHub)

Any settings regarding maxIterations and subgraph size relate to the first point and are not passed directly to TensorRT.
The only TensorRT-related setting is the max workspace size, which I also set to 1000 MB.
The only thing I am not sure about is the optimization profiles:
My network has the input dimensions (-1, 256, 256, 3). To build it as explicitBatch, I set all 3 (kMIN, kOPT and kMAX) to (1, 256, 256, 3) in a single optimization profile. Is this correct? Is trtexec doing the same?

Is it possible that I get very different performance timings using ONNX in Python compared to TensorRT in C++, resulting in different tactics that might cause the issue?

I got the same error. Have you found a solution?

I upgraded to TensorRT 7.2.1.6 today, and the offset problem seems resolved.