Description
Hi everyone,
I am using the ENet model for image segmentation.
I have integrated TensorRT 7.1.3.4 and I am running into the problem that my segmentation result has an offset of about 6 pixels in both the x- and y-dimension. (The input size is 256×256×3.)
I have done the following steps:
1.) Convert the PB file to ONNX using tf2onnx with opset 11 (earlier opsets do not work in TRT because of the ResizeNearestNeighbor op)
2.) Build the engine in C++ following the sampleOnnxMNIST sample code
3.) Run inference in C++ following the sampleOnnxMNIST sample code
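One thing that can produce exactly this kind of sub-pixel drift is how each ResizeNearestNeighbor op is mapped to an ONNX Resize coordinate_transformation_mode (the model has half_pixel_centers/align_corners attributes, see the counters below). As a sketch only — this toy resize is my own illustration with numpy, not either runtime's actual kernel — here is how "asymmetric" vs. "half_pixel" coordinate mapping shifts a nearest-neighbor upsample by one pixel per stage:

```python
import numpy as np

def nearest_resize_1d(x, scale, mode):
    """Toy 1-D nearest-neighbor resize (illustration only, not TRT/ORT code)."""
    n_out = int(round(len(x) * scale))
    out = np.empty(n_out, dtype=x.dtype)
    for i in range(n_out):
        if mode == "asymmetric":            # ONNX: coordinate_transformation_mode=asymmetric
            src = i / scale
        else:                               # "half_pixel" (TF half_pixel_centers=true)
            src = (i + 0.5) / scale - 0.5
        idx = int(np.floor(src + 0.5))      # round-half-up to the nearest source pixel
        out[i] = x[min(max(idx, 0), len(x) - 1)]
    return out

x = np.arange(8)
asym = nearest_resize_1d(x, 2.0, "asymmetric")
half = nearest_resize_1d(x, 2.0, "half_pixel")
print(asym)  # [0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 7]
print(half)  # [0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7]
```

Away from the borders the two outputs are the same sequence shifted by one pixel, so a few resize/deconv stages interpreted with the wrong mode could plausibly accumulate into the ~6 px offset I am seeing.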
I have already done some analysis and narrowed the issue down to the building of the TensorRT engine in C++:
4.) trtexec.exe --explicitBatch --onnx=enet.onnx --saveEngine=enetTRT.engine --workspace=1000
Taking the engine from this step and using it for inference in the product code results in the same issue.
5.) I built onnxruntime with the TensorRT execution provider, created a session with the same ONNX model, and cached the resulting engine. Using this engine in the product code works correctly.
6.) I took the engines created by trtexec and by my own implementation and ran them through onnxruntime → same offset issue.
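For the comparison in steps 4–6 I can also quantify the offset instead of eyeballing it. This is a hypothetical diagnostic helper (numpy only, not part of the product code) that brute-forces the integer shift between a reference mask and a test mask; here it recovers a simulated 6 px offset:

```python
import numpy as np

def estimate_shift(ref, test, max_shift=10):
    """Brute-force the integer (dy, dx) that best aligns `test` to `ref`.
    Uses wrap-around np.roll, which is fine as a rough diagnostic."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.mean(np.roll(test, (-dy, -dx), axis=(0, 1)) != ref)
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best, best_err

rng = np.random.default_rng(0)
ref = rng.random((256, 256)) > 0.5        # stand-in for the reference segmentation mask
trt = np.roll(ref, (6, 6), axis=(0, 1))   # simulate the observed ~6 px offset
(dy, dx), err = estimate_shift(ref, trt)
print(dy, dx, err)  # 6 6 0.0
```

Running the real onnxruntime output as `ref` and the TensorRT-engine output as `test` through this gives a concrete (dy, dx) to compare across engine builds.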
→ So the issue is definitely caused by the engine. My question is: what is onnxruntime doing correctly that TensorRT isn't? I have been trying different parameterizations of TensorRT, to no avail.
I have seen the following parameters in the onnxruntime documentation:
“By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, min subgraph size = 1, FP16 mode is disabled and TensorRT engine caching is disabled.”
However, I didn't find anything in the C++ API to set the subgraph size. And setting setMinTimingIterations to 1000 results in a much longer build time (several minutes compared to <1 min), which also seems wrong.
Environment
TensorRT Version: 7.1.3.4
GPU Type: Quadro M2200
Nvidia Driver Version: 452.39
CUDA Version: 11.0
CUDNN Version: 8.0
Operating System + Version: Win10
Relevant Files
I can’t share my exact model; however, I think the issue should be reproducible with the standard ENet model.
These are the ops used in the model, in case the issue is related to one of them:
tensorflow ops: Counter({'Const': 383, 'Add': 176, 'Relu': 147, 'Mul': 101, 'Conv2D': 86, 'Neg': 66, 'BiasAdd': 29, 'Identity': 23, 'StridedSlice': 12, 'SpaceToBatchND': 8, 'BatchToSpaceND': 8, 'Shape': 5, 'Transpose': 4, 'MaxPool': 3, 'Pack': 3, 'Conv2DBackpropInput': 3, 'Pad': 2, 'ResizeNearestNeighbor': 2, 'Placeholder': 1, 'ConcatV2': 1, 'Reshape': 1, 'Max': 1, 'Sub': 1, 'Exp': 1, 'Sum': 1, 'RealDiv': 1})
tensorflow attr: Counter({'T': 685, 'dtype': 384, 'value': 383, 'data_format': 121, 'padding': 92, 'strides': 92, 'dilations': 89, 'explicit_paddings': 89, 'use_cudnn_on_gpu': 89, 'Tblock_shape': 16, 'ellipsis_mask': 12, 'shrink_axis_mask': 12, 'Index': 12, 'begin_mask': 12, 'new_axis_mask': 12, 'end_mask': 12, 'Tpaddings': 10, 'Tcrops': 8, 'out_type': 5, 'N': 4, 'Tperm': 4, 'ksize': 3, 'Tidx': 3, 'axis': 3, 'half_pixel_centers': 2, 'align_corners': 2, 'keep_dims': 2, 'shape': 1, 'Tshape': 1})
onnx mapped: Counter({'Const': 383, 'Add': 176, 'Relu': 147, 'Mul': 101, 'Conv2D': 86, 'Neg': 66, 'Identity': 23, 'StridedSlice': 12, 'BiasAdd': 11, 'SpaceToBatchND': 8, 'BatchToSpaceND': 8, 'Shape': 5, 'Transpose': 4, 'MaxPool': 3, 'Pack': 3, 'Conv2DBackpropInput': 3, 'Pad': 2, 'ResizeNearestNeighbor': 2, 'Placeholder': 1, 'ConcatV2': 1, 'Reshape': 1, 'Max': 1, 'Sub': 1, 'Exp': 1, 'Sum': 1, 'RealDiv': 1})