Huge speed difference between engines built from scratch and engines built from ONNX

Description

I have a YOLOv5 model that I would like to deploy.
When I convert the model from ONNX to TensorRT, trtexec reports an inference speed of 25 fps.
But if I build the same model layer by layer using INetworkDefinition, the inference speed triples.
Why is the engine so much faster when the network is defined explicitly instead of parsed from ONNX?
Both engines use INT8 quantization.
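For reference, the ONNX path is roughly equivalent to the build script below (a minimal sketch: "yolov5s.onnx" is a placeholder for my model and the INT8 calibrator is omitted), which mirrors what I run through trtexec:

```python
# Roughly what trtexec does for the ONNX path (TensorRT 7.x Python API).
# "yolov5s.onnx" is a placeholder path; the INT8 calibrator is omitted here.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine_from_onnx(onnx_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the exported ONNX graph into the TensorRT network definition.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30   # 1 GiB workspace
    config.set_flag(trt.BuilderFlag.INT8)
    # A real INT8 build also needs config.int8_calibrator = <my calibrator>.

    return builder.build_engine(network, config)

engine = build_engine_from_onnx("yolov5s.onnx")
```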

Thanks!

Environment

TensorRT Version: 7.1.3
GPU Type: Jetson Xavier AGX
CUDA Version: 10.2.89
cuDNN Version: 8.0
Operating System + Version: JetPack 4.5.1

Hi @frederikschoeller,

It depends; the ONNX parser can sometimes introduce additional ops, which may affect inference speed.
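One quick way to check is to parse the ONNX file and print the layers TensorRT actually builds, then compare that list with the layers you add through INetworkDefinition. A minimal sketch (the model path is a placeholder):

```python
# List the layers TensorRT builds from the ONNX graph (TensorRT 7.x Python API).
# "yolov5s.onnx" is a placeholder path.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("yolov5s.onnx", "rb") as f:
    parser.parse(f.read())

# Compare this listing against the layers in your manual definition;
# any extra ops introduced by the export/parse step will show up here.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    print(i, layer.type, layer.name)
```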

Thank you.

Hi @frederikschoeller,

We are working on this issue. Could you please share a repro script that manually defines the network?
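Even a stripped-down script along these lines would help; this is only a minimal single-convolution sketch to show the shape of what we need, not your actual YOLOv5 definition:

```python
# Minimal manual INetworkDefinition build (TensorRT 7.x Python API).
# A single convolution with random weights stands in for the real YOLOv5 layers.
import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(EXPLICIT_BATCH)

# Define the network explicitly, layer by layer.
inp = network.add_input("input", trt.float32, (1, 3, 640, 640))
w = np.random.randn(32, 3, 3, 3).astype(np.float32)
b = np.zeros(32, dtype=np.float32)
conv = network.add_convolution(inp, 32, (3, 3), trt.Weights(w), trt.Weights(b))
conv.stride = (1, 1)
conv.padding = (1, 1)
network.mark_output(conv.get_output(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
# An INT8 build would additionally need config.set_flag(trt.BuilderFlag.INT8)
# and a calibrator, as in your full script.

engine = builder.build_engine(network, config)
print("engine built:", engine is not None)
```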

Thank you.