I have a yolov5 model which I would like to deploy.
I found that if I convert my model from onnx to TensorRT, trtexec indicates an inference speed of 25 fps.
But if I build the model layer for layer using INetworkDefinition, the inference speed triples.
How come the TensorRT model is so much faster when explicitly building the model instead of converting from onnx?
Both cases use int8 quantization.
TensorRT Version: 7.1.3
GPU Type: Jetson Xavier AGX
CUDA Version: 10.2.89
CUDNN Version: 8.0
Operating System + Version: Jetpack 4.5.1