I have two models of different sizes: one has 35.9M parameters, the other 12.7M.
When I convert both models to TensorRT with
trtexec --onnx=model.onnx --batch=5 --fp16 the resulting engines have roughly the same inference speed, even though I would expect the smaller model to be substantially faster.
TensorRT Version: 7.1.3
GPU Type: Jetson Xavier AGX
CUDA Version: 10.2.89
CUDNN Version: 8.0
Operating System + Version: Jetpack 4.5.1
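In case it helps with diagnosis, trtexec can dump per-layer timings, which would show whether a few layers (or fixed per-inference overhead) dominate both engines. A sketch of how I could profile the two models — model_large.onnx and model_small.onnx are placeholder names for my two files:

```shell
# Profile each engine and export per-layer timings to JSON.
# (file names are placeholders for the two models)
trtexec --onnx=model_large.onnx --batch=5 --fp16 \
        --dumpProfile --exportProfile=large_profile.json
trtexec --onnx=model_small.onnx --batch=5 --fp16 \
        --dumpProfile --exportProfile=small_profile.json
```

Comparing the two JSON profiles layer by layer should reveal whether the runtime is dominated by a shared bottleneck rather than by total parameter count.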