Should pruning a model prior to converting it to TensorRT make inference faster?

Hi,

Another thing worth checking is how many layers are actually inferenced with TensorRT.

Please note that the framework you are using is TF-TRT.
TF-TRT integrates TensorRT into the TensorFlow interface, so each layer may be executed by either TensorFlow or TensorRT.

The layer placement can be found in the TensorFlow log.
Could you collect it and share it with us?
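
As a rough sketch (assuming a TF2 SavedModel that has already been converted with TF-TRT; the path and signature name below are just placeholders), you can also count the TRTEngineOp nodes in the converted graph to see how much of the model was placed in TensorRT:

```python
# Sketch: count TRTEngineOp nodes in a TF-TRT converted SavedModel.
# saved_model_dir and the signature key are assumptions for illustration.
import tensorflow as tf

saved_model_dir = "/path/to/trt_converted_saved_model"  # hypothetical path
model = tf.saved_model.load(saved_model_dir)
func = model.signatures["serving_default"]

graph_def = func.graph.as_graph_def()

# TRTEngineOp nodes may sit in the main graph or inside library functions.
nodes = list(graph_def.node)
for f in graph_def.library.function:
    nodes.extend(f.node_def)

trt_engines = sum(1 for n in nodes if n.op == "TRTEngineOp")
print("TRTEngineOp nodes:", trt_engines)
print("Total nodes:", len(nodes))
```

If only a few TRTEngineOp nodes show up relative to the total node count, most of the model is still running in native TensorFlow, which would explain a smaller-than-expected speedup.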

Thanks.