About the time taken for FP16 inference and INT8 inference in TensorRT

When performing inference with TensorRT, is it possible for INT8 inference to be slower than FP16 inference?

Yes, it is possible for INT8 inference to be slower than FP16 inference with TensorRT, for a few reasons.
Not every layer has an INT8 kernel on every GPU: when an INT8 implementation is missing (or simply loses to the FP16 tactic during kernel selection), TensorRT runs that layer in FP16/FP32, so the engine never gets the full benefit of the lower precision.
INT8 quantization also adds overhead of its own: TensorRT inserts quantize/dequantize and reformat operations wherever the precision changes between layers, and for small or memory-bound networks this conversion cost can outweigh the savings from INT8 math.
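If you want to verify this on a specific model, the simplest check is to build an FP16 engine and an INT8 engine from the same network and compare their measured latencies, e.g. with trtexec --onnx=model.onnx --fp16 versus trtexec --onnx=model.onnx --int8 --fp16. Below is a minimal Python sketch of the same idea using the TensorRT builder API; it assumes TensorRT 8.x bindings, an ONNX model at model.onnx, and a calibrator class MyEntropyCalibrator, all of which are placeholders rather than anything from the original post.

```python
# Minimal sketch (assumes TensorRT 8.x Python bindings; "model.onnx" and the
# calibrator class are placeholders, not something from the original post).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_serialized_engine(onnx_path, use_int8):
    """Parse an ONNX model and build a serialized engine with FP16 (and optionally INT8) enabled."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)      # allow FP16 kernels
    if use_int8:
        config.set_flag(trt.BuilderFlag.INT8)  # additionally allow INT8 kernels
        # Unless the ONNX graph already carries Q/DQ nodes, a calibrator (or
        # per-tensor dynamic ranges) has to be attached here, e.g.:
        # config.int8_calibrator = MyEntropyCalibrator(...)  # hypothetical helper

    return builder.build_serialized_network(network, config)

# Build both engines from the same model and save them for timing with trtexec.
for name, use_int8 in (("fp16", False), ("int8", True)):
    with open(f"model_{name}.engine", "wb") as f:
        f.write(build_serialized_engine("model.onnx", use_int8))
```

Timing both saved engines (for instance with trtexec --loadEngine=model_int8.engine) and looking at the per-layer profile then shows which layers actually run in INT8 and where reformat layers were inserted.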
