When I run inference on a V100, the latency is 1.5 ms/image in FP32 mode and 0.76 ms/image in INT8 mode.
With the same code and environment on a T4, the results are very different from the V100: the latency is 1.6 ms/image in FP32 mode, but 1.4 ms/image in INT8 mode.
The INT8 accuracy is the same on the V100 and the T4, but on the T4 the INT8 latency seems too slow compared with FP32 mode.
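To put a number on the gap, here is a small sketch that computes the INT8 speedup over FP32 from the latencies reported above. The names `latency_ms` and `int8_speedup` are only for illustration and are not part of my inference code.

```python
# Latencies in ms per image, copied from the measurements in this post.
latency_ms = {
    "V100": {"fp32": 1.5, "int8": 0.76},
    "T4": {"fp32": 1.6, "int8": 1.4},
}


def int8_speedup(fp32_ms: float, int8_ms: float) -> float:
    """Return how many times faster INT8 is than FP32 (hypothetical helper)."""
    return fp32_ms / int8_ms


# Speedup factor per GPU.
speedups = {
    gpu: int8_speedup(t["fp32"], t["int8"]) for gpu, t in latency_ms.items()
}

for gpu, factor in speedups.items():
    print(f"{gpu}: INT8 is {factor:.2f}x faster than FP32")
# V100: INT8 is 1.97x faster than FP32
# T4: INT8 is 1.14x faster than FP32
```

So INT8 gives roughly a 2x speedup on the V100 but only about 1.14x on the T4, with the same code and environment.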