The latency of INT8 mode on T4 is very slow

When I use a V100 for inference, the latency of FP32 mode is 1.5 ms/image and the latency of INT8 mode is 0.76 ms/image.

When I use the same code and environment on a T4, the results are very different from the V100: the latency of FP32 mode is 1.6 ms/image, but the latency of INT8 mode is 1.4 ms/image.

The INT8 accuracy on V100 and T4 is the same, but on the T4 the INT8 latency seems too slow compared with FP32.

Hi,

Are you using the same TensorRT engine/plan on both V100 and T4? Or did you create a separate engine on each platform?
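By engine/plan I mean a serialized TensorRT engine written to disk and reloaded for inference. Roughly something like the following, where the paths and the `engine` object are placeholders; engines are tuned for the GPU they were built on, so a plan built on V100 generally should not be reused on T4:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def save_plan(engine, plan_path):
    # Serialize a built engine to a plan file (done once per target GPU,
    # since TensorRT engines are tuned for the GPU they were built on).
    with open(plan_path, "wb") as f:
        f.write(engine.serialize())

def load_plan(plan_path):
    # Deserialize the plan file for inference on that same GPU.
    with open(plan_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())
```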

Hi,

I didn’t generate an engine plan file.
I trained my model in TensorFlow and then converted it to TensorRT through UFF.
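The conversion/build step looks roughly like this; the input/output node names, shape, and calibrator below are placeholders for my actual model:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(uff_path, calibrator=None):
    # Parse the UFF model exported from TensorFlow and build the engine
    # directly on the target GPU (V100 or T4).
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.UffParser() as parser:
        builder.max_batch_size = 1
        builder.max_workspace_size = 1 << 30  # 1 GiB workspace
        if calibrator is not None:
            builder.int8_mode = True
            builder.int8_calibrator = calibrator  # INT8 calibration data
        # "input"/"output" and the shape are placeholders for the real nodes.
        parser.register_input("input", (3, 224, 224))
        parser.register_output("output")
        parser.parse(uff_path, network)
        return builder.build_cuda_engine(network)
```

I run this build step separately in each environment rather than copying a plan file between the GPUs.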

Hi,

Can you share your code so we can try to reproduce this or debug it further?

Thanks,
NVIDIA Enterprise Support