I tried to benchmark INT8 and FP16 for MobileNet0.25+SSD on a Jetson Xavier NX with JetPack 4.6.
For post-training quantization, I used the pytorch-quantization toolkit (TensorRT/tools/pytorch-quantization at master · NVIDIA/TensorRT · GitHub) and generated a calibrated ONNX model.
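In case it helps reproduce, this is roughly the calibrate-and-export flow I followed with pytorch-quantization. It's a sketch: `build_mobilenet025_ssd`, `calib_loader`, the input shape, and the output file name are placeholders for my own setup.

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Replace torch.nn layers with quantized equivalents before building the model
quant_modules.initialize()

model = build_mobilenet025_ssd()  # placeholder for my model builder
model.eval().cuda()

# 1) Switch quantizers to calibration mode
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is not None:
            module.disable_quant()
            module.enable_calib()
        else:
            module.disable()

# 2) Feed calibration batches to collect statistics
with torch.no_grad():
    for images, _ in calib_loader:  # placeholder DataLoader
        model(images.cuda())

# 3) Load the computed amax values and switch back to quantization mode
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is not None:
            module.load_calib_amax()  # histogram calibrators take method=... here
            module.enable_quant()
            module.disable_calib()
        else:
            module.enable()

# 4) Export with fake-quant (Q/DQ) nodes so TensorRT can build an INT8 engine
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 300, 300, device="cuda")  # placeholder input shape
torch.onnx.export(model, dummy, "mobilenet_ssd_int8.onnx", opset_version=13)
```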
But I found that INT8 runs much slower than FP16: with trtexec, FP16 reaches 346.861 qps while INT8 only reaches 217.914 qps.
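For reference, the trtexec invocations were roughly as follows (the ONNX file names and workspace size are placeholders):

```sh
# FP16 engine
trtexec --onnx=mobilenet_ssd.onnx --fp16 --workspace=2048

# INT8 engine built from the calibrated Q/DQ ONNX
trtexec --onnx=mobilenet_ssd_int8.onnx --int8 --workspace=2048
```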
Here is the trtexec log for INT8:
int8.txt (28.7 KB)
and here is the log for FP16:
fp16.txt (30.0 KB)