Hi, I’m running inference with a CV image-detection network on Xavier in INT8 at batch size 1. I’m converting from an ONNX model to TensorRT using the sample function provided. When I ran inference through nvprof, I saw roughly the same performance for the FP16 and INT8 versions, and I also noticed an incredibly high number of memcpy calls in the INT8 version (though the total times were about the same). INT8 is supported on Xavier, so why don’t I see any speedup? Using TRT 18.104.22.168, CUDA 10.
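In case it helps, my build step roughly follows the sample's builder setup. This is only a sketch of what I'm doing, not my exact code; `build_int8_engine` and its arguments are placeholders, and the calibrator is the usual IInt8 calibrator subclass from the sample:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_int8_engine(onnx_path, calibrator, max_batch_size=1):
    # Rough sketch of the ONNX -> TensorRT INT8 build (TRT 5.x-era API).
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError("Failed to parse the ONNX model")
        builder.max_batch_size = max_batch_size
        builder.max_workspace_size = 1 << 30
        # Enable INT8 mode and attach the calibrator from the sample
        builder.int8_mode = True
        builder.int8_calibrator = calibrator
        return builder.build_cuda_engine(network)
```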
Clarification appreciated, thank you.