Input dtype for an INT8 Quantized TensorRT Network

Quick question: when running inference on an INT8-quantized network, should the input dtype be INT8 or FP32?
In all of the example code I’ve run across, the input is always FP32, but in the NVIDIA MLPerf GitHub repo the input is INT8.

I tried both, but I haven’t seen any significant speedup from the INT8 input, which is weird since you’d expect the overhead of copying an FP32 tensor to the device to be significantly larger than for an INT8 tensor (4x the bytes).
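
For context, here’s roughly how I switch the input to INT8 when building the engine (a minimal sketch using the TensorRT Python API; the ONNX path and calibrator are placeholders for my actual setup, so the exact flags may differ):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:       # placeholder model
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = my_calibrator  # calibrator omitted here

# FP32 input (the default): leave the input tensor untouched.
# INT8 input: set the dtype and a dynamic range so TensorRT accepts
# pre-quantized data at the binding.
inp = network.get_input(0)
inp.dtype = trt.int8
inp.dynamic_range = (-128, 127)

engine = builder.build_serialized_network(network, config)
```

In the FP32 case I leave the input tensor alone and feed float32 host buffers; in the INT8 case I quantize on the host and feed int8 buffers instead.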

Thanks!