Dear @648976749,
I see the following numbers from trtexec for the default-precision, --fp16, and --int8 builds of the same model:
/usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/resnet50_onnx_model.onnx
[06/16/2022-08:35:15] [I] === Performance summary ===
[06/16/2022-08:27:54] [I] Throughput: 124.961 qps
[06/16/2022-08:27:54] [I] Latency: min = 7.98013 ms, max = 8.12762 ms, mean = 8.03174 ms, median = 8.03023 ms, percentile(99%) = 8.10669 ms
[06/16/2022-08:27:54] [I] Enqueue Time: min = 0.344971 ms, max = 0.9151 ms, mean = 0.528215 ms, median = 0.470154 ms, percentile(99%) = 0.849792 ms
[06/16/2022-08:27:54] [I] H2D Latency: min = 0.0270996 ms, max = 0.133423 ms, mean = 0.0482375 ms, median = 0.0380859 ms, percentile(99%) = 0.0996094 ms
[06/16/2022-08:27:54] [I] GPU Compute Time: min = 7.94235 ms, max = 8.02905 ms, mean = 7.98089 ms, median = 7.97864 ms, percentile(99%) = 8.0257 ms
[06/16/2022-08:27:54] [I] D2H Latency: min = 0.00146484 ms, max = 0.00341797 ms, mean = 0.00261398 ms, median = 0.00268555 ms, percentile(99%) = 0.00326538 ms
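The TensorRT documentation describes trtexec's "Latency" as the sum of H2D Latency, GPU Compute Time and D2H Latency for a single query, so the mean values above should add up. Below is a minimal sketch that checks this for the default-precision run. The regular expressions are an assumption inferred only from the log lines in this post, not from any trtexec output specification.

```python
import re

# Timing lines copied from the default-precision (no precision flag) run above.
LOG = """\
[06/16/2022-08:27:54] [I] Latency: min = 7.98013 ms, max = 8.12762 ms, mean = 8.03174 ms, median = 8.03023 ms, percentile(99%) = 8.10669 ms
[06/16/2022-08:27:54] [I] H2D Latency: min = 0.0270996 ms, max = 0.133423 ms, mean = 0.0482375 ms, median = 0.0380859 ms, percentile(99%) = 0.0996094 ms
[06/16/2022-08:27:54] [I] GPU Compute Time: min = 7.94235 ms, max = 8.02905 ms, mean = 7.98089 ms, median = 7.97864 ms, percentile(99%) = 8.0257 ms
[06/16/2022-08:27:54] [I] D2H Latency: min = 0.00146484 ms, max = 0.00341797 ms, mean = 0.00261398 ms, median = 0.00268555 ms, percentile(99%) = 0.00326538 ms
"""

# Assumed line shape, inferred from this log only: "[I] <metric>: min = X ms, ..."
LINE_RE = re.compile(r"\[I\] ([A-Za-z0-9 ]+): (min = .*)$")
STAT_RE = re.compile(r"(min|max|mean|median|percentile\(99%\)) = ([0-9.]+) ms")


def parse_summary(text):
    """Return {metric: {statistic: value_ms}} for each timing line in text."""
    metrics = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            metrics[m.group(1)] = {k: float(v) for k, v in STAT_RE.findall(m.group(2))}
    return metrics


stats = parse_summary(LOG)
# Sum of the per-stage means versus the reported end-to-end mean latency.
parts = sum(stats[k]["mean"] for k in ("H2D Latency", "GPU Compute Time", "D2H Latency"))
reported = stats["Latency"]["mean"]
print(f"H2D + compute + D2H = {parts:.5f} ms, reported Latency mean = {reported:.5f} ms")
```

Only the means are expected to match: the median or 99th percentile of a sum is generally not the sum of the per-stage medians or percentiles. The same check can be run on the --fp16 and --int8 summaries by swapping in their lines.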
/usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/resnet50_onnx_model.onnx --fp16
[06/16/2022-08:35:15] [I] === Performance summary ===
[06/16/2022-08:35:15] [I] Throughput: 362.688 qps
[06/16/2022-08:35:15] [I] Latency: min = 2.77734 ms, max = 2.87854 ms, mean = 2.80078 ms, median = 2.79565 ms, percentile(99%) = 2.84412 ms
[06/16/2022-08:35:15] [I] Enqueue Time: min = 0.399048 ms, max = 0.932114 ms, mean = 0.604471 ms, median = 0.681885 ms, percentile(99%) = 0.824463 ms
[06/16/2022-08:35:15] [I] H2D Latency: min = 0.029541 ms, max = 0.119522 ms, mean = 0.0431197 ms, median = 0.0369873 ms, percentile(99%) = 0.0864258 ms
[06/16/2022-08:35:15] [I] GPU Compute Time: min = 2.73926 ms, max = 2.77002 ms, mean = 2.75398 ms, median = 2.75391 ms, percentile(99%) = 2.76648 ms
[06/16/2022-08:35:15] [I] D2H Latency: min = 0.0012207 ms, max = 0.00585938 ms, mean = 0.00367615 ms, median = 0.00366211 ms, percentile(99%) = 0.00537109 ms
/usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/resnet50_onnx_model.onnx --int8
[06/16/2022-08:37:07] [I] === Performance summary ===
[06/16/2022-08:37:07] [I] Throughput: 611.933 qps
[06/16/2022-08:37:07] [I] Latency: min = 1.60986 ms, max = 1.64307 ms, mean = 1.62563 ms, median = 1.62549 ms, percentile(99%) = 1.63812 ms
[06/16/2022-08:37:07] [I] Enqueue Time: min = 0.361084 ms, max = 1.14978 ms, mean = 0.496531 ms, median = 0.473755 ms, percentile(99%) = 0.669434 ms
[06/16/2022-08:37:07] [I] H2D Latency: min = 0.0187988 ms, max = 0.0214844 ms, mean = 0.0196352 ms, median = 0.0195312 ms, percentile(99%) = 0.0209961 ms
[06/16/2022-08:37:07] [I] GPU Compute Time: min = 1.58887 ms, max = 1.62158 ms, mean = 1.60435 ms, median = 1.6041 ms, percentile(99%) = 1.61694 ms
[06/16/2022-08:37:07] [I] D2H Latency: min = 0.0012207 ms, max = 0.00244141 ms, mean = 0.00164729 ms, median = 0.00167847 ms, percentile(99%) = 0.0020752 ms
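To compare the three builds side by side, here is a small sketch that computes speedups relative to the default-precision run. It uses only the Throughput and mean GPU Compute Time values copied from the summaries above.

```python
# Throughput and mean GPU Compute Time copied from the three trtexec summaries above.
runs = {
    "default": {"throughput_qps": 124.961, "gpu_compute_mean_ms": 7.98089},
    "--fp16": {"throughput_qps": 362.688, "gpu_compute_mean_ms": 2.75398},
    "--int8": {"throughput_qps": 611.933, "gpu_compute_mean_ms": 1.60435},
}

baseline = runs["default"]
speedups = {}
for name, r in runs.items():
    # Higher throughput is better, so divide by the baseline;
    # lower compute time is better, so divide the baseline by it.
    tput = r["throughput_qps"] / baseline["throughput_qps"]
    compute = baseline["gpu_compute_mean_ms"] / r["gpu_compute_mean_ms"]
    speedups[name] = (tput, compute)
    print(f"{name:>8}: {tput:.2f}x throughput, {compute:.2f}x GPU compute time")
```

With these numbers, --fp16 gives roughly 2.9x and --int8 roughly 4.9x the default-precision throughput, and the GPU compute time ratios are in the same range.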