I have a segmentation model in ONNX format and use trtexec to convert it to INT8 and FP16 engines. However, the trtexec output shows almost no difference in execution time between INT8 and FP16 on an RTX 2080. I expected INT8 to run almost 2x faster than FP16.
I used the following commands to convert my ONNX model to FP16 and INT8 TensorRT engines; the details are below. Are my conversion commands correct?
fp16:
./trtexec --onnx=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f.onnx --fp16 --saveEngine=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_fp16.trt --dumpProfile --exportTimes=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_fp16.json
Timing results for FP16 (Enqueue Time and GPU Compute):
[06/17/2021-18:54:01] [I] Enqueue Time
[06/17/2021-18:54:01] [I] min: 10.1111 ms
[06/17/2021-18:54:01] [I] max: 11.0121 ms
[06/17/2021-18:54:01] [I] median: 10.2297 ms
[06/17/2021-18:54:01] [I] GPU Compute
[06/17/2021-18:54:01] [I] min: 9.04541 ms
[06/17/2021-18:54:01] [I] max: 9.98022 ms
[06/17/2021-18:54:01] [I] mean: 9.19428 ms
[06/17/2021-18:54:01] [I] median: 9.17056 ms
[06/17/2021-18:54:01] [I] percentile: 9.88184 ms at 99%
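To cross-check the summary above against the per-iteration data that --exportTimes writes, a small shell helper could recompute the median GPU compute time from the exported JSON file. This is only a sketch: it assumes each record in the JSON carries a "computeMs" field (field names may differ between TensorRT versions), and the function name median_compute_ms is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the median of all "computeMs" values in a
# trtexec --exportTimes JSON file (ASSUMPTION: that field name exists).
median_compute_ms() {
  # Extract every "computeMs": <number> pair, keep only the number,
  # sort numerically, then pick the middle value (or average the two middles).
  grep -o '"computeMs" *: *[0-9.]*' "$1" \
    | sed 's/.*: *//' \
    | sort -n \
    | awk '{ v[NR] = $1 }
           END {
             if (NR == 0) exit 1
             if (NR % 2) m = v[(NR + 1) / 2]
             else m = (v[NR / 2] + v[NR / 2 + 1]) / 2
             printf "%.5f\n", m
           }'
}
```

Usage: median_compute_ms fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_fp16.json, and likewise for the _int8.json file.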
int8:
./trtexec --onnx=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f.onnx --int8 --saveEngine=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_int8.trt --dumpProfile --exportTimes=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_int8.json
Timing results for INT8 (Enqueue Time and GPU Compute):
[06/17/2021-19:11:44] [I] Enqueue Time
[06/17/2021-19:11:44] [I] min: 9.42017 ms
[06/17/2021-19:11:44] [I] max: 10.7423 ms
[06/17/2021-19:11:44] [I] median: 9.50635 ms
[06/17/2021-19:11:44] [I] GPU Compute
[06/17/2021-19:11:44] [I] min: 8.3761 ms
[06/17/2021-19:11:44] [I] max: 9.70874 ms
[06/17/2021-19:11:44] [I] mean: 8.55703 ms
[06/17/2021-19:11:44] [I] median: 8.45605 ms
[06/17/2021-19:11:44] [I] percentile: 9.62769 ms at 99%
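To put a number on "almost no difference", the snippet below divides the FP16 median GPU Compute time by the INT8 one, using the values copied from the two logs above. It is just arithmetic on the reported medians, not a new measurement; a result near 2.0 would match the expectation.

```shell
#!/usr/bin/env bash
# Median GPU Compute times copied from the trtexec logs above (ms).
fp16_median=9.17056
int8_median=8.45605

# Speedup = FP16 time / INT8 time, rounded to three decimals.
speedup=$(awk -v a="$fp16_median" -v b="$int8_median" 'BEGIN { printf "%.3f", a / b }')
echo "INT8 speedup over FP16 (median GPU compute): ${speedup}x"
```

This prints a speedup of about 1.084x, far from the roughly 2x I expected.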
Thanks,