Little performance difference between int8 and fp16 on RTX2080

fcj · June 25, 2021, 4:41am

I have a segmentation model in ONNX format and use trtexec to convert it to int8 and fp16 TensorRT engines. However, the trtexec output shows almost no difference in execution time between int8 and fp16 on an RTX 2080. I expected int8 to run almost 2x faster than fp16.
I use the following commands to build the fp16 and int8 engines from my ONNX model. More details are below. Are my conversion commands correct?

fp16:
./trtexec --onnx=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f.onnx --fp16 --saveEngine=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_fp16.trt --dumpProfile --exportTimes=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_fp16.json

GPU compute time for fp16:
[06/17/2021-18:54:01] [I] Enqueue Time
[06/17/2021-18:54:01] [I] min: 10.1111 ms
[06/17/2021-18:54:01] [I] max: 11.0121 ms
[06/17/2021-18:54:01] [I] median: 10.2297 ms
[06/17/2021-18:54:01] [I] GPU Compute
[06/17/2021-18:54:01] [I] min: 9.04541 ms
[06/17/2021-18:54:01] [I] max: 9.98022 ms
[06/17/2021-18:54:01] [I] mean: 9.19428 ms
[06/17/2021-18:54:01] [I] median: 9.17056 ms
[06/17/2021-18:54:01] [I] percentile: 9.88184 ms at 99%

int8:

./trtexec --onnx=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f.onnx --int8 --saveEngine=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_int8.trt --dumpProfile --exportTimes=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_int8.json

GPU compute time for int8:

[06/17/2021-19:11:44] [I] Enqueue Time
[06/17/2021-19:11:44] [I] min: 9.42017 ms
[06/17/2021-19:11:44] [I] max: 10.7423 ms
[06/17/2021-19:11:44] [I] median: 9.50635 ms
[06/17/2021-19:11:44] [I] GPU Compute
[06/17/2021-19:11:44] [I] min: 8.3761 ms
[06/17/2021-19:11:44] [I] max: 9.70874 ms
[06/17/2021-19:11:44] [I] mean: 8.55703 ms
[06/17/2021-19:11:44] [I] median: 8.45605 ms
[06/17/2021-19:11:44] [I] percentile: 9.62769 ms at 99%

Thanks,
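To put a number on "almost no difference", the two GPU Compute blocks above can be compared directly. The following is a minimal sketch, assuming the log lines keep exactly the format pasted in this post; the helper name `gpu_compute_stats` is made up for illustration and is not part of trtexec.

```python
import re

# Log excerpts copied from the trtexec output quoted above.
FP16_LOG = """\
[06/17/2021-18:54:01] [I] Enqueue Time
[06/17/2021-18:54:01] [I] min: 10.1111 ms
[06/17/2021-18:54:01] [I] max: 11.0121 ms
[06/17/2021-18:54:01] [I] median: 10.2297 ms
[06/17/2021-18:54:01] [I] GPU Compute
[06/17/2021-18:54:01] [I] min: 9.04541 ms
[06/17/2021-18:54:01] [I] max: 9.98022 ms
[06/17/2021-18:54:01] [I] mean: 9.19428 ms
[06/17/2021-18:54:01] [I] median: 9.17056 ms
[06/17/2021-18:54:01] [I] percentile: 9.88184 ms at 99%
"""

INT8_LOG = """\
[06/17/2021-19:11:44] [I] Enqueue Time
[06/17/2021-19:11:44] [I] min: 9.42017 ms
[06/17/2021-19:11:44] [I] max: 10.7423 ms
[06/17/2021-19:11:44] [I] median: 9.50635 ms
[06/17/2021-19:11:44] [I] GPU Compute
[06/17/2021-19:11:44] [I] min: 8.3761 ms
[06/17/2021-19:11:44] [I] max: 9.70874 ms
[06/17/2021-19:11:44] [I] mean: 8.55703 ms
[06/17/2021-19:11:44] [I] median: 8.45605 ms
[06/17/2021-19:11:44] [I] percentile: 9.62769 ms at 99%
"""

# Matches the min/max/mean/median lines; the percentile line is left out on purpose.
STAT_RE = re.compile(r"\[I\] (min|max|mean|median): ([0-9.]+) ms")


def gpu_compute_stats(log_text):
    """Return the statistics (in ms) listed under the 'GPU Compute' header."""
    stats = {}
    in_section = False
    for line in log_text.splitlines():
        if line.rstrip().endswith("[I] GPU Compute"):
            in_section = True
            continue
        if in_section:
            match = STAT_RE.search(line)
            if match is None:
                break  # first line that is not a plain statistic ends the section
            stats[match.group(1)] = float(match.group(2))
    return stats


fp16 = gpu_compute_stats(FP16_LOG)
int8 = gpu_compute_stats(INT8_LOG)
speedup = fp16["median"] / int8["median"]
print(f"median GPU compute: fp16 {fp16['median']} ms, int8 {int8['median']} ms")
print(f"int8 speedup over fp16: {speedup:.2f}x")
```

On these numbers the int8 engine is only about 8% faster by median GPU compute time, far from the expected 2x.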

NVES · June 25, 2021, 5:07am

Hi, please refer to the link below to perform inference in INT8:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md

Thanks!

spolisetty · June 25, 2021, 4:31pm

Hi @fcj,

Hope this will help you.

Could you please share the complete verbose logs for both builds and, if possible, the ONNX model, so that we can try it on our end and assist you better?

Thank you.

fcj · July 3, 2021, 8:14pm

@spolisetty

You can download the ONNX model below:

https://drive.google.com/file/d/1s97AcGU1TVpSA9JkMPjGqhQ1kUDQL8u5/view?usp=sharing

I used the following command to build the int8 engine:
./trtexec --onnx=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f.onnx --int8 --saveEngine=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_int8.trt --dumpProfile --exportTimes=fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f_int8.json

The log can be downloaded below:
https://drive.google.com/file/d/1PmwsaIISSVeOt2DnPUJaxieEv4CUZ-fk/view?usp=sharing

spolisetty · July 5, 2021, 4:17pm

@fcj,

When we tried it on our end, we also observed little difference.
This can happen if many layers end up falling back to fp32 in the int8 build.
In that case, you’d probably want to enable both int8 and fp16.

We recommend that you try the latest TensorRT 8.0 GA version.

Thank you.
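As a sketch of the suggestion above, the snippet below assembles a trtexec command line that passes `--int8` and `--fp16` together, so the builder may pick fp16 kernels for layers that would otherwise fall back to fp32. It only prints the command and does not run it. The helper `build_trtexec_cmd` and the `_int8_fp16.trt` engine name are assumptions made for illustration; the flags themselves are the ones already used in this thread, plus `--verbose` for the logs requested earlier.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, precisions=("int8", "fp16")):
    """Assemble a trtexec argument list with one flag per requested precision."""
    cmd = ["./trtexec", f"--onnx={onnx_path}"]
    cmd += [f"--{precision}" for precision in precisions]
    cmd += [f"--saveEngine={engine_path}", "--dumpProfile", "--verbose"]
    return cmd


# Model name taken from the original post; the output name is hypothetical.
stem = "fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f"
cmd = build_trtexec_cmd(f"{stem}.onnx", f"{stem}_int8_fp16.trt")

# Print a copy-pasteable shell command instead of executing it.
print(shlex.join(cmd))
```

Comparing the `--dumpProfile` output of this mixed-precision build against the int8-only build should show whether the per-layer times change.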
