Estimating convolution performance with mixed precision

asp77 · August 28, 2018, 7:28pm

Hi!

I’m trying to understand how to properly estimate perf for convolutions factoring in the new FP16/INT8/INT4 capabilities.

a) Is it correct that Winograd is pretty much not applicable with INT8/INT4, or are there tricks implemented in TensorRT/cuDNN?

b) That seems to imply that for 3x3 convolutions, FP16 Winograd will likely be faster than INT8 direct convolution (say, both using Turing tensor cores), but for 1x1s it is better to use INT8/INT4?
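
To make (b) concrete, here is a rough back-of-envelope sketch of the arithmetic. It rests on assumptions that are not confirmed anywhere in this thread: peak tensor core throughput scaling 1:2:4 for FP16:INT8:INT4, Winograd actually being able to run at the FP16 tensor core rate, and the Winograd transform overhead and memory bandwidth both being ignored. The function names and the `RELATIVE_PEAK` table are hypothetical, made up for illustration only.

```python
# Back-of-envelope comparison only; not a model of real cuDNN/TensorRT kernels.

def winograd_mult_reduction(m, r):
    """Multiplication reduction of 2D Winograd F(m x m, r x r) vs. direct convolution."""
    direct = (m * r) ** 2        # m*m outputs per tile, r*r multiplies per output
    winograd = (m + r - 1) ** 2  # one multiply per element of the transformed tile
    return direct / winograd

# ASSUMPTION: relative peak math throughput per precision (FP16 = 1.0).
RELATIVE_PEAK = {"fp16": 1.0, "int8": 2.0, "int4": 4.0}

def effective_speedup(precision, kernel, tile=None):
    """Idealized speedup over FP16 direct convolution.

    tile=None means direct convolution; otherwise Winograd with a tile x tile output.
    """
    reduction = winograd_mult_reduction(tile, kernel) if tile else 1.0
    return RELATIVE_PEAK[precision] * reduction

if __name__ == "__main__":
    print("3x3 FP16 Winograd F(2x2,3x3):", effective_speedup("fp16", 3, tile=2))
    print("3x3 INT8 direct:             ", effective_speedup("int8", 3))
    print("1x1 FP16 direct:             ", effective_speedup("fp16", 1))
    print("1x1 INT8 direct:             ", effective_speedup("int8", 1))
    print("1x1 INT4 direct:             ", effective_speedup("int4", 1))
```

Under these idealized assumptions, F(2x2, 3x3) saves 2.25x multiplications, which is only slightly ahead of the assumed 2x INT8 advantage, so transform overhead could easily flip the result in practice. For 1x1 there is nothing for Winograd to save, so the lower precisions win on paper.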

Thanks!
