Estimating convolution performance with mixed precision


I’m trying to understand how to properly estimate performance for convolutions, factoring in the new FP16/INT8/INT4 capabilities.

a) Is it correct that Winograd is essentially not applicable with INT8/INT4, or are there tricks implemented in TensorRT/cuDNN?

b) That seems to imply that, for 3x3 convolutions, FP16 Winograd will likely be faster than INT8 direct convolution (say, both using Turing tensor cores), whereas for 1x1 convolutions it is better to use INT8/INT4?
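To make the intuition behind (b) concrete, here is a back-of-envelope sketch, not a benchmark: it compares multiply counts for direct vs Winograd F(2x2, 3x3) convolution under assumed Turing tensor-core peak-rate ratios (INT8 ~2x FP16, INT4 ~4x FP16 throughput). The layer shape and normalized rates are illustrative assumptions, and transform overheads and memory traffic are ignored.

```python
# Back-of-envelope estimate, not a benchmark. The peak-rate ratios are
# assumptions (INT8 ~2x and INT4 ~4x FP16 tensor-core throughput on Turing);
# real kernels rarely hit peak, and Winograd transform costs are ignored.

FP16_RATE = 1.0   # normalized tensor-core throughput
INT8_RATE = 2.0   # assumed 2x FP16
INT4_RATE = 4.0   # assumed 4x FP16

def direct_multiplies(h, w, c_in, c_out, k):
    """Multiply count for a direct convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k

def winograd_multiplies(h, w, c_in, c_out):
    """Multiply count for Winograd F(2x2, 3x3): 16 multiplies per 2x2
    output tile instead of 2*2*9 = 36 for direct, a 2.25x reduction."""
    tiles = (h // 2) * (w // 2)
    return tiles * 16 * c_in * c_out

# An illustrative mid-network 3x3 layer
h = w = 56
c_in = c_out = 128

t_fp16_winograd = winograd_multiplies(h, w, c_in, c_out) / FP16_RATE
t_int8_direct = direct_multiplies(h, w, c_in, c_out, 3) / INT8_RATE

print(f"FP16 Winograd cost: {t_fp16_winograd:.3e}")
print(f"INT8 direct cost:   {t_int8_direct:.3e}")
# Winograd's 2.25x multiply saving slightly beats INT8's assumed 2x rate.

# For a 1x1 convolution Winograd does not apply, so the lower-precision
# direct path wins outright:
t_fp16_1x1 = direct_multiplies(h, w, c_in, c_out, 1) / FP16_RATE
t_int4_1x1 = direct_multiplies(h, w, c_in, c_out, 1) / INT4_RATE
print(f"1x1 INT4 vs FP16 speedup: {t_fp16_1x1 / t_int4_1x1:.1f}x")
```

Under these assumptions the FP16 Winograd 3x3 comes out modestly ahead of INT8 direct (2.25x fewer multiplies vs a 2x rate advantage), while for 1x1 the INT4 path wins by the full rate ratio, which is the pattern question (b) is asking about.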