Hi!
I’m trying to understand how to properly estimate perf for convolutions factoring in the new FP16/INT8/INT4 capabilities.
a) Is it correct that Winograd is pretty much not applicable with INT8/INT4, or are there tricks implemented in TensorRT/cuDNN? (If it is applicable, Winograd cost could be estimated as the filter transform plus the fused GEMM with the Aᵀ·(…)·A output transform.)
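For context, here's the rough multiply-count model I have in mind (a sketch, ignoring the transform overheads, which are amortized across channels in the batched-GEMM formulation):

```python
# Rough multiply-count model for 2D Winograd F(m x m, r x r):
# each (m + r - 1)^2 input tile yields m^2 outputs using (m + r - 1)^2
# element-wise multiplies (the batched GEMM stage), versus m^2 * r^2
# multiplies for direct convolution over the same outputs.
def winograd_mult_reduction(m, r):
    tile_mults = (m + r - 1) ** 2      # multiplies per tile (GEMM stage)
    direct_mults = (m * m) * (r * r)   # direct conv for the same m*m outputs
    return direct_mults / tile_mults

print(winograd_mult_reduction(2, 3))   # F(2x2,3x3): 36/16 = 2.25
print(winograd_mult_reduction(4, 3))   # F(4x4,3x3): 144/36 = 4.0
```

This is what makes the INT8 question interesting: the larger-tile variants with the bigger reduction factors also have transform matrices with a wider dynamic range, which is where low-precision arithmetic gets difficult.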
b) That seems to imply that for 3x3 convolutions, FP16 Winograd will likely be faster than INT8 direct convolution (say, both using Turing tensor cores), but for 1x1s it is better to use INT8/INT4?
Thanks!