Does TensorRT contain any optimizations to implement convolutions efficiently, such as lowering convolutions into matrix multiplications the way cuDNN does?
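To make the question concrete, here is a minimal NumPy sketch of the cuDNN-style im2col lowering I am referring to (illustrative only; I am not suggesting this is TensorRT's actual internal implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, K = 5, 5, 3                    # input size and kernel size
x = rng.standard_normal((H, W))
k = rng.standard_normal((K, K))

# Direct 2D "valid" convolution (correlation) as a reference
out_h, out_w = H - K + 1, W - K + 1
direct = np.empty((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        direct[i, j] = np.sum(x[i:i+K, j:j+K] * k)

# im2col: unroll each KxK patch into a row, then do one matmul
cols = np.array([x[i:i+K, j:j+K].ravel()
                 for i in range(out_h) for j in range(out_w)])
lowered = (cols @ k.ravel()).reshape(out_h, out_w)

assert np.allclose(direct, lowered)  # same result via a single GEMM
```

I am asking whether TensorRT uses this kind of lowering, specialized direct kernels, or something else entirely.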
Another question is about how horizontal layer fusion improves performance. The documentation says that horizontal layer fusion improves performance by combining layers that take the same source tensor and apply the same operations with similar parameters, resulting in a single larger layer for higher computational efficiency. However, the number of output feature maps stays the same before and after horizontal layer fusion, and the separate layers could also run in parallel. Could you please offer more details on this? Thanks.
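My current understanding of horizontal fusion is the following NumPy sketch, using 1x1 convolutions written as GEMMs (names and shapes are my own illustration, not the TensorRT API): two branches reading the same input are merged into one larger GEMM by concatenating their weights along the output-channel axis, trading two kernel launches for one.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((8, 16))     # shared input: 8 positions, 16 channels
w1 = rng.standard_normal((16, 32))   # branch 1: 16 -> 32 output channels
w2 = rng.standard_normal((16, 32))   # branch 2: 16 -> 32 output channels

# Unfused: two separate GEMMs (two kernel launches, x read twice)
y1 = x @ w1
y2 = x @ w2

# Fused: concatenate weights along output channels -> one larger GEMM
w_fused = np.concatenate([w1, w2], axis=1)   # 16 -> 64
y_fused = x @ w_fused

# The fused output is just the two branch outputs side by side
assert np.allclose(y_fused[:, :32], y1)
assert np.allclose(y_fused[:, 32:], y2)
```

If this picture is right, the total output channel count is indeed unchanged, so I would like to understand where the win comes from compared with simply launching the two branch kernels concurrently, e.g. reduced launch overhead, reading the shared input once, or better GPU occupancy from one larger kernel.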