I have been training a YOLOv3 model in PyTorch and converting it to an ONNX file to run with TensorRT. I've noticed some cases where the PyTorch model and the TensorRT model perform differently, and I'm wondering what the pros and cons of TensorRT are compared to other compilers such as TVM?
Convolution: TensorRT implements many algorithms for both fp32 and int8 convolution. TVM only implements direct and Winograd convolution, and it takes almost a day of tuning on a server to find a fast convolution config.
Deconvolution: TensorRT has full support. TVM's deconvolution does not support groups or int8.
Quantization: TensorRT has full post-training quantization support, while the open-source TVM quantization is incomplete.
- TVM is open source.
- After about a day of tuning on a server, the tuned TVM model may be a little faster than TensorRT.
- TVM still has some bugs.
I’m not familiar with other compilers.
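For anyone unsure what post-training quantization does to the weights, here is a minimal sketch in plain Python. It assumes the simplest scheme (symmetric int8 with max-abs calibration), which is not necessarily what TensorRT or TVM use internally; the function names and the sample weights are made up for illustration.

```python
def calibrate_scale(values, num_bits=8):
    """Max-abs calibration: map the largest magnitude to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer level and clamp to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map integer levels back to approximate floating-point values."""
    return [q * scale for q in qvalues]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -1.27, 0.64, 0.9]
scale = calibrate_scale(weights)
q = quantize(weights, scale)
restored = dequantize(q, scale)

# The round-trip error per value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The round-trip error here is one plausible source of small output differences between the fp32 PyTorch model and an int8 engine. A real calibrator would pick the scale from activation statistics over a calibration dataset rather than from a single list of values.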