Hello, I know that once we build an engine for a specific input size, setting a larger batch size makes the per-image inference time lower than with common deep learning frameworks such as TensorFlow, Caffe, or PyTorch. Why? Which features of TensorRT give it this performance?
TensorRT applies a number of optimizations that allow it to outperform other frameworks. These include precision calibration, mixed precision, layer fusion, tensor fusion, kernel auto-tuning (picking the best CUDA kernels for your target device/platform), and efficient GPU memory reuse.
See this page for a higher-level overview: https://developer.nvidia.com/tensorrt
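To give a feel for what layer fusion means, here is a minimal sketch in plain Python. It is not TensorRT code and does not use the TensorRT API. It assumes a toy linear layer followed by a per-channel scale and shift (similar in spirit to a batch-norm layer at inference time), and all function names are hypothetical. The two layers are folded into a single layer with adjusted weights, so one pass over the data replaces two while producing the same result.

```python
# Toy illustration of layer fusion. Hypothetical names, not the TensorRT API.

def linear(x, weights, bias):
    """Compute y[j] = sum_i(x[i] * weights[j][i]) + bias[j]."""
    return [sum(xi * w for xi, w in zip(x, row)) + b
            for row, b in zip(weights, bias)]

def scale_shift(y, scale, shift):
    """Per-channel affine step: z[j] = scale[j] * y[j] + shift[j]."""
    return [s * yi + t for yi, s, t in zip(y, scale, shift)]

def fuse(weights, bias, scale, shift):
    """Fold the scale/shift step into the linear layer's parameters.

    scale * (W x + b) + shift == (scale * W) x + (scale * b + shift)
    """
    fused_w = [[s * w for w in row] for row, s in zip(weights, scale)]
    fused_b = [s * b + t for b, s, t in zip(bias, scale, shift)]
    return fused_w, fused_b

# Made-up example values.
x = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
bias = [0.1, -0.2]
scale = [2.0, 0.5]
shift = [1.0, -1.0]

# Two separate layers: two passes and an intermediate result.
unfused = scale_shift(linear(x, weights, bias), scale, shift)

# One fused layer: the folding is done once, ahead of inference.
fused_w, fused_b = fuse(weights, bias, scale, shift)
fused = linear(x, fused_w, fused_b)

# Compare with a tolerance, since floating-point rounding can differ.
print(all(abs(a - b) < 1e-9 for a, b in zip(unfused, fused)))
```

On a GPU the benefit of this kind of folding comes from launching fewer kernels and avoiding a round trip through memory for the intermediate tensor. The fusions TensorRT actually performs are chosen when the engine is built and depend on the network and the target device.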