❓ Question
Why is the TensorRT model slower? I tried TensorRT on an MHA (multi-head attention) model, but found it is even slower than the JIT-scripted model.
What you have already tried
I tested the original model, the JIT-scripted model, the JIT model after optimization, and the TensorRT model, and found that the TensorRT model is not as fast as I expected. The model here is a simple MHA module modified from fairseq so that it could pass compilation.
```python
import time

import tmp_attn
import torch
import tensorrt
import torch_tensorrt as torch_trt


def timer(m, i):
    st = time.time()
    for _ in range(10000):
        m(i, i, i)
    ed = time.time()
    return ed - st


t1 = torch.randn(64, 1, 1280, device="cuda:0")
model = tmp_attn.MultiheadAttention(1280, 8).to("cuda:0")
model2 = torch.jit.script(model)
model3 = torch.jit.optimize_for_inference(model2)
model4 = torch_trt.compile(model, inputs=[t1, t1, t1]).to("cuda:0")

print("Original Model", timer(model, t1))
print("Jit Script Model", timer(model2, t1))
print("Jit Script Model after optimization", timer(model3, t1))
print("TensorRT Model", timer(model4, t1))
```
I ran each model 10,000 times and recorded the elapsed time. The output is:
```
Original Model 5.6981117725372314
Jit Script Model 4.5694739818573
Jit Script Model after optimization 3.3332810401916504
TensorRT Model 4.772718667984009
```
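One caveat with the timing loop above: CUDA kernel launches are asynchronous, so wrapping the calls in `time.time()` without synchronization can measure launch overhead rather than actual execution time, and the first iterations pay one-time JIT/autotuning costs. Below is a minimal sketch of a timer with warmup and `torch.cuda.synchronize()`; the `Linear` module and single-argument call are hypothetical stand-ins (the real benchmark calls `m(i, i, i)` on the MHA model), and it falls back to CPU when no GPU is present.

```python
import time

import torch


def timed_run(model, inp, iters=1000, warmup=50):
    """Time `iters` forward passes, excluding warmup and async launch skew."""
    with torch.no_grad():
        # Warmup: triggers JIT fusion, cuDNN autotuning, memory allocation.
        for _ in range(warmup):
            model(inp)
        if inp.is_cuda:
            torch.cuda.synchronize()  # drain queued kernels before timing
        st = time.time()
        for _ in range(iters):
            model(inp)
        if inp.is_cuda:
            torch.cuda.synchronize()  # wait for all timed kernels to finish
    return time.time() - st


# Hypothetical stand-in module for illustration only.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
m = torch.nn.Linear(1280, 1280).to(device)
x = torch.randn(64, 1280, device=device)
print(timed_run(m, x, iters=100, warmup=10))
```

Re-running the four models through a synchronized timer like this would make the comparison more trustworthy, since the relative ranking can change once queueing effects are excluded.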
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0): 1.11.0
- CPU Architecture: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
- OS (e.g., Linux): Linux, CentOS7
- How you installed PyTorch (`conda`, `pip`, `libtorch`, source): conda
- Build command you used (if compiling from source): /
- Are you using local sources or building from archives: No
- Python version: 3.7
- CUDA version: 11.7
- GPU models and configuration:
- TensorRT version: 8.2.5.1
- Torch_tensorrt version: 1.1.0
Additional context
The code of the MHA module is attached: tmp_attn.py