Description
I am currently measuring the throughput improvement from batching on a Jetson Nano. I run MobileNet V2 with both PyTorch and TensorRT, and I expected batching to improve throughput more in TensorRT than in PyTorch. However, my experiment gives the following results:
PyTorch:
Batch size | Inference time(ms) | Throughput(fps) | Improvement |
---|---|---|---|
1 | 38.82 | 25.76 | 1 |
8 | 161.93 | 49.40 | 1.92 |
TensorRT:
Batch size | Inference time(ms) | Throughput(fps) | Improvement |
---|---|---|---|
1 | 13.02 | 76.8 | 1 |
8 | 90.37 | 88.52 | 1.15 |
I have attached the scripts used to produce these results. TensorRT seems to gain little throughput from batching (1.15x, versus 1.92x for PyTorch). Did I do something wrong?
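For reference, the throughput and improvement columns follow directly from the measured inference times. Here is a minimal sketch of that arithmetic using the numbers from the tables above; the function and variable names are illustrative and not part of the attached scripts.

```python
def throughput_fps(batch_size, inference_time_ms):
    """Images per second when one batch takes inference_time_ms milliseconds."""
    return 1000.0 * batch_size / inference_time_ms


# (batch size, average inference time in ms), copied from the tables above
measurements = {
    "PyTorch": [(1, 38.82), (8, 161.93)],
    "TensorRT": [(1, 13.02), (8, 90.37)],
}


def summarize(rows):
    """Return (batch size, fps, improvement over the first row) for each row."""
    baseline = throughput_fps(*rows[0])
    return [(bs, throughput_fps(bs, ms), throughput_fps(bs, ms) / baseline)
            for bs, ms in rows]


for framework, rows in measurements.items():
    for bs, fps, gain in summarize(rows):
        print("{:8} batch={} fps={:.2f} improvement={:.2f}".format(
            framework, bs, fps, gain))
```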
Environment
TensorRT Version: 7.1.3
GPU Type: Maxwell
Nvidia Driver Version: L4T 32.4.3
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.6
Baremetal or Container (if container which image + tag):
Relevant Files
Script used to measure the throughput in PyTorch.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import torch
import time
from torchvision import models

ITER = 1000


def run_inference(model, img):
    """ Randomly do some inference """
    inputs = img.cuda()
    res = model(inputs)
    return res.cpu()


def main():
    # Get model
    mobilenet = models.mobilenet_v2(pretrained=True).float().eval().cuda()
    batch_size = 8
    input_shape = [batch_size, 3, 224, 224]

    # Warm up
    for i in range(10):
        img = torch.rand(input_shape).float()
        run_inference(mobilenet, img)

    # Record inference time
    inference_time = []
    for i in range(ITER):
        img = torch.rand(input_shape).float()
        start_t = time.time()
        res = run_inference(mobilenet, img)
        delta_t = (time.time() - start_t) * 1000
        inference_time.append(delta_t)

    avg_inference_time = np.mean(inference_time)
    throughput = 1000 * batch_size / avg_inference_time
    pattern = "{:20}: {:.2f}"
    print(pattern.format("Avg. inference time", avg_inference_time))
    print(pattern.format("Throughput", throughput))


if __name__ == '__main__':
    main()
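The measurement loop in this script (warm-up, timed iterations, mean latency, throughput) can also be written as a reusable function, which makes it easier to repeat the run for each batch size. This is only a sketch: `benchmark` and `fake_inference` are hypothetical names, and the sleep-based workload stands in for the real model call. Whatever callable is passed in has to block until the result is ready, as `run_inference` does through `res.cpu()`.

```python
import time


def benchmark(run_once, batch_size, iterations=1000, warmup=10):
    """Return (average latency in ms, throughput in images/s) for run_once."""
    # Warm-up runs are executed but not timed
    for _ in range(warmup):
        run_once()

    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000)

    avg_ms = sum(latencies_ms) / len(latencies_ms)
    return avg_ms, 1000 * batch_size / avg_ms


def fake_inference():
    """Hypothetical stand-in for one batched inference call."""
    time.sleep(0.002)


avg_ms, fps = benchmark(fake_inference, batch_size=8, iterations=20, warmup=2)
print("{:20}: {:.2f}".format("Avg. inference time", avg_ms))
print("{:20}: {:.2f}".format("Throughput", fps))
```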
Script used to export Mobilenet V2:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import torch
from torchvision import models
# Load model
model = models.mobilenet_v2(pretrained=True)
# export
batch_size = 8
inputs = torch.rand([batch_size, 3, 224, 224])
torch.onnx.export(model,                                   # Model to be exported
                  inputs,                                  # Example input used to trace the model
                  "mobilenet_v2_bs%d.onnx" % batch_size,   # Output path
                  export_params=True,                      # Store the trained parameters in the file
                  input_names=["input"],                   # Input names
                  output_names=["output"])                 # Output names
Script used to measure performance in TensorRT:
trtexec --onnx=mobilenet_v2_bs8.onnx --iterations=1000
Steps To Reproduce
To get the PyTorch measurement, run the script directly.
To get the TensorRT measurement, first export the model to .onnx format from PyTorch, then run it with trtexec.
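The TensorRT step can be sketched as a small helper that builds the trtexec command line for each batch size. It assumes the batch-size-1 model was exported by setting `batch_size = 1` in the export script, which would produce `mobilenet_v2_bs1.onnx`. The helper name `trtexec_command` is hypothetical, and it only uses the `--onnx` and `--iterations` flags shown above.

```python
import shlex


def trtexec_command(batch_size, iterations=1000):
    """Build the trtexec invocation for the model exported at batch_size."""
    # File name follows the pattern used by the export script
    onnx_path = "mobilenet_v2_bs%d.onnx" % batch_size
    return ["trtexec", "--onnx=" + onnx_path, "--iterations=%d" % iterations]


# Print the commands to run on the device; nothing is executed here
for bs in (1, 8):
    print(" ".join(shlex.quote(arg) for arg in trtexec_command(bs)))
```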