I’ve setup tensorRT to work on my yolov3 model where I’m running inference on each frame of a video stream. When I run with a single video stream and process each frame one at at time, I notice that the tensorRT version of the model gets a solid speedup over the regular model (going from 43 fps to 57 fps). However, when I try to process frames from larger batch sizes, like 5 different videos (and batch together 1 frame from each video into a batch size of 5), I don’t see any speedup with tensorRT.
I’m trying to understand why I see a speedup with batch size of 1 vs a batch size of 5. Any ideas why this might be happening or what I can look into for improving batch performance? I’m running with float 32 but would still expect a speedup for larger batch sizes for the tensorRT model.
Here is an outline of my steps for creating and running the tensorRT engine:
- Export yolo model to onnx using
torch.onnx.export
with the dynamic batches param
- Convert onnx to tensorRT engine
- parse onnx model
- create a single optimization profile for a specific batch size:
profile.set_shape(inp.name, min=(batch_size, *shape), opt=(batch_size, *shape), max=(batch_size, *shape))
- build engine
- Load the tensorRT engine + context
- select the right tensorRT engine based on input batch size to inference function
- Set the binding shape:
context.set_binding_shape(0, (BATCH_SIZE, 3, IMAGE_SIZE))
- Set the optimization profile:
context.active_optimization_profile = 0
Not sure if there’s anything else I should be doing but these steps seem to be fine for handling inference with larger batch sizes. I’m running this on the latest TensorRT 7 version with EXPLICIT_BATCH parameter set (seems like this is required) but I do have a dynamic shape for the batch size.
Is there anything I’m missing or worth trying to determine why this is happening?
Hi @prathikn,
Please share your model and script along with the below set of system information, so that we can help you better.
o Linux distro and version
o GPU type
o Nvidia driver version
o CUDA version
o CUDNN version
o Python version [if using python]
o Tensorflow and PyTorch version
o TensorRT version
Thanks!
I’m using a standard yolov3-spp model for predicting a single class, which you can see here: yolov3/models.py at c7f8dfcb8734d604482992b13d10420ea5eb3fd3 · ultralytics/yolov3 · GitHub
For generating the tensorRT engine, here is the script I use:
def create_optimization_profiles(builder, inputs, batch_size):
# Creates tensorRT optimizations profiles for a given batch size
profiles = []
for inp in inputs:
profile = builder.create_optimization_profile()
shape = inp.shape[1:]
profile.set_shape(inp.name, min=(batch_size, *shape), opt=(batch_size, *shape), max=(batch_size, *shape))
profiles.append(profile)
return profiles
def build_engine(onnx_file_path, engine_file_path, batch_size, verbose=True):
logger = trt.Logger(trt.Logger.VERBOSE) if verbose else trt.Logger()
builder = trt.Builder(logger)
config = builder.create_builder_config()
# Specifies that network should have an explicit batch size (required in tensorRT 7.0.0+)
explicit_batch = [1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)]
network = builder.create_network(*explicit_batch)
parser = trt.OnnxParser(network, logger)
# Define standard settings for tensorRT builder environment
builder.max_workspace_size = 1 << 30
builder.max_batch_size = batch_size
builder.fp16_mode = True
# builder.strict_type_constraints = True
# Parse onnx model
with open(onnx_file_path, 'rb') as onnx_model:
if not parser.parse(onnx_model.read()):
print("ERROR: Failed to parse onnx model.")
for error in range(parser.num_errors):
print(parser.get_error(error))
return
# Add optimization profiles
inputs = [network.get_input(i) for i in range(network.num_inputs)]
opt_profiles = create_optimization_profiles(builder, inputs, batch_size)
for profile in opt_profiles:
config.add_optimization_profile(profile)
# Explicitly set the the output layer so engine knows where to expect final outputs
last_layer = network.get_layer(network.num_layers - 1)
if not last_layer.get_output(0):
network.mark_output(last_layer.get_output(0))
print('Building tensorRT engine...')
engine = builder.build_engine(network, config)
print('Successfully built engine')
with open(engine_file_path, 'wb') as f:
f.write(engine.serialize())
Here is system information:
- Ubuntu 18.04.4 LTS, x86-64, Linux 4.15.0-101-generic
- GPU: GeForce RTX 2080 Ti
- CUDA version: 10.2
- CUDNN version: 7.6.5
- Python version: 3.7
- Pytorch: 1.5
- TensorRT: 7.0.0.11
Once this engine is created I simply load the engine, set the optimization profile and binding as described above and run the engine using context.execute_async
. Is there anything else I should be doing here?
Hi @AakankshaS, any update here on ideas/what to look into?
Hi @prathikn,
Sincere apologies for delayed response,
Are you still facing the issue?
I tried working with yolov3, and could not reproduce the issue.
Can you please help with your model?
Thanks!