I’ve set up TensorRT to work with my yolov3 model, where I’m running inference on each frame of a video stream. When I run with a single video stream and process each frame one at a time, the TensorRT version of the model gets a solid speedup over the regular model (going from 43 fps to 57 fps). However, when I process frames in larger batches — e.g. 5 different videos, batching together 1 frame from each video into a batch size of 5 — I don’t see any speedup with TensorRT.
I’m trying to understand why I see a speedup at batch size 1 but not at batch size 5. Any ideas why this might be happening, or what I can look into to improve batch performance? I’m running with float32 precision, but I would still expect a speedup at larger batch sizes for the TensorRT model.
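For context, the multi-stream batching itself is just stacking one preprocessed frame per stream into a single input tensor. A minimal sketch (the 608x608 input size is a placeholder for the model’s actual input resolution):

```python
import numpy as np

# One preprocessed CHW float32 frame from each of 5 video streams
# (3x608x608 is a placeholder shape; random data stands in for real frames).
frames = [np.random.rand(3, 608, 608).astype(np.float32) for _ in range(5)]

batch = np.stack(frames)             # stack along a new batch axis -> (5, 3, 608, 608)
batch = np.ascontiguousarray(batch)  # TensorRT bindings expect contiguous memory
print(batch.shape)                   # (5, 3, 608, 608)
```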
Here is an outline of my steps for creating and running the TensorRT engine:
- Export the yolo model to ONNX using `torch.onnx.export` with the `dynamic_axes` param for the batch dimension
- Convert the ONNX model to a TensorRT engine:
  - parse the ONNX model
  - create a single optimization profile pinned to a specific batch size:
    `profile.set_shape(inp.name, min=(batch_size, *shape), opt=(batch_size, *shape), max=(batch_size, *shape))`
  - build the engine
- Load the TensorRT engine and create an execution context
- Select the TensorRT engine that matches the input batch size in the inference function
- Set the optimization profile (before setting the binding shape):
  `context.active_optimization_profile = 0`
- Set the binding shape:
  `context.set_binding_shape(0, (BATCH_SIZE, 3, *IMAGE_SIZE))`
Not sure if there’s anything else I should be doing, but these steps seem fine for handling inference with larger batch sizes. I’m running the latest TensorRT 7 release with the EXPLICIT_BATCH flag set (this seems to be required for ONNX models), and the batch dimension is dynamic.
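One thing that might help isolate the problem is benchmarking the engine by itself with `trtexec`, to see whether the engine scales with batch size independent of my preprocessing/video pipeline. Something along these lines (the file name, input tensor name, and 608x608 shape are placeholders for my actual model):

```
trtexec --onnx=yolov3.onnx --explicitBatch \
        --minShapes=input:1x3x608x608 \
        --optShapes=input:5x3x608x608 \
        --maxShapes=input:5x3x608x608 \
        --shapes=input:5x3x608x608
```

If the reported throughput at batch 5 doesn’t beat 5 sequential batch-1 runs here either, the issue is in the engine/profile rather than in my inference code.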
Is there anything I’m missing, or anything worth trying to figure out why this is happening?