I found that a TensorRT model with a batch size of 6 can also be used to run inference on an input with a smaller batch size, such as 4. My question is: would it be more appropriate to use a TensorRT engine with a batch size of 4 to run inference on a batch of 4 inputs? What difference does it make?
My setup is the following:
NVIDIA GPU Driver Version 10.2