I am trying to implement dynamic shapes with TensorRT 7 using an ONNX model. I want to support variable batch sizes (bs=4 and bs=8). I exported the ONNX model from PyTorch as:
input_names = ["input"]    # must match the name used in set_shape() later
output_names = ["output"]
dummy_input = torch.randn((8, 3, 224, 224))
dynamic_axes = {"input": {0: "batch_size"}, "output": {0: "batch_size"}}
torch.onnx.export(model, dummy_input,
                  "onnx_dynamic.onnx",
                  verbose=True, input_names=input_names,
                  output_names=output_names, dynamic_axes=dynamic_axes)
Once the model is exported, I follow the documentation to build an engine from this ONNX model that can handle dynamic batch sizes:
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, TRT_LOGGER)
builder.max_batch_size = 8          # has no effect for explicit-batch networks
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
config.set_flag(trt.BuilderFlag.FP16)

profile1 = builder.create_optimization_profile()
profile1.set_shape('input',            # input tensor name
                   (8, 3, 224, 224),   # min shape
                   (8, 3, 224, 224),   # opt shape
                   (8, 3, 224, 224))   # max shape
config.add_optimization_profile(profile1)

profile2 = builder.create_optimization_profile()
profile2.set_shape('input',
                   (4, 3, 224, 224),   # min shape
                   (4, 3, 224, 224),   # opt shape
                   (4, 3, 224, 224))   # max shape
config.add_optimization_profile(profile2)

with open(onnx_path, 'rb') as model:
    parser.parse(model.read())
engine = builder.build_engine(network, config)
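One thing I noticed while reading about multiple profiles: when an engine is built with K optimization profiles, TensorRT replicates the bindings, so the engine exposes K * bindings_per_profile bindings and profile k owns the index range [k * bpp, (k + 1) * bpp). If that understanding is right, the index arithmetic is just (the helper name is mine):

```python
def profile_binding_index(profile_index, binding_index, bindings_per_profile):
    """Engine-level binding index when several optimization profiles exist.

    Profile k's copy of binding i lives at k * bindings_per_profile + i.
    """
    return profile_index * bindings_per_profile + binding_index

# With one input and one output (bindings_per_profile = 2), profile 1's
# input is engine binding 2, not binding 0:
print(profile_binding_index(1, 0, 2))  # 2
```

So indexing binding 0 while profile 1 is active would address profile 0's copy of the input, which may be part of what I am seeing below.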
After creating this TRT engine, according to the documentation, engine.get_binding_shape(0) should return (-1, 3, 224, 224), but instead it returns (8, 3, 224, 224). As a sanity check that the network was built with dynamic shapes, network.get_input(0).shape does return (-1, 3, 224, 224).
Using this engine for inference, it always runs with batch size 8, regardless of which optimization profile I switch to:
images = np.random.rand(4, 3, 224, 224).astype(np.float32)
engine = get_trt_engine("trt16_dynamic.trt")
print(engine.get_binding_shape(0))    # prints (8, 3, 224, 224) -- why?
context = engine.create_execution_context()
# switch to optimization profile 1, i.e. bs=4
context.active_optimization_profile = 1
print(context.get_binding_shape(0))   # prints (8, 3, 224, 224) -- why?
inputs, outputs, bindings, stream = allocate_buffers(engine)
print(len(inputs[0].host))            # prints 8*3*224*224
inputs[0].host = images
output = do_inference(context, bindings, inputs, outputs, stream)
print(len(output[0]))                 # prints 168; num_classes = 21, so 21 * 8 (bs=8)
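From the dynamic-shapes section of the docs, my understanding is that activating a profile alone is not enough: the concrete runtime shape of the dynamic input also has to be pinned with context.set_binding_shape before enqueueing. A minimal sketch of what I believe is expected (active_optimization_profile and set_binding_shape mirror the TRT 7 Python API; the helper name and the assumption that the input is binding 0 within its profile are mine):

```python
def bind_dynamic_input(context, profile_index, bindings_per_profile, shape):
    """Activate an optimization profile and pin its dynamic input shape.

    With multiple profiles, profile k's input binding is at
    k * bindings_per_profile (assuming the input is binding 0 within
    its profile), and that binding's shape must be set explicitly.
    """
    context.active_optimization_profile = profile_index
    input_binding = profile_index * bindings_per_profile
    context.set_binding_shape(input_binding, shape)
    return input_binding
```

For profile2 above (bs=4, one input and one output per profile) that would be bind_dynamic_input(context, 1, 2, (4, 3, 224, 224)), i.e. binding 2, after which context.get_binding_shape(2) should report the concrete shape rather than -1 -- but I have not been able to confirm this behaviour.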
Switching the optimization profile has no effect on the batch size: only the profile that was added first (bs=8 in this case) ever takes effect, regardless of which profile I activate. Also, according to the documentation, the built engine should report -1 in the 0th dimension of its binding shape, which does not happen.
Am I missing something here? Any help would be appreciated.