How can I change the batch size during inference when using a TensorRT engine converted from ONNX?

I am trying to convert an ONNX model (a CRNN model for OCR) to TensorRT, and I want to use dynamic shapes.

I noticed that in TensorRT 7.0 the ONNX parser only supports full-dimensions mode, meaning that the network definition must be created with the explicitBatch flag set, so I added an optimization profile as follows.

IOptimizationProfile* profile = builder->createOptimizationProfile();
profile->setDimensions("data", OptProfileSelector::kMIN, Dims4(32, 1, 32, 320));
profile->setDimensions("data", OptProfileSelector::kOPT, Dims4(32, 1, 32, 800));
profile->setDimensions("data", OptProfileSelector::kMAX, Dims4(32, 1, 32, 1280));
config->addOptimizationProfile(profile);

I can build the engine and run inference with batch size 32 successfully. But when I try to run inference with a batch size smaller than 32, I get the error Cuda failure 77, which means an illegal memory access was encountered. It looks like I must use a batch size of 32 during inference, so I want to know whether the batch size can be changed at inference time. This is critical for my application.

On the other hand, when I convert ResNet with dynamic shapes, I can run inference with different batch sizes.

Hi,

In this case you can try using optimization profiles.
You should be able to create an engine with multiple profiles, each optimized either for one specific batch size or for a range of batch sizes; see the code sketch after the example below.
I believe performance improves as the range narrows.

E.g., specific batch sizes (one profile optimized for each batch size):

Profile 1: min=opt=max=(1, *input_shape)
Profile 2: min=opt=max=(8, *input_shape)
Profile 3: min=opt=max=(32, *input_shape)
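
If it helps, here is a rough sketch of what that could look like in code, reusing the "data" input and the (N, 1, 32, W) layout from the original post. The fixed width of 800 and the exact batch sizes here are only assumptions for illustration; you could also keep a width range per profile.

// Sketch only: one profile per batch size (min == opt == max), so TensorRT can
// optimize each profile for exactly one batch size.
const int batchSizes[] = {1, 8, 32};
for (int bs : batchSizes)
{
    IOptimizationProfile* profile = builder->createOptimizationProfile();
    Dims4 dims(bs, 1, 32, 800);
    profile->setDimensions("data", OptProfileSelector::kMIN, dims);
    profile->setDimensions("data", OptProfileSelector::kOPT, dims);
    profile->setDimensions("data", OptProfileSelector::kMAX, dims);
    config->addOptimizationProfile(profile);
}

At inference time the execution context then has to select which of these profiles to use (see further down in the thread).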

Thanks

Thanks for your reply. I changed my code to follow your method, and now I can run inference with different batch sizes successfully. I think the documentation and the sample code in sampleDynamicReshape are not clear enough.

There is another problem. I use one C++ script to convert the ONNX model to a TRT engine, and then load that engine in my service. When I load the engine with IRuntime::deserializeCudaEngine, the optimization profile bound to the original model doesn't seem to exist. I tried calling setBindingDimensions before inference, and I get the following error.

ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::948, condition: profileMaxDims.d[i] >= dimensions.d[i]

So the problem is: how can I add an optimization profile when creating the engine with IRuntime::deserializeCudaEngine?

I have fixed it.

Hi dsq0720,

I am facing a similar error in setBindingDimensions(). Can you please let me know how you fixed the above error?

Hi sv,

You need to select the optimization profile before setting the binding dimensions, as follows:

context_->setOptimizationProfile(0);
context_->setBindingDimensions(0, Dims4(max_batchsz_, 1, height, width));