How can I change the batch size during inference when using a TensorRT engine converted from ONNX?

I am trying to convert an ONNX model (a CRNN model for OCR) to TensorRT, and I want to use dynamic shapes.

I noticed that in TensorRT 7.0 the ONNX parser only supports full-dimensions mode, meaning that the network definition must be created with the explicitBatch flag set, so I added an optimization profile as follows.

IOptimizationProfile* profile = builder->createOptimizationProfile();
profile->setDimensions("data", OptProfileSelector::kMIN, Dims4(32, 1, 32, 320));
profile->setDimensions("data", OptProfileSelector::kOPT, Dims4(32, 1, 32, 800));
profile->setDimensions("data", OptProfileSelector::kMAX, Dims4(32, 1, 32, 1280));
config->addOptimizationProfile(profile);

I can build the engine and run inference with batch size 32 successfully. But when I try to run inference with a batch size smaller than 32, I get the error Cuda failure 77, which means an illegal memory access was encountered. It looks like I must use a batch size of 32 during inference, so I want to know whether the batch size can be changed at inference time. This is critical for my application.

On the other hand, when I convert ResNet with dynamic shapes, I can run inference with different batch sizes.

Hi,

In this case you can try using optimization profiles.
You should be able to create an engine with multiple profiles, each optimized either for one specific batch size or for a range of batch sizes; see the code sketch after the example below.
I believe performance improves as the range narrows.

E.g., specific batch sizes (one profile optimized for each batch size):

Profile 1: min=opt=max=(1, *input_shape)
Profile 2: min=opt=max=(8, *input_shape)
Profile 3: min=opt=max=(32, *input_shape)
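
If it helps, here is a rough sketch of what that could look like in code, reusing the "data" input and the (N, 1, 32, W) layout from the original post. The fixed width of 800 and the exact batch sizes here are only assumptions for illustration; you could also keep a width range per profile.

// Sketch only: one profile per batch size (min == opt == max), so TensorRT can
// optimize each profile for exactly one batch size.
const int batchSizes[] = {1, 8, 32};
for (int bs : batchSizes)
{
    IOptimizationProfile* profile = builder->createOptimizationProfile();
    Dims4 dims(bs, 1, 32, 800);
    profile->setDimensions("data", OptProfileSelector::kMIN, dims);
    profile->setDimensions("data", OptProfileSelector::kOPT, dims);
    profile->setDimensions("data", OptProfileSelector::kMAX, dims);
    config->addOptimizationProfile(profile);
}

At inference time the execution context then has to select which of these profiles to use (see further down in the thread).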

Thanks

Thanks for your reply. I changed my code to follow your method, and now I can run inference with different batch sizes successfully. I think the documentation and the sample code in sampleDynamicReshape are not clear enough.

There is another problem. I use one C++ script to convert the ONNX model to a TRT engine, and then load that engine in my service. When I load the engine with IRuntime::deserializeCudaEngine, the optimization profile bound to the original model doesn't seem to exist. I tried calling setBindingDimensions before inference, and I get the following error.

ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::948, condition: profileMaxDims.d[i] >= dimensions.d[i]

So the problem is: how can I add an optimization profile when creating the engine with IRuntime::deserializeCudaEngine?

I have fixed it.

Hi dsq0720,

I am facing a similar error in setBindingDimensions(). Can you please let me know how you fixed the above error?

Hi sv,

You need to select the optimization profile before setting the binding dimensions, as follows:

context_->setOptimizationProfile(0);
context_->setBindingDimensions(0, Dims4(max_batchsz_, 1, height, width));