Why can't I change the batch size (index) dimension for a network imported from ONNX format in TRT 7.0?

I am new to TensorRT, and I have encountered this problem with TensorRT 7.0

(my rig: cuDNN 7.6.5 / CUDA 10.2 / Windows 10 x64, with a Xeon v4 CPU and several Titan V GPUs).

In my case, the input tensor of the ONNX model has size 256 (H) x 1 (W) x 6 (C).

Since TensorRT 7.x only supports dynamic shape mode for ONNX networks, I added an input layer with a dynamic tensor definition, following the user guide:

int BatchSize=256;
network->addInput("foo", DataType::kFLOAT, Dims4(BatchSize, 6, -1, -1));

And I added an optimization profile:

Dims dim; 
dim.d[0]=BatchSize;
dim.d[1]=6;
...

profile->setDimensions("foo", OptProfileSelector::kMIN, dim);
...
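For completeness, the whole profile setup I am attempting looks roughly like the sketch below (the tensor name "foo" matches the addInput() call above; the per-sample H/W values 256 and 1, and the helper's name and signature, are just placeholders for my experiment):

#include <NvInfer.h>
using namespace nvinfer1;

// Rough sketch of the profile setup; I use the same shape for kMIN/kOPT/kMAX
// because I only ever want to run with batch 256.
void addBatchProfile(IBuilder* builder, IBuilderConfig* config)
{
    const int BatchSize = 256;
    IOptimizationProfile* profile = builder->createOptimizationProfile();

    const Dims4 dims{BatchSize, 6, 256, 1};   // N, C, H, W
    profile->setDimensions("foo", OptProfileSelector::kMIN, dims);
    profile->setDimensions("foo", OptProfileSelector::kOPT, dims);
    profile->setDimensions("foo", OptProfileSelector::kMAX, dims);

    config->addOptimizationProfile(profile);
}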

But whenever the engine is built and then invoked at runtime, the batch (index) dimension of the input is always 1, so the profile dimensions do not match it, and the program returns the following debug information:

Parameter check failed at: engine.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::948, condition: profileMaxDims.d[i] >= dimensions.d[i]

The only case in which it worked is when, at runtime, I force the batch size to 1 and call the engine context to compute the inputs one at a time, and of course the performance is unacceptable.
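The workaround I am using at the moment looks roughly like this (binding index 0 is assumed to be the input, the per-sample shape 6x256x1 is what my model expects, and the pointer arithmetic in bindings[] is left out):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
using namespace nvinfer1;

// Workaround sketch: force batch 1 and run the samples one at a time.
void runOneByOne(IExecutionContext* context, void** bindings, cudaStream_t stream, int BatchSize)
{
    context->setBindingDimensions(0, Dims4{1, 6, 256, 1});  // binding 0 assumed to be the input
    for (int i = 0; i < BatchSize; ++i)
    {
        // ... advance the pointers in bindings[] to sample i here ...
        context->enqueueV2(bindings, stream, nullptr);
    }
    cudaStreamSynchronize(stream);
}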

Any idea on how to solve this problem?

Hi,

Was your ONNX model created with a dynamic batch dimension? If not, its batch size is likely set to 1 (or the batch size of your dummy_input if exported through PyTorch, for example, as shown here: https://pytorch.org/docs/stable/onnx.html#example-end-to-end-alexnet-from-pytorch-to-onnx)

If your ONNX model was created with the dynamic batch dimension in the input, you should be able to create and use optimization profiles as expected. If the ONNX model has a fixed batch size, then you'll likely encounter errors when trying to manually change the batch size as in your example above. Similarly, I don't think prepending another input layer with a dynamic batch dimension will work as expected, because that won't propagate through the network.
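If you're parsing the model in code anyway, a quick sanity check is to look at the parsed network's input dimensions, where a dynamic batch shows up as -1. A rough sketch, assuming a single input at index 0:

#include <NvInfer.h>
#include <iostream>
using namespace nvinfer1;

// Print whether the parsed network's input has a dynamic batch dimension.
void printBatchInfo(INetworkDefinition* network)
{
    ITensor* input = network->getInput(0);      // assuming a single input at index 0
    Dims dims = input->getDimensions();
    if (dims.nbDims > 0 && dims.d[0] == -1)
        std::cout << "Input batch dimension is dynamic" << std::endl;
    else
        std::cout << "Input batch dimension is fixed at " << dims.d[0] << std::endl;
}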

Thanks for the reply. In my test case, the network is exported by MATLAB R2019b. My guess is that the ONNX model exported by MATLAB is not created with a dynamic batch dimension, and the MATLAB interface doesn't offer any parameter to control the batch dimension, so my understanding is that MATLAB's exported ONNX models cannot work well with TensorRT yet?

You can view your ONNX model in Netron (https://lutzroeder.github.io/netron/) to easily verify whether it has a dynamic batch dimension. If you see something like 1x256x1x6 next to the input node, then it's fixed. If the first dimension is non-numeric or -1 instead (e.g. something like ?x256x1x6), then it's dynamic.

I don't know too much about MATLAB's capabilities, but from a quick glance at this page: https://www.mathworks.com/help/deeplearning/ref/exportonnxnetwork.html, it doesn't look like they mention dynamic axes, and their opset support looks a little behind too (the maximum appears to be 9, while 11 is the most recent).

Thanks for your reply, you are right: the ONNX model exported by MATLAB has a fixed batch size of 1.

So I downloaded onnx (https://github.com/onnx/onnx) and changed the batch size from 1 to the one I prefer (e.g. 256); however, I encountered another problem:

Having changed the batch size from 1 to any number > 1, whenever I create the context through

IExecutionContext *context = engine->createExecutionContextWithoutDeviceMemory();
size_t SomeDeviceBufferSize = engine->getDeviceMemorySize();
...
context->setDeviceMemory(SomeDeviceBuffer);

the program always fails and returns the following debug information:

C:\source\rtSafe\cuda\caskConvolutionRunner.cpp (233) - Cuda Error in nvinfer1::rt::task::CaskConvolutionRunner::allocateContextResources: 1 (invalid argument)

But when I create the context through:

IExecutionContext *context = engine->createExecutionContext();

the program works just fine.

SomeDeviceBuffer is returned by a cudaMalloc call, so I don't think pointer alignment should be the cause of the problem?
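For reference, the full sequence I am running looks roughly like this (buffer and stream management simplified, error checking omitted; the shape passed to setBindingDimensions matches my optimization profile, and the function name is just a placeholder):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
using namespace nvinfer1;

// Rough sketch of the failing sequence with externally managed device memory.
void runWithExternalMemory(ICudaEngine* engine, void** bindings, cudaStream_t stream)
{
    IExecutionContext* context = engine->createExecutionContextWithoutDeviceMemory();

    void* SomeDeviceBuffer = nullptr;
    size_t SomeDeviceBufferSize = engine->getDeviceMemorySize();
    cudaMalloc(&SomeDeviceBuffer, SomeDeviceBufferSize);  // cudaMalloc memory is at least 256-byte aligned
    context->setDeviceMemory(SomeDeviceBuffer);

    context->setBindingDimensions(0, Dims4{256, 6, 256, 1});
    context->enqueueV2(bindings, stream, nullptr);  // the "invalid argument" CUDA error above is reported during this sequence
}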

Hi,

When modifying an ONNX model's batch size directly, you'll likely have to modify it throughout the whole graph, from input to output. Also, if the ONNX model contains any hard-coded shapes in intermediate layers for some reason, changing the batch size might not work correctly, so you'll need to be careful of this. It's generally preferred to export the model from the original framework to ONNX with a dynamic batch size. PyTorch/tf2onnx support this; I'm not sure about MATLAB.