Why can't I change the batch size (index) dimension for a network imported from ONNX format in TRT 7.0?

I am new to TensorRT, and I have encountered this problem with TensorRT 7.0

(my rig: cuDNN 7.6.5 / CUDA 10.2 / Windows 10 x64, with a Xeon v4 CPU and several Titan V GPUs).

In my case, the input tensor of the ONNX model has size 256 (H) x 1 (W) x 6 (C).

Since TensorRT 7.x only supports dynamic shape mode for ONNX networks, I added an input layer with a dynamic tensor definition, following the user guide:

int BatchSize=256;
network->addInput("foo", DataType::kFLOAT, Dims4(BatchSize, 6, -1, -1));

And I added an optimization profile:

Dims dim; 
dim.d[0]=BatchSize;
dim.d[1]=6;
...

profile->setDimensions("foo", OptProfileSelector::kMIN, dim);
...
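For completeness, the whole profile setup I am attempting looks roughly like the sketch below (the tensor name "foo" matches the addInput() call above; the per-sample H/W values 256 and 1, and the helper's name and signature, are just placeholders for my experiment):

#include <NvInfer.h>
using namespace nvinfer1;

// Rough sketch of the profile setup; I use the same shape for kMIN/kOPT/kMAX
// because I only ever want to run with batch 256.
void addBatchProfile(IBuilder* builder, IBuilderConfig* config)
{
    const int BatchSize = 256;
    IOptimizationProfile* profile = builder->createOptimizationProfile();

    const Dims4 dims{BatchSize, 6, 256, 1};   // N, C, H, W
    profile->setDimensions("foo", OptProfileSelector::kMIN, dims);
    profile->setDimensions("foo", OptProfileSelector::kOPT, dims);
    profile->setDimensions("foo", OptProfileSelector::kMAX, dims);

    config->addOptimizationProfile(profile);
}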

But whenever the engine is built and then invoked at runtime, the batch (index) dimension of the input is always 1, so the profile dimensions do not match it, and the program returns the following debug information:

Parameter check failed at: engine.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::948, condition: profileMaxDims.d[i] >= dimensions.d[i]

The only case in which it worked is when, at runtime, I force the batch size to 1 and call the engine context to compute the inputs one at a time, and of course the performance is unacceptable.
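The workaround I am using at the moment looks roughly like this (binding index 0 is assumed to be the input, the per-sample shape 6x256x1 is what my model expects, and the pointer arithmetic in bindings[] is left out):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
using namespace nvinfer1;

// Workaround sketch: force batch 1 and run the samples one at a time.
void runOneByOne(IExecutionContext* context, void** bindings, cudaStream_t stream, int BatchSize)
{
    context->setBindingDimensions(0, Dims4{1, 6, 256, 1});  // binding 0 assumed to be the input
    for (int i = 0; i < BatchSize; ++i)
    {
        // ... advance the pointers in bindings[] to sample i here ...
        context->enqueueV2(bindings, stream, nullptr);
    }
    cudaStreamSynchronize(stream);
}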

Any idea on how to solve this problem?

Hi,

Was your ONNX model created with a dynamic batch dimension? If not, its batch size is likely set to 1 (or the batch size of your dummy_input if exported through PyTorch, for example, as shown here: https://pytorch.org/docs/stable/onnx.html#example-end-to-end-alexnet-from-pytorch-to-onnx)

If your ONNX model was created with the dynamic batch dimension in the input, you should be able to create and use optimization profiles as expected. If the ONNX model has a fixed batch size, then you'll likely encounter errors when trying to manually change the batch size as in your example above. Similarly, I don't think prepending another input layer with a dynamic batch dimension will work as expected, because that won't propagate through the network.
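If you're parsing the model in code anyway, a quick sanity check is to look at the parsed network's input dimensions, where a dynamic batch shows up as -1. A rough sketch, assuming a single input at index 0:

#include <NvInfer.h>
#include <iostream>
using namespace nvinfer1;

// Print whether the parsed network's input has a dynamic batch dimension.
void printBatchInfo(INetworkDefinition* network)
{
    ITensor* input = network->getInput(0);      // assuming a single input at index 0
    Dims dims = input->getDimensions();
    if (dims.nbDims > 0 && dims.d[0] == -1)
        std::cout << "Input batch dimension is dynamic" << std::endl;
    else
        std::cout << "Input batch dimension is fixed at " << dims.d[0] << std::endl;
}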

Thanks for the reply. In my test case, the network is exported by MATLAB R2019b. My guess is that the ONNX model exported by MATLAB is not created with a dynamic batch dimension, and the MATLAB interface doesn't offer any parameter to control the batch dimension, so my understanding is that MATLAB's exported ONNX models cannot work well with TensorRT yet?

You can view your ONNX model in Netron (https://lutzroeder.github.io/netron/) to easily verify whether it has a dynamic batch dimension. If you see something like 1x256x1x6 next to the input node, then it's fixed. If the first dimension is non-numeric or -1 instead (e.g. something like ?x256x1x6), then it's dynamic.

I don't know too much about MATLAB's capabilities, but from a quick glance at this page: https://www.mathworks.com/help/deeplearning/ref/exportonnxnetwork.html, it doesn't look like they mention dynamic axes, and their opset support looks a little behind too (the maximum appears to be 9, while 11 is the most recent).

Thanks for your reply, you are right: the ONNX model exported by MATLAB has a fixed batch size of 1.

So I downloaded onnx (https://github.com/onnx/onnx) and changed the batch size from 1 to the one I prefer (e.g. 256); however, I encountered another problem:

Having changed the batch size from 1 to any number > 1, whenever I create the context through

IExecutionContext *context = engine->createExecutionContextWithoutDeviceMemory();
size_t SomeDeviceBufferSize = engine->getDeviceMemorySize();
...
context->setDeviceMemory(SomeDeviceBuffer);

the program always fails and returns the following debug information:

C:\source\rtSafe\cuda\caskConvolutionRunner.cpp (233) - Cuda Error in nvinfer1::rt::task::CaskConvolutionRunner::allocateContextResources: 1 (invalid argument)

But when I create the context through:

IExecutionContext *context = engine->createExecutionContext();

the program works just fine.

SomeDeviceBuffer is returned by a cudaMalloc call, so I don't think pointer alignment should be the cause of the problem?
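For reference, the full sequence I am running looks roughly like this (buffer and stream management simplified, error checking omitted; the shape passed to setBindingDimensions matches my optimization profile, and the function name is just a placeholder):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
using namespace nvinfer1;

// Rough sketch of the failing sequence with externally managed device memory.
void runWithExternalMemory(ICudaEngine* engine, void** bindings, cudaStream_t stream)
{
    IExecutionContext* context = engine->createExecutionContextWithoutDeviceMemory();

    void* SomeDeviceBuffer = nullptr;
    size_t SomeDeviceBufferSize = engine->getDeviceMemorySize();
    cudaMalloc(&SomeDeviceBuffer, SomeDeviceBufferSize);  // cudaMalloc memory is at least 256-byte aligned
    context->setDeviceMemory(SomeDeviceBuffer);

    context->setBindingDimensions(0, Dims4{256, 6, 256, 1});
    context->enqueueV2(bindings, stream, nullptr);  // the "invalid argument" CUDA error above is reported during this sequence
}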

Hi,

When modifying an ONNX model's batch size directly, you'll likely have to modify it throughout the whole graph, from input to output. Also, if the ONNX model contains any hard-coded shapes in intermediate layers for some reason, changing the batch size might not work correctly, so you'll need to be careful of this. It's generally preferred to export the model from the original framework to ONNX with a dynamic batch size. PyTorch/tf2onnx support this; I'm not sure about MATLAB.