TensorRT 7 ONNX models with variable batch size

Hi,

I was previously using TRT 5. My application used different batch sizes (1, 2, 3, 4, or 5) depending on a configuration parameter.

I am now migrating to TRT 7. I have read on multiple forums that the batch size must be explicit when parsing ONNX models in TRT 7.

How should I handle this with TRT 7? Do I need a separate engine for each batch size?

Hi,

You can use optimization profiles. Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-700/tensorrt-developer-guide/index.html#opt_profiles
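
For example, a profile for a dynamic batch dimension can be set up roughly like this at build time (a minimal sketch; the input name "img" and the shapes below are placeholders for your model, and error checking is omitted):

// Sketch: one optimization profile covering batch sizes 1 to 5 for a dynamic input.
nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
profile->setDimensions("img", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 3, 384, 1280});
profile->setDimensions("img", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{3, 3, 384, 1280});
profile->setDimensions("img", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{5, 3, 384, 1280});
config->addOptimizationProfile(profile);

(Note that setDimensions is the call for dynamic input dimensions; setShapeValues only applies to inputs that are shape tensors.)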

Thanks

I tried using an optimization profile as suggested:

nvinfer1::IOptimizationProfile* profiles[] = {
    builder->createOptimizationProfile()
};
const int32_t minShape[4] = {1, 3, 384, 1280};
const int32_t optShape[4] = {3, 3, 384, 1280};
const int32_t maxShape[4] = {5, 3, 384, 1280};
profiles[0]->setShapeValues("img", nvinfer1::OptProfileSelector::kMIN, minShape, 4);
profiles[0]->setShapeValues("img", nvinfer1::OptProfileSelector::kOPT, optShape, 4);
profiles[0]->setShapeValues("img", nvinfer1::OptProfileSelector::kMAX, maxShape, 4);
for (auto profile_ptr : profiles)
{
    config->addOptimizationProfile(profile_ptr);
}

When I run execute with batch size 3, I get this error:

[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.
[E] [TRT] engine.cpp (637) - Cuda Error in reportTimes: 77 (an illegal memory access was encountered)
[E] [TRT] INTERNAL_ERROR: std::exception
[E] [TRT] engine.cpp (902) - Cuda Error in executeInternal: 77 (an illegal memory access was encountered)
[E] [TRT] FAILED_EXECUTION: std::exception
mContext->execute – failed to execute tensorRT context

Hi,

Please refer to the sample below in case it helps:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L138

Thanks

Hi,
I tried the following. So far unsuccessful. Anyway, it is very disappointing that something this basic requires this much hacking. Please educate me if I am doing everything wrong.

  • export from PyTorch with explicit batch size
  • modify the ONNX model to replace all batch dimensions with -1 (in Python with the onnx library)
  • comment out the ASSERT(!_importer_ctx.network()->hasImplicitBatchDimension() at the start of ModelImporter.cpp
  • recompile libonnxparser
  • create the network without kExplicitBatch
  • I get the following error:
    [E] [TRT] Parameter check failed at: …/builder/Network.cpp::addInput::957, condition: isValidDims(dims, hasImplicitBatchDimension())
    ModelImporter::parseWithWeightDescriptor(): after importModel
    ERROR: img:226 In function importInput:
    *[8] Assertion failed: tensor = ctx->network()->addInput(input.name().c_str(), trtDtype, trt_dims)
    [W] parseFromFile failed
  • create the network with kExplicitBatch
  • modify the network input with the TensorRT API to change the batch dimension to -1; all tensors get modified automatically as a consequence, but hasImplicitBatchDimension remains false
  • when I run execute(batchSize) I get an error that the network has an implicit batch size, please use executeV2
  • hasImplicitBatchDimension is 0 because of how I created the network; changing the tensor dimensions in the network did not help

The last possible solution is to export a different ONNX model for each batch size I need and create a separate engine for each batch size.

Hi,

I get an error that the network has an implicit batch size

If you correctly specified the explicit batch flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126, then you shouldn’t be getting this error. Please do share the full error if you are.
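
For reference, creating the network with the explicit batch flag looks roughly like this (a sketch of the same pattern as the linked sample; variable names are illustrative):

// Sketch: the ONNX parser in TRT 7 requires a network created with kEXPLICIT_BATCH.
const auto explicitBatch = 1U << static_cast<uint32_t>(
    nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
nvinfer1::INetworkDefinition* network = builder->createNetworkV2(explicitBatch);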

please use executeV2

This is correct. If creating a network with the explicit batch flag (which is required for ONNX models), you should be using executeV2 (which doesn’t require a batchSize parameter since it’s explicit in the model).

If your explicit batch network has fixed shape (batch size >= 1), then you should be able to just use executeV2() similar to how you used execute() in previous TensorRT versions.

If your explicit batch network has a dynamic shape (batch size == -1), which it does in this case, then you need to create an optimization profile for it as you've described above. You then set that optimization profile on your execution context, and before doing inference you specify the actual shape of the input, such as something like this for an input of shape (4, 3, 384, 1280):

context->setOptimizationProfile(0);  // 0 is the first profile, 1 is the second profile, etc.
context->setBindingDimensions(0, nvinfer1::Dims4{4, 3, 384, 1280});  // 0 is the first input binding, you may have multiple input bindings
context->executeV2(...);
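
Putting it together, the inference-time flow can look roughly like this (a sketch only; dInput/dOutput are assumed device buffers allocated elsewhere, binding 1 is assumed to be the single output, and error handling is omitted):

// Sketch: run one inference on an explicit-batch engine with a dynamic batch dimension.
context->setOptimizationProfile(0);                                   // select the profile built earlier
context->setBindingDimensions(0, nvinfer1::Dims4{4, 3, 384, 1280});   // actual input shape for this call

// Once the input shape is set, the output shape is resolved and buffers can be sized.
nvinfer1::Dims outDims = context->getBindingDimensions(1);
size_t outCount = 1;
for (int i = 0; i < outDims.nbDims; ++i) outCount *= outDims.d[i];    // number of output elements to allocate/copy

void* bindings[] = {dInput, dOutput};   // device pointers sized for the shapes above
context->executeV2(bindings);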

See a dynamic shape sample here: https://github.com/NVIDIA/TensorRT/blob/572d54f91791448c015e74a4f1d6923b77b79795/samples/opensource/sampleDynamicReshape/README.md#running-inference

See more info in the docs here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation


export from PyTorch with explicit batch size
modify the ONNX model to replace all batch dimensions with -1 (in Python with the onnx library)

Can you clarify what you mean by “export from PyTorch with explicit batch size”? Do you mean dynamic axes? I have a minimal example of exporting Alexnet from PyTorch with dynamic batch size here: alexnet_onnx.py · GitHub

If you specified a dynamic shape when exporting to ONNX with PyTorch, you shouldn’t have to modify the ONNX model to have a -1 batch dimension after exporting; it should already be -1 if exported correctly (or something like “unk” or “unknown” when viewing your model in a tool like Netron).


Here’s an example of how you’d parse and create an engine with roughly your sample optimization profile above using trtexec on the alexnet model for simplicity:

# minShapes/optShapes/maxShapes set the kMIN/kOPT/kMAX shapes of the optimization profile.
# --shapes is the inference shape - this is like context->setBindingDimensions(0, Dims4{3,3,224,224})
trtexec --explicitBatch --onnx=alexnet_dynamic.onnx \
    --minShapes=actual_input_1:1x3x224x224 \
    --optShapes=actual_input_1:3x3x224x224 \
    --maxShapes=actual_input_1:5x3x224x224 \
    --shapes=actual_input_1:3x3x224x224 \
    --saveEngine=alexnet_dynamic.engine

You can also load and test other input shapes within the range of the optimization profile
using a saved engine with trtexec:

trtexec --loadEngine=alexnet_dynamic.engine \
--shapes=actual_input_1:5x3x224x224       # Inference shape - this is like context->setBindingDimensions(0, Dims4{5,3,224,224})

There’s some more info on trtexec on this page: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec


Altogether I did something like this:

# Start TensorRT 7 container
nvidia-docker run -it -v ${PWD}:/mnt --workdir=/mnt nvcr.io/nvidia/tensorrt:20.02-py3

# Create ONNX model with dynamic batch dimension
pip install torch==1.4 torchvision
wget https://gist.githubusercontent.com/rmccorm4/b72abac18aed6be4c1725db18eba4930/raw/3919c883b97a231877b454dae695fe074a1acdff/alexnet_onnx.py
python alexnet_onnx.py

# Parse ONNX model, create a TensorRT engine with an optimization profile, do some inference, and save the engine
trtexec --explicitBatch --onnx=alexnet_dynamic.onnx \
--minShapes=actual_input_1:1x3x224x224 \
--optShapes=actual_input_1:3x3x224x224 \
--maxShapes=actual_input_1:5x3x224x224 \
--shapes=actual_input_1:3x3x224x224 \
--saveEngine=alexnet_dynamic.engine

# Load our saved engine and try inference with a different shape
trtexec --loadEngine=alexnet_dynamic.engine \
--shapes=actual_input_1:5x3x224x224

Hope this helps.


Thanks a lot! That helped solve all my problems.

Basically my workflow ended up being this:

  • export from PyTorch with all dimensions fixed (all you can do with torch.onnx.export)
  • read in the ONNX model in TensorRT (explicitBatch true)
  • change the batch dimension of the input to -1; this propagates throughout the network (see the sketch after this list)
  • modify all my custom plugins to be IPluginV2DynamicExt
  • set the optimization profile as described
  • use
    mContext->setOptimizationProfile(0); // 0 is the first profile, 1 is the second profile, etc.
    mContext->setBindingDimensions(0, Dims4{batchSize, 3, 384, 1280}); // 0 is the first input binding, you may have multiple input bindings
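
For reference, the "change the batch dimension to -1" step can be done roughly like this with the TensorRT network API after parsing (a sketch, assuming a single input tensor; only the batch dimension is touched):

// Sketch: mark the batch dimension of the parsed network's input as dynamic.
nvinfer1::ITensor* input = network->getInput(0);
nvinfer1::Dims dims = input->getDimensions();   // e.g. {5, 3, 384, 1280} from the fixed-shape export
dims.d[0] = -1;                                 // -1 makes the batch dimension dynamic
input->setDimensions(dims);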

I had overlooked setBindingDimensions previously.

Thanks a lot for the excellent help.

Hi @eascheiber,

I’m glad it helped!

  • export from PyTorch with all dimensions fixed (all you can do with torch.onnx.export)
  • read in the ONNX model in TensorRT (explicitBatch true)
  • change the batch dimension of the input to -1; this propagates throughout the network

I just want to point out that you can export from PyTorch with a dynamic batch dimension by using the dynamic_axes argument of torch.onnx.export.

Changing the batch size of the ONNX model manually after exporting it is not guaranteed to always work, since the model may contain hard-coded shapes that are incompatible with your manual change.

See this snippet for an example of exporting with dynamic batch size: alexnet_onnx.py · GitHub

Also, if your plugins aren’t proprietary or anything, it may be very helpful to contribute a PR to the Github repo here: TensorRT/plugin at master · NVIDIA/TensorRT · GitHub

I’m sure it could help other users as well!


Hi, I generated an engine using trtexec, and it has a dynamic batch size.

How should I load the engine and run inference with a dynamic batch size?

Hi,

I also want to ask what the recommended way is to use ONNX models with a variable batch size.

As far as I understand, the proposed solution is to use an explicit batch size and change the batch size via an optimization profile?

Unfortunately that is not a solution for me. I need to quickly adapt to different batch sizes at each call of execute / enqueue, so an implicit batch size is needed.

Why is it not possible to use ONNX models with an implicit batch size?

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

Hi, I want to create more than one optimization profile so that I can run multiple execution contexts in parallel. Can I use the trtexec tool to build such an engine? I can create one optimization profile using --minShapes, --optShapes and --maxShapes, but I don't know how to create another one. Thanks