Working with batches in TensorRT


Hello everyone,

I’m new to the TensorRT Python API.
Could you help me migrate a simple angle-prediction model from Keras to TensorRT via ONNX? The main trouble right now is batch processing. My model has one dynamic input (a batch of images). I created the TRT engine with trtexec:

./trtexec --explicitBatch --onnx=apm_one_input.onnx --minShapes=input:1x64x64x3 --optShapes=input:20x64x64x3 --maxShapes=input:100x64x64x3 --saveEngine=apm_one_input.plan

On the inference step the model works well with 1 image, but not with a batch: the output buffer is empty. This command:

gives this result:

(1, 64, 64, 3)
So the engine allocates memory for only 1 image. Could you help?


TensorRT Version:
GPU Type: GeForce GTX 1060 6 GB
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7.1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 2.3.1

Hi @v.stadnichuk,

Could you please share a reproducible inference script and the model file for better assistance.

Thank you.

Hi @spolisetty ,
I sent you scripts and model via private message.

Thank you @v.stadnichuk, we will look into it.

Hi @v.stadnichuk,

We noticed that in the trtexec command you’re using the wrong input node name. When we visualized the ONNX file in Netron, we identified that it has to be input_1. Please try the updated trtexec command:

trtexec --explicitBatch --onnx=apm_one_input.onnx --minShapes=input_1:1x64x64x3 --optShapes=input_1:20x64x64x3 --maxShapes=input_1:100x64x64x3 --saveEngine=apm_one_input.plan

Thank you.

Hi @spolisetty !
Thank you for the help! It works, but now my output buffer is all zeros. Could you help?


Could you please share more details regarding the issue you’re facing.

So, this is the dummy image:

img = np.random.rand(10, 64, 64, 3)
batch_size = img.shape[0]  # it's 10
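For what it’s worth, the host-buffer sizing here boils down to batch_size times the per-image volume. A NumPy-only sketch of that arithmetic (np.prod stands in for trt.volume; shapes taken from this thread):

```python
import numpy as np

input_shape = (10, 64, 64, 3)      # dummy batch: 10 images of 64x64x3
batch_size = input_shape[0]        # 10

# trt.volume(engine.get_binding_shape(0)[1:]) is equivalent to the product
# of the non-batch dims; the host buffer must hold batch_size * that volume.
per_image_volume = int(np.prod(input_shape[1:]))   # 64 * 64 * 3 = 12288
host_buffer_elems = batch_size * per_image_volume  # 10 * 12288 = 122880

print(per_image_volume, host_buffer_elems)
```

If the buffer is sized for only one image (i.e. without the batch_size factor), everything past the first image is silently dropped.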

Buffer allocations:

h_input_1 = cuda.pagelocked_empty(batch_size * trt.volume(engine.get_binding_shape(0)[1:]), dtype=trt.nptype(data_type))
h_output = cuda.pagelocked_empty(batch_size * trt.volume(engine.get_binding_shape(2)[1:]), dtype=trt.nptype(data_type))

On the inference step I use

context.execute(batch_size=batch_size, bindings=[int(d_input_1), int(d_output)])

And as result I get buffer

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

You can find the full code in what I sent you.
Thank you!

Please share with us a repro inference script with this input so we can try it from our end.

Hi @spolisetty ,
I sent you repo in private message.
Thank you!

Hi @v.stadnichuk,

We could reproduce the issue. There seem to be a few different mistakes in the script.
For example, the engine only has two bindings, but here the script looks for the 3rd (index = 2) binding:

h_output = cuda.pagelocked_empty(batch_size * trt.volume(engine.get_binding_shape(2)[1:]), dtype=trt.nptype(data_type))

And since this is an explicit-batch engine, the script should use one of the _v2 inference APIs instead of:

context.execute(batch_size=batch_size, bindings=[int(d_input_1), int(d_output)])

We recommend you explore and correct the errors in the script.
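For reference, here is a minimal sketch of explicit-batch inference with those fixes applied: output binding index 1 (not 2), execute_v2 without a batch_size argument, and set_binding_shape to resolve the dynamic batch dimension. The plan file name and the single-input/single-output float32 layout are assumptions from this thread, and running it requires a GPU with TensorRT and PyCUDA installed:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("apm_one_input.plan", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

img = np.random.rand(10, 64, 64, 3).astype(np.float32)

# For a dynamic-shape engine, the input shape must be set on the context
# before inference; otherwise TensorRT cannot resolve the batch dimension.
context.set_binding_shape(0, img.shape)

# The engine has exactly two bindings: input = 0, output = 1 (not 2).
out_shape = tuple(context.get_binding_shape(1))

h_input = np.ascontiguousarray(img)
h_output = cuda.pagelocked_empty(trt.volume(out_shape), dtype=np.float32)

d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

cuda.memcpy_htod(d_input, h_input)
# Explicit-batch engines must use the *_v2 APIs; there is no batch_size arg.
context.execute_v2(bindings=[int(d_input), int(d_output)])
cuda.memcpy_dtoh(h_output, d_output)
```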

Thank you.

Hi @spolisetty !
Thanks for help!
I edited this line, which really was incorrect:

h_output = cuda.pagelocked_empty(batch_size * trt.volume(engine.get_binding_shape(2)[1:]), dtype=trt.nptype(data_type))

Also I used

context.execute_v2(bindings=[int(d_input_1), int(d_output)])

But I still have the same empty output buffer. Could you help with it? I will share the new script with you via private message.


Sorry for the delayed response. Are you still facing this issue? It looks like we have a new post related to this one: TensorRT С++ optimization profile.

Thank you.

Hi @spolisetty !
Yes, that new post is related to this issue; I am working on the C++ version in parallel. This issue is still valid. Thank you!


In the new script it still looks like you are accessing the 3rd (non-existent) binding; please debug and try to fix it.


We suggest you try Polygraphy for prototyping.

The inference code would be something like:

from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

deserialize_engine = EngineFromBytes(BytesFromPath('/path/to/trt_engine'))
with TrtRunner(deserialize_engine) as runner:
    outputs = runner.infer({"input0_name": input0_nparray, "input1_name": input1_nparray})

Thank you.


It is used only for printing, not for buffer allocations.

We need to run this algorithm on the NVIDIA Drive AGX platform; that’s why we must use only TensorRT.
Can you help with it?

Even if it is only used for printing, you are still trying to access a non-existent binding.
Have you tried what we mentioned above?
We recommend you try Polygraphy.

I removed that line and I still have this issue:

[TensorRT] ERROR: Parameter check failed at: engine.cpp::resolveSlots::1318, condition: allInputDimensionsSpecified(routine)

I can’t use Polygraphy because I need to deploy this model on the Drive AGX platform. As far as I know, this platform does not support Polygraphy. That is why I need to use only TensorRT.
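For context, that resolveSlots / allInputDimensionsSpecified error is typically raised when a dynamic input shape was never set on the execution context before inference. A hedged fragment, assuming the input is binding 0 and the batch shape from earlier in this thread:

```python
# Before calling context.execute_v2(...), resolve the dynamic batch dim:
context.set_binding_shape(0, (10, 64, 64, 3))  # binding 0 = input_1
assert context.all_binding_shapes_specified
```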


Polygraphy is supported on Jetson AGX. It looks like the script has several errors, so we think it would be better for you to start off with Polygraphy and then write your own inference code once you are familiar with TRT; for example, you can read through Polygraphy’s inference implementation:

Thank you.


We have NVIDIA Drive AGX, not Jetson AGX, and Drive AGX does not support Polygraphy. We checked it here:
Environment.txt (461 Bytes)

That’s why we use TensorRT. Can you help with a TensorRT script?

Also, batch processing is not important right now, so could you help with this issue (deploying this script with the C++ TensorRT API)?