Work with batch in TensorRT

v.stadnichuk · June 11, 2021, 11:18am

Description

Hello everyone,

I’m new to the TensorRT Python API.
Could you help me migrate a simple angle prediction model from Keras to TensorRT via ONNX? The main trouble right now is batch processing. My model has one dynamic input (a batch of images). I created the TRT engine with trtexec:

./trtexec --explicitBatch --onnx=apm_one_input.onnx --minShapes=input:1x64x64x3 --optShapes=input:20x64x64x3 --maxShapes=input:100x64x64x3 --saveEngine=apm_one_input.plan

At the inference step the model works well with 1 image, but not with a batch: the output buffer is empty. This command:

print(engine.get_binding_shape(0))
gives this result:

(1, 64, 64, 3)
So the engine allocates memory for only 1 image. Could you help?
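
One way to check what the engine actually expects is to list every binding with its name, direction, and shape. The sketch below assumes the TensorRT 7.x binding API used in this thread (num_bindings, get_binding_name, binding_is_input, get_binding_shape); describe_bindings is a hypothetical helper name, not part of TensorRT. A dynamic batch dimension is reported as -1, so a shape of (1, 64, 64, 3) suggests the shape flags were not applied to this input.

```python
def describe_bindings(engine):
    """Return one dict per binding: index, name, direction, shape, dynamic flag.

    `engine` is assumed to be a deserialized tensorrt.ICudaEngine (TensorRT 7.x API).
    """
    rows = []
    for i in range(engine.num_bindings):
        shape = tuple(engine.get_binding_shape(i))
        rows.append({
            "index": i,
            "name": engine.get_binding_name(i),
            "is_input": engine.binding_is_input(i),
            "shape": shape,
            # TensorRT reports dynamic dimensions as -1.
            "dynamic": any(d < 0 for d in shape),
        })
    return rows
```

Printing the rows for the built engine shows both the real input name and whether the batch dimension stayed dynamic.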

Environment

TensorRT Version: 7.2.3.4
GPU Type: GeForce GTX 1060 6 GB
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7.1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 2.3.1

spolisetty · June 14, 2021, 9:35am

Hi @v.stadnichuk,

Could you please share an inference script that reproduces the issue, along with the model file, so we can assist you better?

Thank you.

v.stadnichuk · June 14, 2021, 9:43am

Hi @spolisetty ,
I sent you the scripts and the model via private message.

spolisetty · June 14, 2021, 9:51am

Thank you @v.stadnichuk, we will look into it.

spolisetty · June 16, 2021, 1:23pm

Hi @v.stadnichuk,

We noticed that the trtexec command uses the wrong input node name. Visualizing the ONNX file in Netron shows that the input is named input_1. Please try the updated trtexec command:

trtexec --explicitBatch --onnx=apm_one_input.onnx --minShapes=input_1:1x64x64x3 --optShapes=input_1:20x64x64x3 --maxShapes=input_1:100x64x64x3 --saveEngine=apm_one_input.plan

Thank you.
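
The shape flags only take effect when the tensor name matches the ONNX graph input exactly, so it can help to read the name from the model and generate the flags from it. This is a minimal sketch: onnx_input_names and shape_flags are hypothetical helper names, and the first one assumes the onnx Python package is installed.

```python
def onnx_input_names(onnx_path):
    """Return the names of graph inputs that are not weights (initializers)."""
    import onnx  # assumption: installed separately, e.g. `pip install onnx`

    model = onnx.load(onnx_path)
    weights = {init.name for init in model.graph.initializer}
    return [inp.name for inp in model.graph.input if inp.name not in weights]


def shape_flags(input_name, min_shape, opt_shape, max_shape):
    """Build the three trtexec shape flags for one dynamic input."""
    def spec(shape):
        return input_name + ":" + "x".join(str(d) for d in shape)

    return [
        "--minShapes=" + spec(min_shape),
        "--optShapes=" + spec(opt_shape),
        "--maxShapes=" + spec(max_shape),
    ]
```

For the model in this thread, shape_flags("input_1", (1, 64, 64, 3), (20, 64, 64, 3), (100, 64, 64, 3)) reproduces the three flags from the command above.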

v.stadnichuk · June 16, 2021, 1:38pm

Hi @spolisetty !
Thank you for the help! It works, but now the output buffer contains only zeros. Could you help?

spolisetty · June 16, 2021, 1:39pm

@v.stadnichuk,

Could you please share more details about the issue you’re facing?

v.stadnichuk · June 16, 2021, 1:45pm

So, here is a dummy image batch:

img = np.random.rand(10, 64, 64, 3)
batch_size = img.shape[0]  # 10

Buffer allocations:

h_input_1 = cuda.pagelocked_empty(batch_size * trt.volume(engine.get_binding_shape(0)[1:]), dtype=trt.nptype(data_type))
h_output = cuda.pagelocked_empty(batch_size * trt.volume(engine.get_binding_shape(2)[1:]), dtype=trt.nptype(data_type))

On the inference step I use

context.execute(batch_size=batch_size, bindings=[int(d_input_1), int(d_output)])

As a result I get this buffer:

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

You can find the full code in apm_inf.py, which I sent you.
Thank you!
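
The element count of each host buffer is the batch size times the volume of the per-sample dimensions. The sketch below works this out in plain Python for a batch of 10. batch_volume is a hypothetical helper name, and the 4-byte element size assumes float32 bindings, which the thread does not state. Note that np.random.rand returns float64, so the array would need casting to the binding's dtype before it is copied to the device.

```python
import math


def batch_volume(binding_shape, batch_size):
    """Elements needed for `batch_size` samples, ignoring the leading batch dim."""
    per_sample = math.prod(binding_shape[1:])  # e.g. 64 * 64 * 3
    return batch_size * per_sample


n_in = batch_volume((-1, 64, 64, 3), 10)
print(n_in, "elements,", n_in * 4, "bytes as float32 (assumed dtype)")
```

Slicing off the first dimension keeps the result correct whether the engine reports the batch dimension as 1 or as -1.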

spolisetty · June 21, 2021, 4:17pm

@v.stadnichuk,
Please share an inference script that reproduces the issue with this input so that we can try it on our end.

v.stadnichuk · June 22, 2021, 7:30am

Hi @spolisetty ,
I sent you the repo in a private message.
Thank you!

spolisetty · June 28, 2021, 5:54pm

Hi @v.stadnichuk,

We could reproduce the issue. There seem to be a few different mistakes in the script.
For example, the engine has only two bindings, but the script looks up the 3rd binding (index = 2):

h_output = cuda.pagelocked_empty(batch_size * trt.volume(engine.get_binding_shape(2)[1:]), dtype=trt.nptype(data_type))

And if this is an explicit batch engine, then the script should use one of the _v2 inference APIs instead of:

context.execute(batch_size=batch_size, bindings=[int(d_input_1), int(d_output)])

We recommend that you review the script and correct these errors.

Thank you.
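
Since execute_v2 expects one device pointer per binding, in binding-index order, the pointer list can be built from binding names instead of hard-coded indices. A minimal sketch, assuming the TensorRT 7.x binding API; binding_pointers is a hypothetical helper name, and the names in ptr_by_name must match the engine's own binding names.

```python
def binding_pointers(engine, ptr_by_name):
    """Return device pointers ordered by binding index, one per engine binding.

    `ptr_by_name` maps binding names to device addresses (anything int() accepts).
    """
    names = [engine.get_binding_name(i) for i in range(engine.num_bindings)]
    missing = [n for n in names if n not in ptr_by_name]
    if missing:
        raise KeyError("no buffer for bindings: " + ", ".join(missing))
    return [int(ptr_by_name[n]) for n in names]
```

With an engine that has exactly two bindings, the returned list has two entries, so an access to index 2 cannot slip in unnoticed.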

v.stadnichuk · June 29, 2021, 10:03am

Hi @spolisetty !
Thanks for help!
I edited this line; it really was incorrect:

h_output = cuda.pagelocked_empty(batch_size * trt.volume(engine.get_binding_shape(2)[1:]), dtype=trt.nptype(data_type))

Also I used

context.execute_v2(bindings=[int(d_input_1), int(d_output)])

But I still get the same empty output buffer. Could you help with it? I will share the new script with you via private message.

spolisetty · July 2, 2021, 6:57pm

@v.stadnichuk,

Sorry for the delayed response. Are you still facing this issue? It looks like there is a new post related to this one: TensorRT С++ optimization profile.

Thank you.

v.stadnichuk · July 5, 2021, 6:59am

Hi @spolisetty !
Yes, that new post is related to this issue. I am working on the C++ version in parallel. This issue is still valid. Thank you!

spolisetty · July 9, 2021, 6:47am

@v.stadnichuk,

The new script still appears to access the 3rd (non-existent) binding. Please debug and try to fix it.

print(engine.get_binding_shape(2))

We suggest you try Polygraphy for prototyping.
https://docs.nvidia.com/deeplearning/tensorrt/polygraphy/docs/index.html

The inference code would be something like:

from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

# Lazy loader: the engine is deserialized when the runner is activated.
deserialize_engine = EngineFromBytes(BytesFromPath('/path/to/trt_engine'))
with TrtRunner(deserialize_engine) as runner:
    # Map each input name to a NumPy array; the result is keyed by output name.
    outputs = runner.infer({"input0_name": input0_nparray, "input1_name": input1_nparray})

Thank you.

v.stadnichuk · July 9, 2021, 11:06am

print(engine.get_binding_shape(2))

It is used only for printing, not for buffer allocation.

We need to run this algorithm on the Nvidia Drive AGX platform, which is why we have to use only TensorRT.
Can you help with it?

spolisetty · July 13, 2021, 7:08am

Even when it only prints, the script is still trying to access a non-existent binding.
Have you tried the suggestion above?
We recommend that you try Polygraphy.

v.stadnichuk · July 13, 2021, 7:30am

I removed this line and I still have this issue:

[TensorRT] ERROR: Parameter check failed at: engine.cpp::resolveSlots::1318, condition: allInputDimensionsSpecified(routine)

I can’t use Polygraphy because I need to deploy this model on the Drive AGX platform. As far as I know, this platform does not support Polygraphy. That is why I need to use only TensorRT.
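
The allInputDimensionsSpecified check usually fails when an engine with dynamic shapes is executed before the input shape has been set on the execution context. The sketch below shows one possible flow with plain TensorRT and pycuda, assuming the TensorRT 7.x binding API, a single input and a single output, optimization profile 0, and a CUDA context that already exists (for example because the script imported pycuda.autoinit before deserializing the engine). shape_in_range and infer_batch are hypothetical helper names, and this is not the script from the thread.

```python
def shape_in_range(shape, min_shape, max_shape):
    """True if every dimension lies within the optimization profile bounds."""
    return len(shape) == len(min_shape) == len(max_shape) and all(
        lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape)
    )


def infer_batch(engine, context, batch):
    """Run one explicit-batch inference; `batch` is a NumPy array, e.g. (10, 64, 64, 3)."""
    # Imported lazily so the shape helper above works without a GPU stack.
    import numpy as np
    import pycuda.driver as cuda
    import tensorrt as trt

    in_idx = next(i for i in range(engine.num_bindings) if engine.binding_is_input(i))
    out_idx = next(i for i in range(engine.num_bindings) if not engine.binding_is_input(i))

    # Profile 0 bounds for the input come back as [min, opt, max].
    min_shape, _, max_shape = engine.get_profile_shape(0, in_idx)
    if not shape_in_range(tuple(batch.shape), tuple(min_shape), tuple(max_shape)):
        raise ValueError("batch shape is outside the optimization profile")

    # Resolve the dynamic dimensions before asking for the output shape.
    context.set_binding_shape(in_idx, tuple(batch.shape))
    if not context.all_binding_shapes_specified:
        raise RuntimeError("some input shapes are still unspecified")

    # Cast to the binding dtype; np.random.rand, for instance, yields float64.
    h_input = np.ascontiguousarray(batch, dtype=trt.nptype(engine.get_binding_dtype(in_idx)))
    h_output = np.empty(tuple(context.get_binding_shape(out_idx)),
                        dtype=trt.nptype(engine.get_binding_dtype(out_idx)))

    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    cuda.memcpy_htod(d_input, h_input)

    # One pointer per binding, placed at that binding's index.
    bindings = [0] * engine.num_bindings
    bindings[in_idx], bindings[out_idx] = int(d_input), int(d_output)
    context.execute_v2(bindings=bindings)

    cuda.memcpy_dtoh(h_output, d_output)
    return h_output
```

Sizing the output from context.get_binding_shape after set_binding_shape avoids multiplying a batch size into a shape that still contains -1.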

spolisetty · July 14, 2021, 6:09pm

@v.stadnichuk,

Polygraphy is supported on Jetson AGX. The script appears to have several errors, so I think it would be better to start with Polygraphy and write your own inference code once you are familiar with TRT. For example, you can read through Polygraphy’s inference implementation:

https://github.com/NVIDIA/TensorRT/blob/master/tools/Polygraphy/polygraphy/backend/trt/runner.py

Thank you.

v.stadnichuk · July 15, 2021, 8:41am

@spolisetty

We have an Nvidia Drive AGX, not a Jetson AGX, and Drive AGX does not support Polygraphy. We checked it here:
Environment.txt (461 Bytes)

That’s why we use TensorRT. Could you help with the TensorRT script?

Also, batch processing is no longer important, so could you help with this issue (deploying this script with the C++ TensorRT API)?
