How to use different profiles in TensorRT?

Description

I want to use dynamic batch size and shape in TensorRT.
I added two profiles when converting from ONNX to an engine: one profile with batch size = 1, and the other with batch size = 4. Below is the ONNX-to-engine code:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def GiB(val):
    return val * (1 << 30)

def build_engine(onnx_path, using_half, batch_size=1, dynamic_input=True):
    trt.init_libnvinfer_plugins(None, '')
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = 4  # ignored for explicit-batch networks; the profiles below control batch size
        config = builder.create_builder_config()
        config.max_workspace_size = GiB(5)
        if using_half:
            config.set_flag(trt.BuilderFlag.FP16)

        # Load the ONNX model and parse it to populate the TensorRT network.
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None

        if dynamic_input:
            # Profile 0: batch size 1, input width 10..1000
            profile = builder.create_optimization_profile()
            profile.set_shape(network.get_input(0).name, min=(1, 1, 32, 10), opt=(1, 1, 32, 420), max=(1, 1, 32, 1000))
            config.add_optimization_profile(profile)

            # Profile 1: batch size 4, input width 10..1000
            profile1 = builder.create_optimization_profile()
            profile1.set_shape(network.get_input(0).name, min=(4, 1, 32, 10), opt=(4, 1, 32, 420), max=(4, 1, 32, 1000))
            config.add_optimization_profile(profile1)

        return builder.build_engine(network, config)
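
The device buffers, stream, and bindings used in the snippets below come from my predict() helper. For context, a minimal sketch of how they might be allocated with PyCUDA (the output shape here is only an assumed placeholder; size it from your own model):

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates and pushes a CUDA context

stream = cuda.Stream()

# Size the input buffer for the largest shape any profile allows: (4, 1, 32, 1000).
max_input = np.empty((4, 1, 32, 1000), dtype=np.float32)
d_input = cuda.mem_alloc(max_input.nbytes)

# Page-locked host output buffer plus a matching device buffer.
# (4, 250, 37) is an assumed example shape, not from the real model.
outputs = cuda.pagelocked_empty((4, 250, 37), dtype=np.float32)
d_output = cuda.mem_alloc(outputs.nbytes)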

When running inference with “profile”, which has batch size = 1, I set:

context = engine.create_execution_context()
context.active_optimization_profile = 0
context.set_binding_shape(0, img.shape)  # img.shape = (1, 1, 32, 208)
cuda.memcpy_htod_async(d_input, img, stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(outputs, d_output, stream)

Everything is fine.

But when I want to use “profile1”, which has batch size = 4:

context = engine.create_execution_context()
context.active_optimization_profile = 1
context.set_binding_shape(2, img.shape)  # img.shape = (4, 1, 32, 208)
cuda.memcpy_htod_async(d_input, img, stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(outputs, d_output, stream)

it shows:
[TensorRT] INFO: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles

[TensorRT] ERROR: myelin/myelinRunner.cpp (372) - Myelin Error in execute: 68 (myelinCudaError : CUDA error 700 enqueueing async copy.
)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
0%| | 0/3059 [00:01<?, ?it/s]
Traceback (most recent call last):
File "crnn_pth2engine/crnn_trt_batch.py", line 164, in
preds, length, t_predict = crnn_handle.predict(img, batch_size)
File "crnn_pth2engine/crnn_trt_batch.py", line 102, in predict
cuda.memcpy_dtoh_async(outputs, d_output, self.stream)
pycuda._driver.LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuStreamDestroy failed: an illegal memory access was encountered

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

Environment

TensorRT Version: 7.2.3.4
GPU Type: T4
Nvidia Driver Version: 460.73.01
CUDA Version: 10.2
CUDNN Version: 8.4.0
Operating System + Version: CentOS 7
Python Version (if applicable): 3.6.13
TensorFlow Version (if applicable): none
PyTorch Version (if applicable): 1.6
Baremetal or Container (if container which image + tag): none

How can I use “profile1” for inference? Do I need to set the current profile to “profile1”, and if so, how?
Thank you for your help!

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

check_model.py

import onnx

filename = "yourONNXmodel"  # path to your ONNX model file
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
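
For example, something along these lines (the tensor name “input” and the shapes are placeholders; adjust them to your model):

trtexec --onnx=model.onnx --verbose \
        --minShapes=input:1x1x32x10 \
        --optShapes=input:1x1x32x420 \
        --maxShapes=input:1x1x32x1000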
Thanks!


Thank you for your help. I referenced the code and found the solution:

The bindings list needs to cover every binding in the engine:

bindings = [0 for i in range(engine.num_bindings)]

That is, you need to assign every element of bindings. My engine has 4 bindings (2 per profile), so I need to set bindings = [0, 0, 0, 0] and fill in the elements that belong to the active profile.

If I need to use “profile1” with batch size = 4, I need to set:

bindings = [0, 0, int(d_input), int(d_output)]

If I need to predict with “profile” with batch size = 1, I need to set:

bindings = [int(d_input), int(d_output), 0, 0]
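
Putting it together, a minimal sketch of running “profile1” (assuming two bindings per profile, so profile 1 owns binding indices 2 and 3; the earlier crash happened because execute_async_v2 read the unset binding slots of profile 1):

# Each optimization profile gets its own copy of all the engine's bindings.
bindings_per_profile = engine.num_bindings // engine.num_optimization_profiles
offset = 1 * bindings_per_profile  # profile 1 -> offset 2 with one input and one output

context = engine.create_execution_context()
context.active_optimization_profile = 1        # set the profile before the binding shape
context.set_binding_shape(offset, img.shape)   # img.shape = (4, 1, 32, 208)

bindings = [0] * engine.num_bindings
bindings[offset] = int(d_input)
bindings[offset + 1] = int(d_output)

cuda.memcpy_htod_async(d_input, img, stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(outputs, d_output, stream)
stream.synchronize()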

Thanks!

Hi,

Are you still facing this issue? If not can we mark this as solved?
Looks like you’re using an old version of TensorRT.

We recommend you use the latest TensorRT version, 8.4 GA, for a better experience.
https://developer.nvidia.com/nvidia-tensorrt-8x-download

Thank you.