How to use different profiles in TensorRT?

Description

I want to use dynamic batch size and shape in TensorRT.
I added two profiles when converting from ONNX to an engine: one profile with batch size = 1, and the other with batch size = 4. Below is the ONNX-to-engine code:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def GiB(val):
    return val * (1 << 30)

def build_engine(onnx_path, using_half, batch_size=1, dynamic_input=True):
    trt.init_libnvinfer_plugins(None, '')
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = 4  # ignored for explicit-batch networks; the profiles below control batch size
        config = builder.create_builder_config()
        config.max_workspace_size = GiB(5)
        if using_half:
            config.set_flag(trt.BuilderFlag.FP16)

        # Load the ONNX model and parse it to populate the TensorRT network.
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None

        if dynamic_input:
            # Profile 0: batch size 1, input width 10..1000
            profile = builder.create_optimization_profile()
            profile.set_shape(network.get_input(0).name, min=(1, 1, 32, 10), opt=(1, 1, 32, 420), max=(1, 1, 32, 1000))
            config.add_optimization_profile(profile)

            # Profile 1: batch size 4, input width 10..1000
            profile1 = builder.create_optimization_profile()
            profile1.set_shape(network.get_input(0).name, min=(4, 1, 32, 10), opt=(4, 1, 32, 420), max=(4, 1, 32, 1000))
            config.add_optimization_profile(profile1)

        return builder.build_engine(network, config)
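
The device buffers, stream, and bindings used in the snippets below come from my predict() helper. For context, a minimal sketch of how they might be allocated with PyCUDA (the output shape here is only an assumed placeholder; size it from your own model):

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates and pushes a CUDA context

stream = cuda.Stream()

# Size the input buffer for the largest shape any profile allows: (4, 1, 32, 1000).
max_input = np.empty((4, 1, 32, 1000), dtype=np.float32)
d_input = cuda.mem_alloc(max_input.nbytes)

# Page-locked host output buffer plus a matching device buffer.
# (4, 250, 37) is an assumed example shape, not from the real model.
outputs = cuda.pagelocked_empty((4, 250, 37), dtype=np.float32)
d_output = cuda.mem_alloc(outputs.nbytes)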

When running inference with “profile”, which has batch size = 1, I set:

context = engine.create_execution_context()
context.active_optimization_profile = 0
context.set_binding_shape(0, img.shape)  # img.shape = (1, 1, 32, 208)
cuda.memcpy_htod_async(d_input, img, stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(outputs, d_output, stream)

Everything is fine.

But when I want to use “profile1”, which has batch size = 4:

context = engine.create_execution_context()
context.active_optimization_profile = 1
context.set_binding_shape(2, img.shape)  # img.shape = (4, 1, 32, 208)
cuda.memcpy_htod_async(d_input, img, stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(outputs, d_output, stream)

it shows:
[TensorRT] INFO: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles

[TensorRT] ERROR: myelin/myelinRunner.cpp (372) - Myelin Error in execute: 68 (myelinCudaError : CUDA error 700 enqueueing async copy.
)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
0%| | 0/3059 [00:01<?, ?it/s]
Traceback (most recent call last):
File "crnn_pth2engine/crnn_trt_batch.py", line 164, in
preds, length, t_predict = crnn_handle.predict(img, batch_size)
File "crnn_pth2engine/crnn_trt_batch.py", line 102, in predict
cuda.memcpy_dtoh_async(outputs, d_output, self.stream)
pycuda._driver.LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuStreamDestroy failed: an illegal memory access was encountered

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

Environment

TensorRT Version: 7.2.3.4
GPU Type: T4
Nvidia Driver Version: 460.73.01
CUDA Version: 10.2
CUDNN Version: 8.4.0
Operating System + Version: CentOS 7
Python Version (if applicable): 3.6.13
TensorFlow Version (if applicable): none
PyTorch Version (if applicable): 1.6
Baremetal or Container (if container which image + tag): none

How can I use “profile1” for inference? Do I need to set the current profile to “profile1”, and if so, how?
Thank you for your help!

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

check_model.py

import onnx

filename = "yourONNXmodel"  # path to your ONNX model file
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
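
For example, something along these lines (the tensor name “input” and the shapes are placeholders; adjust them to your model):

trtexec --onnx=model.onnx --verbose \
        --minShapes=input:1x1x32x10 \
        --optShapes=input:1x1x32x420 \
        --maxShapes=input:1x1x32x1000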
Thanks!


Thank you for your help. I referenced the code and found the solution:

The bindings list needs to cover every binding in the engine:

bindings = [0 for i in range(engine.num_bindings)]

That is, you need to assign every element of bindings. My engine has 4 bindings (2 per profile), so I need to set bindings = [0, 0, 0, 0] and fill in the elements that belong to the active profile.

If I need to use “profile1” with batch size = 4, I need to set:

bindings = [0, 0, int(d_input), int(d_output)]

If I need to predict with “profile” with batch size = 1, I need to set:

bindings = [int(d_input), int(d_output), 0, 0]
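
Putting it together, a minimal sketch of running “profile1” (assuming two bindings per profile, so profile 1 owns binding indices 2 and 3; the earlier crash happened because execute_async_v2 read the unset binding slots of profile 1):

# Each optimization profile gets its own copy of all the engine's bindings.
bindings_per_profile = engine.num_bindings // engine.num_optimization_profiles
offset = 1 * bindings_per_profile  # profile 1 -> offset 2 with one input and one output

context = engine.create_execution_context()
context.active_optimization_profile = 1        # set the profile before the binding shape
context.set_binding_shape(offset, img.shape)   # img.shape = (4, 1, 32, 208)

bindings = [0] * engine.num_bindings
bindings[offset] = int(d_input)
bindings[offset + 1] = int(d_output)

cuda.memcpy_htod_async(d_input, img, stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(outputs, d_output, stream)
stream.synchronize()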

Thanks!

Hi,

Are you still facing this issue? If not can we mark this as solved?
Looks like you’re using an old version of TensorRT.

We recommend you use the latest TensorRT version, 8.4 GA, for a better experience.
https://developer.nvidia.com/nvidia-tensorrt-8x-download

Thank you.