Two inputs in TensorRT engine using Python

Description

I generated an engine file from an ONNX model using trtexec.
The model accepts two inputs, an image feature and a token sequence, as it is an image-captioning model.
Inference succeeds when I load the engine file with trtexec --loadEngine.
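For reference, that check looked roughly like this (the engine file name and the input tensor names are placeholders; the actual names come from the ONNX model):

trtexec --loadEngine=model.engine --shapes=image_feature:1x4096,token_sequence:1x35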

My question is how to do the same in Python. Below is my inference code:

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates and activates a CUDA context; required before mem_alloc
import pycuda.driver as cuda

def infer(engine):

    input_image = preprocess(input_file)
    sequence_token = token_preprocess()

    with engine.create_execution_context() as context:

        # Set the shape of each input binding before allocating buffers
        context.set_binding_shape(0, (1, 4096))  # image feature
        context.set_binding_shape(1, (1, 35))    # token sequence

        # Allocate host and device buffers
        bindings = []
        input_buffers = []
        input_memories = []
        for binding in engine:
            binding_idx = engine.get_binding_index(binding)
            size = trt.volume(context.get_binding_shape(binding_idx))
            dtype = trt.nptype(engine.get_binding_dtype(binding))

            if engine.binding_is_input(binding):
                # Pick the host array that matches this input binding and
                # make sure it is contiguous with the dtype the engine expects
                host_array = input_image if binding_idx == 0 else sequence_token
                input_buffer = np.ascontiguousarray(host_array, dtype=dtype)
                input_memory = cuda.mem_alloc(input_buffer.nbytes)
                # Keep a reference to every input buffer; overwriting a single
                # variable would leave all but the last input uncopied
                input_buffers.append(input_buffer)
                input_memories.append(input_memory)
                bindings.append(int(input_memory))
            else:
                # Size the output from the binding shape instead of a
                # hard-coded element count
                output_buffer = cuda.pagelocked_empty(size, dtype)
                output_memory = cuda.mem_alloc(output_buffer.nbytes)
                bindings.append(int(output_memory))

        stream = cuda.Stream()

        # Transfer all input data to the GPU
        for buffer, memory in zip(input_buffers, input_memories):
            cuda.memcpy_htod_async(memory, buffer, stream)

        # Run inference
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

        # Transfer prediction output from the GPU
        cuda.memcpy_dtoh_async(output_buffer, output_memory, stream)

        # Synchronize the stream
        stream.synchronize()

        return output_buffer
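For completeness, the engine passed to infer() is deserialized from the engine file that trtexec produced. A minimal sketch, assuming the file was saved as model.engine (the file name is a placeholder):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# "model.engine" is a placeholder for the engine file generated by trtexec
with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

infer(engine)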

I get the following error when I try to run inference:

[TRT] [E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (invalid argument)

Environment

TensorRT Version: 8.6
CUDA Version: 12.0

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:

1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "your_model.onnx"  # placeholder: path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
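For example (the model file name is a placeholder):

trtexec --onnx=your_model.onnx --verbose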
Thanks!

Hi @TimeDilation, have you resolved your issue? My model also needs to accept two image inputs.