Two inputs in TensorRT engine using python

Description

Generated engine file from onnx model using trtexec.
Model accepts two inputs, image feature and token sequence as it’s image captioning model.
I can see successful inference when load engine file with trtexec --loadEngine.

My concern is how I can do it with python.
Below is the inference code

def infer(engine):

input_image = preprocess(input_file)

sequence_token = token_preprocess()

with engine.create_execution_context() as context:

    # Set input shape based on image dimensions for inference
    context.set_binding_shape(0, (1, 4096))
    context.set_binding_shape(1, (1, 35))

    # Allocate host and device buffers
    bindings = []
    for binding in engine:
        binding_idx = engine.get_binding_index(binding)

        size = trt.volume(context.get_binding_shape(binding_idx))
        
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        
        if engine.binding_is_input(binding):

            if binding_idx == 0:

                input_buffer = np.ascontiguousarray(input_image)
                input_memory = cuda.mem_alloc(input_image.nbytes)
                bindings.append(int(input_memory))

            else:
                input_buffer = np.ascontiguousarray(sequence_token)
                input_memory = cuda.mem_alloc(sequence_token.nbytes)
                bindings.append(int(input_memory))

        else:
            output_buffer = cuda.pagelocked_empty(8485, dtype)
            output_memory = cuda.mem_alloc(output_buffer.nbytes)
            bindings.append(int(output_memory))

    stream = cuda.Stream()
  
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(input_memory, input_buffer, stream)
    
    # Run inference
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

    # Transfer prediction output from the GPU.
    cuda.memcpy_dtoh_async(output_buffer, output_memory, stream)

    # Synchronize the stream
    stream.synchronize()

Getting following error when try to inference

[TRT] [E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (invalid argument)

Environment

TensorRT Version: 8.6
CUDA Version: 12.0

Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside you can try few things:

  1. validating your model with the below snippet

check_model.py

import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model).
2) Try running your model with trtexec command.

In case you are still facing issue, request you to share the trtexec “”–verbose"" log for further debugging
Thanks!

Hi @TimeDilation have you resolved your issue? My model also needs to accept two image inputs.