[defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)


I use TensorRT to run inference with an engine file converted from ONNX.
After I get the result, there is only one error before the process finishes:

[defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)

Do I need to free the buffers manually?
Here is my code, please help…

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

def allocate_buffers(engine, batch_size=1):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    binding_to_type = {"input": np.float32, "probs": np.float32}
    binding_to_shape = {"input": (batch_size, 3, 512, 512), "probs": (batch_size, 1, 2)}
    for binding in engine:
        if binding in binding_to_shape:
            # size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
            dtype = binding_to_type[str(binding)]
            # Allocate page-locked host memory and a matching device buffer.
            host_mem = cuda.pagelocked_empty(binding_to_shape[binding], dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            # Record the device pointer so execute_async_v2 can use it.
            bindings.append(int(device_mem))
            # Append to the appropriate list.
            if engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

def do_inference_v2(context, bindings, inputs, outputs, stream):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream so host buffers are valid before returning.
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]

class FDT():
    def __init__(self,weights='model.engine'):
        self.weights = weights
        self.imgsz = 512
        self.conf = 0.5
        self.batch_size = 1
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        trt.init_libnvinfer_plugins(None, "")
        with open(self.weights, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        self.inputs, self.outputs, self.bindings, self.stream = allocate_buffers(self.engine, self.batch_size)
        self.context = self.engine.create_execution_context()
        self.context.set_binding_shape(0, (1, 3, 512, 512))

    def infer_once(self, img):
        img = cv2.resize(img, (self.imgsz, self.imgsz))
        img = img.transpose((2, 0, 1))
        np.copyto(self.inputs[0].host, img)
        pred = do_inference_v2(self.context, self.bindings, self.inputs, self.outputs, self.stream)[0][0][0]
        return pred

if __name__ == "__main__":
    det = FDT(weights='model.engine')
    img  = cv2.imread('data/2.jpg')
    p = det.infer_once(img)
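For reference, the host/device buffer sizes that allocate_buffers requests can be checked with plain NumPy (the shapes and float32 dtype below are the ones hard-coded in allocate_buffers; cuda.mem_alloc mirrors host_mem.nbytes):

```python
import numpy as np

# Shapes hard-coded in allocate_buffers above; both bindings are float32.
shapes = {"input": (1, 3, 512, 512), "probs": (1, 1, 2)}

for name, shape in shapes.items():
    # nbytes = number of elements * bytes per float32 element
    nbytes = int(np.prod(shape)) * np.dtype(np.float32).itemsize
    print(f"{name}: {nbytes} bytes")
# input: 3145728 bytes
# probs: 8 bytes
```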


TensorRT Version: 8.0
CUDA version: 11.6


Are you using a model with dynamic shapes? Could you use context.get_binding_shape for an engine with dynamic shapes?
Could you please share the ONNX model and complete error logs with us for better debugging.

Thank you.

I’m facing the exact same problem, and I don’t use images or dynamic shapes. Furthermore, if I only call allocate_buffers and finish, there is no error. But as soon as I execute engine.create_execution_context() and exit, the same error appears.

CUDA 11.0, TensorRT

Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
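For step 2, a typical trtexec invocation (the model path and log file name are placeholders) would look like:

```shell
# Build an engine directly from the ONNX model and capture the verbose
# log for debugging (file names are placeholders).
trtexec --onnx=model.onnx --verbose > trtexec_verbose.log 2>&1
```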