Inference on multiple images with TensorRT

Description

Single-image inference works, but when looping over multiple images the output of the first image is repeated for every image that follows.

Environment

TensorRT Version: 7.0
GPU Type: RTX 2080
Nvidia Driver Version: 440.64.00
CUDA Version: 10.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.03-py3

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

I can successfully run inference on a single image, but as soon as I loop through a list of images, the output of the first image is repeated for all the images that follow. Below is the relevant code:

# Imports used by the snippets below; LoadImages and opt come from the
# surrounding YOLOv5 detection script (not shown). EXPLICIT_BATCH and GiB are
# defined as in the standard TensorRT Python samples.
import os
import time

import numpy as np
import torch
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on the default device

EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def GiB(val):
    return val * 1 << 30

# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

# This function is generalized for multiple inputs/outputs.
# inputs and outputs are expected to be lists of HostDeviceMem objects.
def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]


TRT_LOGGER = trt.Logger()


def get_engine(engine_path):
    # If a serialized engine exists, use it instead of building an engine.
    print("Reading engine from file {}".format(engine_path))
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


def build_engine(onnx_path, using_half, engine_file="yolov5_1_fp32_common.engine"):
    if os.path.exists(engine_file):
        return get_engine(engine_file)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = 1 # always 1 for explicit batch
        config = builder.create_builder_config()
        config.max_workspace_size = GiB(1)
        if using_half:
            config.set_flag(trt.BuilderFlag.FP16)
        # Load the Onnx model and parse it in order to populate the TensorRT network.
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):
                print ('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print (parser.get_error(error))
                return None
        return builder.build_engine(network, config)

def detect_yolo(engine, context, buffers, image_src, image_size):
    IN_IMAGE_H, IN_IMAGE_W = 640, 640
    dataset = LoadImages(image_src, img_size=image_size)

    for path, img, im0s, vid_cap in dataset:
        print(path)
        input_img = img.astype(np.float)
        input_img /= 255.0
        input_img = np.expand_dims(input_img, axis=0)

        img = torch.from_numpy(input_img).float().numpy()
        # print(img.shape)
        trt_output = detect(engine, context, buffers, img)


def detect(engine, context, buffers, img_in):
    ta = time.time()
    print("Shape of the network input: ", img_in.shape)
    # print(img_in)
    inputs, outputs, bindings, stream = buffers
    print('Length of inputs: ', len(inputs))
    inputs[0].host = img_in
    print(outputs[-1])
    trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

    print('Len of outputs: ', len(trt_outputs))
    num_classes = 80
    # print(trt_outputs)
    trt_output = trt_outputs[0].reshape(1, -1, 5 + num_classes)

    tb = time.time()

    print(trt_output.shape)

    print('-----------------------------------')
    print('    TRT inference time: %f' % (tb - ta))
    print('-----------------------------------')
    return trt_output


def main():
    """
    """
    using_half = False
    with build_engine(opt.onnx, using_half) as engine, engine.create_execution_context() as context:
        buffers = allocate_buffers(engine)

        detect_yolo(engine, context, buffers, opt.source, opt.img_size)

Thanks in advance. I want to run inference on multiple images successfully. I think this has something to do with context.enqueue, but I am not quite sure.
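One thing that may be worth checking here (an assumption, not a confirmed fix from this thread): the line inputs[0].host = img_in replaces the pagelocked buffer created in allocate_buffers with an ordinary numpy array, so the async host-to-device copies no longer go through the pinned memory that was allocated once in main. A sketch of the alternative, copying each new image into the existing pinned buffer and reusing the names from the code above:

# Sketch (assumption, not a confirmed fix): copy each preprocessed image into
# the pinned host buffer allocated by allocate_buffers() instead of rebinding
# inputs[0].host to a new array.
def detect(engine, context, buffers, img_in):
    inputs, outputs, bindings, stream = buffers
    # img_in must be float32 and match the size of the input binding.
    np.copyto(inputs[0].host, img_in.ravel())
    trt_outputs = do_inference(context, bindings=bindings, inputs=inputs,
                               outputs=outputs, stream=stream)
    num_classes = 80
    return trt_outputs[0].reshape(1, -1, 5 + num_classes)

This keeps the buffers allocated once in main() valid across the whole loop in detect_yolo().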

Hi @y14uc339,
TRT >= 7 requires EXPLICIT_BATCH for ONNX; for a fixed-shape model, the batch size is fixed.
You may refer to the link below for the same.
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes
However, please share your model file to reproduce the issue so we can help better.
Thanks!
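For reference, a sketch of the dynamic-shape route that the linked documentation describes, under the assumption that the exported ONNX input is named "images" and has shape [-1, 3, 640, 640] with a dynamic batch dimension (the actual tensor name depends on how the model was exported):

# Sketch: build an engine with a dynamic batch dimension via an optimization
# profile (assumes an ONNX input named "images" of shape [-1, 3, 640, 640]).
def build_dynamic_engine(onnx_path, max_batch=4):
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(EXPLICIT_BATCH) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        config = builder.create_builder_config()
        config.max_workspace_size = GiB(1)
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        profile = builder.create_optimization_profile()
        # min / opt / max shapes for the batch dimension.
        profile.set_shape("images", (1, 3, 640, 640),
                          (1, 3, 640, 640), (max_batch, 3, 640, 640))
        config.add_optimization_profile(profile)
        return builder.build_engine(network, config)

At inference time the input shape would still have to be set on the context, e.g. context.set_binding_shape(0, (n, 3, 640, 640)), before executing; trtexec can build an equivalent engine with the --minShapes/--optShapes/--maxShapes options.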

@AakankshaS
trtexec --onnx=onnx_file --explicitBatch --saveEngine=save file
I used the above command to create the engine.
I think I used explicitBatch, but when I run inference on multiple images in a loop, the output of the first image is copied to all those that follow. I created the ONNX with batch size 1.

I can share the engine file if you want?

Hi @y14uc339, kindly share the engine file.
Thanks!

Hi @AakankshaS, Please find the engine file:

@AakankshaS Also, I see this in the terminal after the first image is inferenced, for all images that follow:

[TensorRT] WARNING: Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[TensorRT] ERROR: ../rtExt/cuda/slice.cu (141) - Cuda Error in launchNaiveSliceImpl: 400 (invalid resource handle)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
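The warning above points at the execute_async(batch_size=...) call in do_inference: with an explicit batch network, the v2 execution API, which takes no batch size, is expected instead. A minimal sketch of that variant, assuming the same buffer layout as above:

# Sketch: explicit-batch execution without a batch_size argument, using
# execute_async_v2 instead of execute_async.
def do_inference_v2(context, bindings, inputs, outputs, stream):
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    stream.synchronize()
    return [out.host for out in outputs]

Whether this change also accounts for the slice.cu error is not confirmed in this thread.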

@AakankshaS Is it a bug or something? I cannot really use dynamic shapes with ONNX because onnx-simplifier doesn't support dynamic shapes! But inference on multiple images shouldn't be a problem.

Hi,
Apologies for the late response.
Can you please share your ONNX model?

Thanks!