ValueError when the engine's max batch size differs from the number of inferred images


Hi all,

I have a Caffe model that I parsed with TensorRT, and inference works fine when the number of images equals the batch size. However, I'm struggling when I build an engine whose max_batch_size differs from the number of images I pass. For example, if I want to infer 2 images but the engine was built with a max batch size of 4, I get a ValueError, because the input buffer expects 4 images (a full batch).

Here are the code snippets:

Calling the engine build routine:

if not os.path.exists(trt_engine_path):
    trt_engine = engine_utils.build_engine_caffe(deploy_file, model_file, TRT_LOGGER, trt_engine_datatype, batch_size)
    engine_utils.save_engine(trt_engine, trt_engine_path)
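One pitfall with the caching check above: a serialized engine keeps the max_batch_size (and precision) it was built with, so if `trt_engine_path` already exists, changing `batch_size` has no effect until the file is deleted. A minimal sketch, assuming a hypothetical naming scheme in which the cache file name encodes those settings (`engine_cache_path` and the `_bs4_fp16.trt` suffix are illustrative, not part of `engine_utils`):

```python
import os


def engine_cache_path(cache_dir, model_name, batch_size, precision):
    """Build a cache file name that encodes the settings baked into the engine.

    Hypothetical naming scheme: an engine loaded from disk keeps the
    max_batch_size and precision it was built with, so keying the file name
    on those settings avoids silently reusing a stale engine.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    file_name = f"{model_name}_bs{batch_size}_{precision}.trt"
    return os.path.join(cache_dir, file_name)


print(engine_cache_path("engines", "my_model", 4, "fp16"))  # engines/my_model_bs4_fp16.trt on POSIX
```

The existing `os.path.exists(...)` check can then be pointed at this path, so rebuilding with a different batch size produces a new file instead of reusing the old engine.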

Build engine routine:

import tensorrt as trt

def build_engine_caffe(deploy_file, model_file, trt_logger, trt_engine_datatype, batch_size):
    with trt.Builder(trt_logger) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        builder.max_workspace_size = 1 << 30  # 1 GiB of builder scratch space
        if trt_engine_datatype == trt.DataType.FLOAT:
            print("Using FP32 mode!")
        elif trt_engine_datatype == trt.DataType.HALF:
            print("Using FP16 mode!")
            builder.fp16_mode = True

        # Maximum batch size the engine will support
        builder.max_batch_size = batch_size
        model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=trt_engine_datatype)
        # The Caffe parser does not mark outputs automatically; "conv_reg" is the output blob used below
        network.mark_output(model_tensors.find("conv_reg"))
        print("Building TensorRT engine. This may take a few minutes.")
        return builder.build_cuda_engine(network)

Allocation of buffers:

class HostDeviceMem(object):
    """Pairs a page-locked host buffer with its device allocation."""

    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

import numpy as np
import pycuda.autoinit  # noqa: F401 (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    binding_to_type = {"data": np.float32, "conv_reg": np.float32}
    for binding in engine:
        # Buffers are sized for engine.max_batch_size images, not for the actual batch
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        # dtype = trt.nptype(engine.get_binding_dtype(binding))
        dtype = binding_to_type[str(binding)]
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream
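The sizing line in `allocate_buffers` is where the mismatch starts: every host buffer holds `max_batch_size` images' worth of elements, however many images are later passed. The arithmetic can be checked with NumPy alone; in this sketch `np.prod` stands in for `trt.volume`, `np.empty` for `cuda.pagelocked_empty`, and the (3, 224, 224) input shape is an assumed example, not taken from the model above.

```python
import numpy as np


def binding_buffer_size(binding_shape, max_batch_size):
    """Number of elements allocate_buffers() reserves for one binding.

    Mirrors trt.volume(shape) * engine.max_batch_size; in implicit-batch mode
    the binding shape does not include the batch dimension.
    """
    return int(np.prod(binding_shape)) * max_batch_size


# Hypothetical (C, H, W) shape for the "data" input binding.
shape = (3, 224, 224)
size = binding_buffer_size(shape, max_batch_size=4)
host = np.empty(size, dtype=np.float32)  # stands in for cuda.pagelocked_empty
print(size)  # 602112, i.e. 4 * 150528
```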

It fails at np.copyto: if the engine is built with, e.g., max_batch_size = 4 but only 2 images are passed, I get a ValueError:

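def infer_batch(img_np, context, inputs, outputs, bindings, stream, batch_size):
    numpy_array = img_np
    actual_batch_size = len(img_np)
    # inputs[0].host holds max_batch_size images, but numpy_array only holds
    # actual_batch_size images, so the sizes differ and np.copyto raises ValueError
    np.copyto(inputs[0].host, numpy_array.ravel())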
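The failure can be reproduced without TensorRT. `np.copyto(dst, src)` requires `src` to broadcast to the shape of `dst`, and a flat array holding 2 images cannot broadcast into one sized for 4. A minimal sketch, using a small hypothetical input shape and a hypothetical helper name `try_full_copy`:

```python
import numpy as np


def try_full_copy(max_batch_size, actual_batch_size, chw):
    """Mimic infer_batch(): copy a batch into a buffer sized for max_batch_size.

    Returns the ValueError message, or None if the copy succeeded.
    """
    volume = int(np.prod(chw))
    host = np.empty(max_batch_size * volume, dtype=np.float32)  # as in allocate_buffers()
    img_np = np.ones((actual_batch_size, *chw), dtype=np.float32)
    try:
        np.copyto(host, img_np.ravel())
    except ValueError as err:
        return str(err)
    return None


print(try_full_copy(4, 2, (3, 8, 8)))  # broadcast error mentioning shapes (384,) and (768,)
print(try_full_copy(4, 4, (3, 8, 8)))  # None: the sizes match
```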

How can I overcome this issue?
When I pass the same number of images as the batch size, I don't get an error.
I had a look at the Python samples, but I couldn't find an example that infers a batch of several images at once.

Thanks a lot for your help!
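One common way around this, sketched here under assumptions rather than as an official answer: keep the buffers sized for `max_batch_size`, copy the N images (N at most max_batch_size) into the front of the host buffer, run the engine with batch size N, and read back only the first N results. With the TensorRT 7-era implicit-batch Python API the execution call would look roughly like `context.execute_async(batch_size=n, bindings=bindings, stream_handle=stream.handle)`; that call is not exercised here. The helper names `stage_partial_batch` and `read_partial_output`, and all shapes, are hypothetical, and the TensorRT step is faked with NumPy.

```python
import numpy as np


def stage_partial_batch(host_in, batch):
    """Copy a batch of N <= max_batch_size images into the front of host_in.

    Returns N, the batch size to hand to the execution call.
    """
    flat = np.ascontiguousarray(batch, dtype=host_in.dtype).ravel()
    if flat.size > host_in.size:
        raise ValueError("batch is larger than the engine's max_batch_size")
    # Only the leading slice is overwritten; the remainder is expected to be
    # ignored when the engine runs with batch_size=N.
    np.copyto(host_in[: flat.size], flat)
    return len(batch)


def read_partial_output(host_out, actual_batch_size, out_shape):
    """Return only the outputs that belong to the images actually sent."""
    per_image = int(np.prod(out_shape))
    used = host_out[: actual_batch_size * per_image]
    return used.reshape(actual_batch_size, *out_shape)


# Hypothetical shapes: input (C, H, W) and per-image output.
max_batch_size, chw, out_shape = 4, (3, 8, 8), (2, 4, 4)
host_in = np.zeros(max_batch_size * int(np.prod(chw)), dtype=np.float32)
host_out = np.zeros(max_batch_size * int(np.prod(out_shape)), dtype=np.float32)

n = stage_partial_batch(host_in, np.ones((2, *chw), dtype=np.float32))
# Real pipeline (not run here): copy host_in to the device, call
# context.execute_async(batch_size=n, bindings=bindings, stream_handle=stream.handle),
# then copy the output back. Here we fake output values for the n images instead.
host_out[: n * int(np.prod(out_shape))] = 1.0
result = read_partial_output(host_out, n, out_shape)
print(n, result.shape)  # 2 (2, 2, 4, 4)
```

Padding the batch with zeros up to `max_batch_size` would also avoid the ValueError, but it spends GPU time on dummy images; passing the real batch size to the execution call avoids that.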


Please refer to the sample below, which uses the Caffe parser with multiple batch sizes: