Description
Hi all,
I have a Caffe model that I parsed with TensorRT, and inference works fine as long as the number of images I pass equals the engine's batch size. But I'm struggling when the engine is created with a max_batch_size that differs from the number of images I pass in: for example, if I want to infer 2 images but the engine was built with max batch size = 4, I get a ValueError, because my input buffer is sized for 4 images (the full batch).
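Just to show the mismatch in isolation (the 3x224x224 input shape is only an assumption for this snippet, my real network is different):

import numpy as np

# Host buffer sized for the engine's max batch size (4 images)
max_batch_size = 4
elems_per_image = 3 * 224 * 224
host_buffer = np.empty(max_batch_size * elems_per_image, dtype=np.float32)

# Only 2 images are available for this inference call
batch = np.zeros((2, 3, 224, 224), dtype=np.float32)

# Raises ValueError: could not broadcast input array from shape (301056,) into shape (602112,)
np.copyto(host_buffer, batch.ravel())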
Here are the code snippets:
Call to the engine build routine:
if not os.path.exists(trt_engine_path):
    trt_engine = engine_utils.build_engine_caffe(deploy_file, model_file, TRT_LOGGER, trt_engine_datatype, batch_size)
    engine_utils.save_engine(trt_engine, trt_engine_path)
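save_engine is just a small wrapper that serializes the engine to disk, essentially this (sketch, exact helper code omitted):

def save_engine(engine, engine_dest_path):
    # Serialize the built engine and write it to disk for reuse
    with open(engine_dest_path, 'wb') as f:
        f.write(engine.serialize())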
Build engine routine:
def build_engine_caffe(deploy_file, model_file, trt_logger, trt_engine_datatype, batch_size):
    with trt.Builder(trt_logger) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        builder.max_workspace_size = 1 << 30
        if trt_engine_datatype == trt.DataType.FLOAT:
            print("Using FP32 mode!")
        if trt_engine_datatype == trt.DataType.HALF:
            print("Using FP16 mode!")
            builder.fp16_mode = True
        builder.max_batch_size = batch_size
        model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=trt_engine_datatype)
        network.mark_output(model_tensors.find(ModelData.OUTPUT_NAME))
        print("Building TensorRT engine. This may take a few minutes.")
        return builder.build_cuda_engine(network)
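On subsequent runs the cached engine is deserialized again before creating an execution context, roughly like this (a sketch, variable names are mine):

# Sketch: load a previously serialized engine and create an execution context
with open(trt_engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    trt_engine = runtime.deserialize_cuda_engine(f.read())
context = trt_engine.create_execution_context()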
Allocation of buffers:
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()
def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    binding_to_type = {"data": np.float32, "conv_reg": np.float32}
    for binding in engine:
        print("binding:", binding)
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        # dtype = trt.nptype(engine.get_binding_dtype(binding))
        dtype = binding_to_type[str(binding)]
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream
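The returned buffers are then consumed the same way as in the TensorRT Python samples' common.do_inference, which I reproduce here from memory as a sketch (so it may differ in details from the shipped sample):

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data from host to device.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference for the given (implicit) batch size.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back to the host.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Wait for all work on the stream to finish.
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]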
It fails here at np.copyto: if the engine is created with e.g. batch size 4 but only 2 images are passed, I get a ValueError:
def infer_batch(img_np, context, inputs, outputs, bindings, stream, batch_size):
    numpy_array = img_np
    actual_batch_size = len(img_np)
    np.copyto(inputs[0].host, numpy_array.ravel())
    # [...]
How can I overcome this issue?
When I pass a number of images equal to the batch size, I don't get an error.
I had a look at the Python samples, but I couldn't find an example that runs multiple images in one batch.
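The closest thing I could come up with myself is to copy only the actual images into the front of the (max-batch-sized) host buffer and pass the real batch size to execute_async, along these lines (untested sketch, so I'm not sure it's correct):

# Untested idea: fill only the first part of the oversized host buffer
# and tell TensorRT the actual batch size at execution time.
actual_batch_size = len(img_np)                                   # e.g. 2 instead of 4
flat = np.asarray(img_np, dtype=np.float32).ravel()
inputs[0].host[:flat.size] = flat                                 # partial copy instead of np.copyto
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
context.execute_async(batch_size=actual_batch_size, bindings=bindings, stream_handle=stream.handle)
[cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
stream.synchronize()

Is something like this the intended way to run fewer images than max_batch_size, or is there a cleaner approach?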
Thanks a lot for your help!