TensorRT for yolov2 caffemodel (engine, inferencing, pre/postprocessing)

I would like to use TensorRT to create an engine and do inferencing for an already trained Tiny YOLOv2 model. For this I also have the .prototxt, .caffemodel, label and anchor files. I followed the Nvidia Developer Tutorial (https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#import_caffe_python) as closely as possible, but now I have three open questions which were not answered there:

  1. Regarding engine:
def build_engine(deploy_file, model_file, logger, engine_datatype, batch_size):
    print('Creating engine...')

    with trt.Builder(logger) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        builder.max_workspace_size = MAX_WORKSPACE_SIZE
        builder.max_batch_size = batch_size
        model_tensors = parser.parse(deploy_file, model_file, network, engine_datatype)
        return builder.build_cuda_engine(network)

Here I get the error that at least one output tensor has to be marked. I just took the last layer of my .prototxt file, but I have also seen networks where several output layers are included - how do I know exactly which ones, and how many, I need?

  2. Regarding preprocessing of images:
def pre_process_img(img_path, host_buffer):
    print('processing image...')
    with Image.open(img_path) as img:
        c, h, w = INPUT_SHAPE #(3, 416, 416)
        dtype = trt.nptype(DTYPE) #trt.float32
        img_array = np.asarray(img.resize((w, h), Image.BILINEAR)).transpose([2, 0, 1]).astype(dtype).ravel()
        img_array /= 255.0 # scale pixels to [0, 1]
    np.copyto(host_buffer, img_array)

There was no example for this in the Nvidia documentation, so I just followed the most relevant code examples I found on GitHub. An explanation of how an image has to be pre-processed would also help me here.

  3. Regarding inferencing:
def allocate_buffers(engine):
    print('allocating buffers...')
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(DTYPE))
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(DTYPE))
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()
    return h_input, d_input, h_output, d_output, stream

def do_inference(context, h_input, d_input, h_output, d_output, stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize() # wait for the async copies to finish before reading h_output
    return h_output

Here I have taken the Nvidia example again; I also get an output, but can’t really interpret it. How can I get the label, the confidence and the coordinates from this array? I guess I need the label and anchor files for this…?
My understanding of Tiny YOLOv2 is that I should get an output of 13x13x125, but somehow my output is an array of length 8450.

Unfortunately, the inference example in the documentation stops with the return of “h_output” and the samples which should be at /usr/src/tensorrt/samples/python/introductory_parser_samples are not there.
If you need more information, let me know.

Thanks a lot for your help!


  1. A TRT engine supports multiple outputs; the API network.mark_output() can be invoked multiple times if needed. At least one tensor must be marked as an output before building the engine, which is why build_cuda_engine fails in your snippet.
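A minimal sketch of how this could look after parser.parse() in build_engine (the layer names here are hypothetical placeholders - use the actual output layer name(s) from your deploy .prototxt):

```python
# Sketch: marking parsed tensors as engine outputs. "region_out" below is a
# placeholder name; replace it with the real output layer name(s) from the
# deploy file.
def mark_outputs(network, model_tensors, output_layer_names):
    """Mark each named tensor as a TensorRT engine output."""
    for name in output_layer_names:
        tensor = model_tensors.find(name)  # look up the parsed tensor by name
        network.mark_output(tensor)        # may be called once per output

# In build_engine, after parser.parse(...):
# mark_outputs(network, model_tensors, ["region_out"])
```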

  2. Pre- and post-processing are not handled by TensorRT; the user has to implement them for his application.
    The only thing to consider is that the output of the pre-processing step, which acts as the TRT input, must have the data format the network expects - for a Caffe model this is planar CHW data, here of shape (3, 416, 416).
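As a pure-NumPy sketch (the (3, 416, 416) shape and the [0, 1] scaling are assumed from the question's own preprocessing code):

```python
import numpy as np

# Sketch: turning a resized HWC uint8 image array into the flat CHW float
# buffer used as TRT input (shape and scaling taken from the question's code).
def to_trt_input(img_hwc, dtype=np.float32):
    """(H, W, 3) uint8 array -> flat C*H*W float array scaled to [0, 1]."""
    chw = img_hwc.transpose(2, 0, 1).astype(dtype)  # HWC -> CHW
    chw /= 255.0                                    # scale pixels to [0, 1]
    return chw.ravel()                              # flatten for the host buffer
```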

  3. TensorRT outputs should conceptually match the original model's outputs. For example, if the original model had 2 outputs representing 2 different things, then so should the TensorRT engine. The common difference, however, is the shape: the original model probably produces an output of some shape, say 2-D (N, K) where N is the batch size and K the output size, while TensorRT usually returns a single flattened 1-D array, in this case of length N*K.
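For YOLOv2-style models, decoding the flat array might look like the sketch below (standard YOLOv2 region-layer math; the anchor values and class count are placeholders - take them from your anchor and label files). Note that 8450 = 5 x (5 + 5) x 13 x 13, so your model may use 5 anchors and 5 classes rather than the 20-class VOC layout that gives 13x13x125.

```python
import numpy as np

# Hedged sketch of YOLOv2 output decoding. The channel layout
# (anchors, 5 + num_classes, grid, grid) is the usual region-layer ordering;
# verify it against your model before relying on it.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolov2(flat_output, anchors, num_classes, grid=13, conf_thresh=0.5):
    """Reshape the flat TRT output and return (class_id, confidence, x, y, w, h)
    tuples in grid-cell units for all predictions above conf_thresh."""
    num_anchors = len(anchors)
    out = flat_output.reshape(num_anchors, 5 + num_classes, grid, grid)
    detections = []
    for row in range(grid):
        for col in range(grid):
            for a, (aw, ah) in enumerate(anchors):
                tx, ty, tw, th, tobj = out[a, :5, row, col]
                # Box center relative to the grid, size scaled by the anchor:
                x = col + sigmoid(tx)
                y = row + sigmoid(ty)
                w = aw * np.exp(tw)
                h = ah * np.exp(th)
                # Class probabilities: softmax over the class channels
                cls = np.exp(out[a, 5:, row, col] - out[a, 5:, row, col].max())
                cls /= cls.sum()
                conf = float(sigmoid(tobj) * cls.max())
                if conf >= conf_thresh:
                    detections.append((int(cls.argmax()), conf,
                                       float(x), float(y), float(w), float(h)))
    return detections
```

x, y, w, h come out in grid-cell units; multiplying by input_size / grid (here 416 / 13 = 32) gives pixel coordinates, and class_id indexes into the label file.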

Please refer to the samples below:
Both samples can be found at “/usr/src/tensorrt/samples/python/” or “/opt/tensorrt/samples/python/”


Thanks, everything is working now as expected and the speedup is impressive!