TensorRT for yolov2 caffemodel (engine, inferencing, pre/postprocessing)

marving1 · April 11, 2020, 5:00pm

Hi,
I would like to use TensorRT to create an engine and run inference for an already trained Tiny yolov2 model. For this I have the .prototxt, the .caffemodel, and the label and anchor files. I followed the Nvidia Developer Tutorial (Developer Guide :: NVIDIA Deep Learning TensorRT Documentation) as closely as possible, but I still have three open questions that it does not answer:

Regarding engine:

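import tensorrt as trt

def build_engine(deploy_file, model_file, logger, engine_datatype, batch_size):
    print('Creating engine...')

    with trt.Builder(logger) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        builder.max_workspace_size = MAX_WORKSPACE_SIZE  # constant defined elsewhere in my script
        builder.max_batch_size = batch_size
        model_tensors = parser.parse(deploy_file, model_file, network, engine_datatype)
        # find() looks a tensor up by its blob ("top") name; a missing name should give None
        output_tensor = model_tensors.find('conv_reg')
        if output_tensor is None:
            raise ValueError("blob 'conv_reg' not found in the parsed network")
        network.mark_output(output_tensor)
        return builder.build_cuda_engine(network)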

Here I got the error that at least one output tensor has to be marked. I simply took the last layer of my .prototxt file, but I have also seen networks that mark several output layers. How do I know exactly which ones, and how many, I need?
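One way to find candidate outputs is to look for blobs that some layer produces but no later layer consumes. The snippet below is a minimal sketch of that idea, not part of TensorRT: `find_output_blobs` and `SAMPLE_PROTOTXT` are hypothetical names, and the parser assumes a plainly formatted .prototxt with one `top:` or `bottom:` entry per line. Note that `mark_output` needs the blob ("top") name, which may differ from the layer name.

```python
import re

def find_output_blobs(prototxt_text):
    """Return blob names that are produced by a layer but never consumed."""
    layers = []
    current = None
    for raw_line in prototxt_text.splitlines():
        line = raw_line.strip()
        # A new "layer {" (or legacy "layers {") block starts here.
        if re.match(r'^layers?\s*\{', line):
            current = {'top': [], 'bottom': []}
            layers.append(current)
            continue
        match = re.match(r'^(top|bottom)\s*:\s*"([^"]+)"', line)
        if match and current is not None:
            current[match.group(1)].append(match.group(2))

    produced = []
    consumed = set()
    for layer in layers:
        for top in layer['top']:
            if top not in produced:
                produced.append(top)
        for bottom in layer['bottom']:
            # In-place layers (top == bottom) do not use up the blob.
            if bottom not in layer['top']:
                consumed.add(bottom)
    return [blob for blob in produced if blob not in consumed]

# Hypothetical three-layer network used only for illustration.
SAMPLE_PROTOTXT = '''
input: "data"
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv_reg"
  type: "Convolution"
  bottom: "conv1"
  top: "conv_reg"
}
'''

print(find_output_blobs(SAMPLE_PROTOTXT))
```

Every name this returns is a candidate for one `network.mark_output(model_tensors.find(name))` call. A single-head detector such as Tiny yolov2 would normally yield exactly one.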

Regarding preprocessing of images:

import numpy as np
from PIL import Image

def pre_process_img(img_path, host_buffer):
    print('processing image...')
    with Image.open(img_path) as img:
        c, h, w = INPUT_SHAPE #(3, 416, 416)
        dtype = trt.nptype(DTYPE) #trt.float32
        # Resize, reorder HWC -> CHW, convert to float and flatten
        img_array = np.asarray(img.resize((w, h), Image.BILINEAR)).transpose([2, 0, 1]).astype(dtype).ravel()
        img_array /= 255.0 # scale to [0, 1]; alternative: img_array / 127.5 - 1.0 for [-1, 1]
    np.copyto(host_buffer, img_array)

There was no example for this in the Nvidia documentation, so I just followed the most relevant code examples I found on GitHub. An explanation of how an image has to be pre-processed would also help me here.
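The layout part of this step can be isolated from file loading. The sketch below uses NumPy only and a hypothetical helper name, `to_network_input`. The scale factor and the channel order (RGB or BGR) are assumptions that have to match whatever was used when the model was trained.

```python
import numpy as np

def to_network_input(img_hwc, scale=1.0 / 255.0, swap_rb=False):
    """Convert an HxWx3 image array to a flat float32 array in CHW order."""
    arr = np.asarray(img_hwc, dtype=np.float32)
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError('expected an array of shape (H, W, 3)')
    if swap_rb:
        # Reverse the channel axis (RGB <-> BGR); needed only if training used the other order.
        arr = arr[:, :, ::-1]
    # HWC -> CHW, then apply the assumed training-time scaling.
    chw = arr.transpose(2, 0, 1) * scale
    return np.ascontiguousarray(chw, dtype=np.float32).ravel()
```

The result can be copied into the page-locked host buffer with `np.copyto(host_buffer, ...)`, provided the image was already resized to the network's input size.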

Regarding inferencing:

import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda

def allocate_buffers(engine):
    print('allocating buffers...')
    # Page-locked host buffers sized from binding 0 (input) and binding 1 (output)
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(DTYPE))
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(DTYPE))
    # Matching device buffers
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()
    return h_input, d_input, h_output, d_output, stream

def do_inference(context, h_input, d_input, h_output, d_output, stream):
    # Copy the input to the GPU, run the engine, copy the result back
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Wait for all queued work before reading h_output
    stream.synchronize()
    return h_output

Here I used the Nvidia example again. I do get an output, but I can't really interpret it. How can I get the label, the confidence and the coordinates from this array? I guess I need the label and anchor files for this…?
My understanding of Tiny yolov2 is that I should get an output of 13x13x125, but my output is an array of length 8450.
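For reference, 13x13x125 comes from 5 anchors x (5 box values + 20 classes) per grid cell, while 8450 = 13 x 13 x 50, so 50 values per cell. The sketch below lists which (anchors, classes) pairs would be consistent with a given output length. `candidate_layouts` is a hypothetical helper, and it assumes a 13x13 grid and the usual 5 box values (x, y, w, h, objectness) per anchor.

```python
def candidate_layouts(output_len, grid_h=13, grid_w=13, max_anchors=10):
    """List (num_anchors, num_classes) pairs that match a flat output length."""
    cells = grid_h * grid_w
    if output_len % cells != 0:
        return []  # the length does not fit this grid size at all
    channels = output_len // cells
    layouts = []
    for anchors in range(1, max_anchors + 1):
        # Each anchor needs at least the 5 box values.
        if channels % anchors == 0 and channels // anchors >= 5:
            layouts.append((anchors, channels // anchors - 5))
    return layouts

print(candidate_layouts(8450))
print(candidate_layouts(13 * 13 * 125))
```

If the anchor file lists 5 anchor pairs, a length of 8450 would correspond to 5 classes rather than the 20 classes of the original VOC model. The label file should confirm which case applies.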

Unfortunately, the inference example in the documentation stops with the return of “h_output” and the samples which should be at /usr/src/tensorrt/samples/python/introductory_parser_samples are not there.
If you need more information, let me know.

Thanks a lot for your help!

SunilJB · April 13, 2020, 9:19am

Hi,

A TRT engine supports multiple outputs; the mark_output API can be invoked multiple times if needed.
Pre- and post-processing are not handled by TensorRT. The user has to handle them based on the application.
The only thing to consider is that the output of the pre-processing step, which acts as the TRT input, should have the following data format:
TensorRT Developer Guide :: NVIDIA Deep Learning SDK Documentation
TensorRT outputs should conceptually match the original model's outputs. For example, if the original model had 2 outputs representing 2 different things, then so should the TensorRT engine. The common difference, however, is that the original model probably has an output of some shape, say 2-D (N, K), where N is the batch size and K is the output size, whereas the TensorRT outputs are usually returned as a single flattened array, in this case 1-D (N*K).
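As an illustration of undoing that flattening for a YOLOv2-style head, here is a minimal decoding sketch. It is not taken from the TensorRT samples. `decode_yolov2` is a hypothetical name, and the layout (anchors, 5 + classes, grid, grid) with the value order x, y, w, h, objectness, classes is an assumption based on the Darknet region layer, so check it against the model's final layer. Anchors are assumed to be (width, height) pairs in grid-cell units, and at least one class is assumed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolov2(flat_output, anchors, num_classes, grid=13, conf_threshold=0.3):
    """Decode a flat YOLOv2 output into (class_id, score, cx, cy, w, h) tuples.

    Box centres and sizes are returned relative to the image size (0..1).
    """
    num_anchors = len(anchors)
    # Assumed memory layout: (anchors, 5 + classes, rows, cols).
    pred = np.asarray(flat_output, dtype=np.float32).reshape(
        num_anchors, 5 + num_classes, grid, grid)
    # cols[r, c] == c and rows[r, c] == r
    cols, rows = np.meshgrid(np.arange(grid), np.arange(grid))
    detections = []
    for a, (anchor_w, anchor_h) in enumerate(anchors):
        tx, ty, tw, th, t_obj = pred[a, :5]
        # Softmax over the class axis, shifted for numerical stability.
        logits = pred[a, 5:] - pred[a, 5:].max(axis=0, keepdims=True)
        class_probs = np.exp(logits)
        class_probs /= class_probs.sum(axis=0, keepdims=True)
        scores = sigmoid(t_obj) * class_probs      # (classes, rows, cols)
        class_ids = scores.argmax(axis=0)
        best = scores.max(axis=0)
        cx = (cols + sigmoid(tx)) / grid           # centre offset inside the cell
        cy = (rows + sigmoid(ty)) / grid
        w = anchor_w * np.exp(tw) / grid           # anchor scaled by the prediction
        h = anchor_h * np.exp(th) / grid
        for r, c in zip(*np.where(best >= conf_threshold)):
            detections.append((int(class_ids[r, c]), float(best[r, c]),
                               float(cx[r, c]), float(cy[r, c]),
                               float(w[r, c]), float(h[r, c])))
    return detections
```

The class id indexes into the label file, and the anchor pairs come from the anchor file. Overlapping boxes would still need non-maximum suppression, which this sketch leaves out.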

Please refer to the examples below:
Both samples can be found at "/usr/src/tensorrt/samples/python/" or "/opt/tensorrt/samples/python/"

Thanks

marving1 · April 15, 2020, 1:39pm

Thanks, everything is working now as expected and the speedup is impressive!
