I would like to use TensorRT to create an engine and run inference for an already trained Tiny YOLOv2 model. For this I also have the .prototxt, the .caffemodel, a label file and an anchor file. I followed the Nvidia Developer Tutorial (https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#import_caffe_python) as closely as possible, but three open questions remain that were not answered there:
- Regarding engine:
```python
def build_engine(deploy_file, model_file, logger, engine_datatype, batch_size):
    print('Creating engine...')
    with trt.Builder(logger) as builder, \
            builder.create_network() as network, \
            trt.CaffeParser() as parser:
        builder.max_workspace_size = MAX_WORKSPACE_SIZE
        builder.max_batch_size = batch_size
        model_tensors = parser.parse(deploy_file, model_file, network, engine_datatype)
        network.mark_output(model_tensors.find('conv_reg'))
        return builder.build_cuda_engine(network)
```
Initially I got the error that at least one output tensor has to be marked. I simply marked the last layer of my .prototxt file ('conv_reg'), but I have also seen networks where several output layers are marked. How do I know exactly which layers, and how many of them, I need to mark?
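To at least narrow down the candidates, I wrote a rough heuristic that lists possible outputs straight from the .prototxt text: any `top` blob that is never consumed as another layer's `bottom` is not used further, so it should be a graph output. This is only my assumption about how Caffe graphs work, not something from the Nvidia docs (in-place layers like ReLU reuse their bottom name, which this handles incidentally):

```python
import re

def find_output_blobs(prototxt_text):
    """Rough heuristic: a 'top' blob that never appears as the
    'bottom' of any layer is not consumed further, so it should
    be an output of the network."""
    tops = re.findall(r'top\s*:\s*"([^"]+)"', prototxt_text)
    bottoms = set(re.findall(r'bottom\s*:\s*"([^"]+)"', prototxt_text))
    outputs = []
    for t in tops:
        if t not in bottoms and t not in outputs:
            outputs.append(t)
    return outputs
```

Each name this returns would then be passed to `network.mark_output(model_tensors.find(name))`.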
- Regarding preprocessing of images:
```python
def pre_process_img(img_path, host_buffer):
    print('processing image...')
    with Image.open(img_path) as img:
        c, h, w = INPUT_SHAPE        # (3, 416, 416)
        dtype = trt.nptype(DTYPE)    # trt.float32
        img_array = np.asarray(img.resize((w, h), Image.BILINEAR)) \
            .transpose([2, 0, 1]).astype(dtype).ravel()
        img_array /= 255.0  # alternatively: img_array = img_array / 127.5 - 1.0
        np.copyto(host_buffer, img_array)
```
There was no example of this in the Nvidia documentation, so I just followed the most relevant code examples I found on GitHub. An explanation of how an image has to be pre-processed for this model would also help me here.
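To at least convince myself that the preprocessing produces the right amount of data for the input binding, I run a small sanity check on the flattened array. This checks only shape, dtype and value range under the /255.0 normalization; the shape constants are taken from my snippet above:

```python
import numpy as np

INPUT_SHAPE = (3, 416, 416)  # c, h, w, as in the deploy .prototxt

def check_preprocessed(img_array):
    """Sanity-check a flattened, normalized image buffer: it must hold
    exactly c*h*w float32 values, and lie in [0, 1] after dividing by 255."""
    c, h, w = INPUT_SHAPE
    assert img_array.size == c * h * w, img_array.size
    assert img_array.dtype == np.float32, img_array.dtype
    assert img_array.min() >= 0.0 and img_array.max() <= 1.0
    return True

# simulate what pre_process_img produces from a random RGB image
fake_img = np.random.randint(0, 256, size=(416, 416, 3), dtype=np.uint8)
flat = fake_img.transpose([2, 0, 1]).astype(np.float32).ravel() / np.float32(255.0)
```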
- Regarding inferencing:
```python
def allocate_buffers(engine):
    print('allocating buffers...')
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)),
                                    dtype=trt.nptype(DTYPE))
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)),
                                     dtype=trt.nptype(DTYPE))
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()
    return h_input, d_input, h_output, d_output, stream

def do_inference(context, h_input, d_input, h_output, d_output, stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async(bindings=[int(d_input), int(d_output)],
                          stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()
    return h_output
```
Here I have taken the Nvidia example again. I do get an output, but I can't really interpret it. How can I get the labels, confidences and box coordinates from this array? I guess I need the label file and the anchor file for this?
My understanding of Tiny YOLOv2 is that I should get an output of 13x13x125, but somehow my output is an array of length 8450.
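For reference, this is how I currently understand the decoding of a 13x13x125 output. It is only a sketch under several assumptions: 20 VOC classes and 5 anchors (so 125 = 5 * (5 + 20) channels), sigmoid on x/y offsets and objectness, exp on w/h scaled by the anchors, softmax over the class scores, and an anchor-major channel layout. The anchor values below are the common darknet tiny-yolo-voc defaults, not necessarily the ones in my anchor file:

```python
import numpy as np

# anchors as (w, h) in grid-cell units; darknet tiny-yolo-voc defaults
# (an assumption -- your anchor file may contain different values)
TINY_YOLO_VOC_ANCHORS = [(1.08, 1.19), (3.42, 4.41), (6.63, 11.38),
                         (9.42, 5.11), (16.62, 10.52)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolov2(output, anchors, num_classes=20, grid=13, conf_threshold=0.3):
    """Decode a flat CHW region-layer output of length
    grid*grid*len(anchors)*(5+num_classes) into
    (x, y, w, h, class_index, score) tuples, coordinates relative to the image."""
    num_anchors = len(anchors)
    # assumes anchor-major layout: (5 + num_classes) contiguous channels per anchor
    preds = output.reshape(num_anchors, 5 + num_classes, grid, grid)
    detections = []
    for a in range(num_anchors):
        for row in range(grid):
            for col in range(grid):
                tx, ty, tw, th, to = preds[a, :5, row, col]
                x = (col + sigmoid(tx)) / grid          # box center, 0..1
                y = (row + sigmoid(ty)) / grid
                w = anchors[a][0] * np.exp(tw) / grid   # box size, relative
                h = anchors[a][1] * np.exp(th) / grid
                objectness = sigmoid(to)
                scores = preds[a, 5:, row, col]
                probs = np.exp(scores - scores.max())   # softmax over classes
                probs /= probs.sum()
                cls = int(np.argmax(probs))
                score = float(objectness * probs[cls])
                if score > conf_threshold:
                    detections.append((x, y, w, h, cls, score))
    return detections
```

The class index would then be looked up in the label file, and non-maximum suppression would still be needed on top of this. Is this the right way to read the output array?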
Unfortunately, the inference example in the documentation stops with the return of h_output, and the samples that should be at /usr/src/tensorrt/samples/python/introductory_parser_samples are not there on my system.
If you need more information, let me know.
Thanks a lot for your help!