Hi,
I would like to use TensorRT to build an engine and run inference for an already trained Tiny YOLOv2 model. For this I also have the .prototxt, the .caffemodel, and the label and anchor files. I followed the NVIDIA Developer Tutorial (Developer Guide :: NVIDIA Deep Learning TensorRT Documentation) as closely as possible, but I now have three open questions that were not answered there:
- Regarding the engine:
import tensorrt as trt

def build_engine(deploy_file, model_file, logger, engine_datatype, batch_size):
    print('Creating engine...')
    with trt.Builder(logger) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        builder.max_workspace_size = MAX_WORKSPACE_SIZE
        builder.max_batch_size = batch_size
        # parse the Caffe files; returns a blob-name-to-tensor lookup table
        model_tensors = parser.parse(deploy_file, model_file, network, engine_datatype)
        # mark the last layer of my .prototxt as the network output
        network.mark_output(model_tensors.find('conv_reg'))
        return builder.build_cuda_engine(network)
Here I got the error that at least one output tensor has to be marked. I just took the last layer of my .prototxt file, but I have also seen networks where several output layers are marked. How do I know exactly which ones, and how many, I need?
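My current guess is that I have to mark every tensor that no other layer consumes. To see the candidates, I print all layer outputs after parsing; this is just my own sketch against the same TensorRT Python API as the code above:

import tensorrt as trt

def list_layer_outputs(deploy_file, model_file, logger, engine_datatype):
    # parse the model without building an engine, then dump every layer's output tensors
    with trt.Builder(logger) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        parser.parse(deploy_file, model_file, network, engine_datatype)
        for i in range(network.num_layers):
            layer = network.get_layer(i)
            for j in range(layer.num_outputs):
                tensor = layer.get_output(j)
                print(layer.name, tensor.name, tensor.shape)

Is marking only the unconsumed tensors the right approach?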
- Regarding preprocessing of images:
from PIL import Image
import numpy as np

def pre_process_img(img_path, host_buffer):
    print('processing image...')
    with Image.open(img_path) as img:
        c, h, w = INPUT_SHAPE      # (3, 416, 416)
        dtype = trt.nptype(DTYPE)  # trt.float32
        # resize, convert HWC -> CHW, flatten to one contiguous array
        img_array = np.asarray(img.resize((w, h), Image.BILINEAR)).transpose([2, 0, 1]).astype(dtype).ravel()
        # scale to [0, 1]; I have also seen x / 127.5 - 1.0 in other examples
        img_array /= 255.0
        np.copyto(host_buffer, img_array)
There was no example of this in the NVIDIA documentation, so I just followed the most relevant code examples I found on GitHub. An explanation of how an image has to be pre-processed for this model would also help me here.
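For reference, here is my understanding of the individual steps as a standalone sketch. The .convert('RGB') call and the normalization are assumptions on my part, and 'test.jpg' is a placeholder:

from PIL import Image
import numpy as np

img = Image.open('test.jpg').convert('RGB')   # force 3 channels (drops alpha, expands grayscale)
img = img.resize((416, 416), Image.BILINEAR)  # network input resolution
arr = np.asarray(img, dtype=np.float32)       # (416, 416, 3), HWC, RGB order
arr = arr.transpose(2, 0, 1)                  # (3, 416, 416), CHW
arr = arr / 255.0                             # scale to [0, 1]; x / 127.5 - 1.0 would give [-1, 1]
flat = arr.ravel()                            # flat, contiguous buffer for the host memory
assert flat.size == 3 * 416 * 416

I am also not sure whether the Caffe model expects RGB or BGR channel order.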
- Regarding inference:
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context

def allocate_buffers(engine):
    print('allocating buffers...')
    # page-locked host buffers sized from the engine's input/output bindings
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(DTYPE))
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(DTYPE))
    # matching buffers on the device
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()
    return h_input, d_input, h_output, d_output, stream

def do_inference(context, h_input, d_input, h_output, d_output, stream):
    # copy the input to the device, run the engine, copy the result back
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()
    return h_output
Here I followed the NVIDIA example again. I do get an output, but I can't really interpret it. How can I get the label, the confidence and the coordinates from this array? I guess I need the label and anchor files for this...?
My understanding of Tiny YOLOv2 is that I should get an output of 13x13x125 (= 21125 values), but my output is an array of length 8450. That happens to be 13 x 13 x 50, which would match 5 anchors x (5 + 5) for a model trained with 5 classes instead of 20.
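Based on other YOLOv2 implementations, my current attempt at decoding looks like the sketch below. The grid size, anchor count, class count, channel layout and the anchor values themselves are all my assumptions and would have to match my anchor and label files (the values shown are the standard Tiny YOLOv2 VOC anchors, in grid-cell units):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

GRID = 13
NUM_ANCHORS = 5
NUM_CLASSES = 20  # my 8450-element output would instead suggest 5 classes
ANCHORS = [(1.08, 1.19), (3.42, 4.41), (6.63, 11.38), (9.42, 5.11), (16.62, 10.52)]

def decode(h_output, conf_threshold=0.3):
    # assumed layout per anchor: tx, ty, tw, th, objectness, class scores
    out = h_output.reshape(NUM_ANCHORS, 5 + NUM_CLASSES, GRID, GRID)
    detections = []
    for a, (anchor_w, anchor_h) in enumerate(ANCHORS):
        for row in range(GRID):
            for col in range(GRID):
                tx, ty, tw, th, to = out[a, :5, row, col]
                objectness = sigmoid(to)
                class_probs = softmax(out[a, 5:, row, col])
                label = int(np.argmax(class_probs))
                confidence = float(objectness * class_probs[label])
                if confidence < conf_threshold:
                    continue
                # box center and size, relative to the input image (0..1)
                bx = (col + sigmoid(tx)) / GRID
                by = (row + sigmoid(ty)) / GRID
                bw = anchor_w * np.exp(tw) / GRID
                bh = anchor_h * np.exp(th) / GRID
                detections.append((label, confidence, bx, by, bw, bh))
    return detections

The label index would then be looked up in the label file, and on top of this I would still need non-maximum suppression, which I left out here. Is this roughly how the output is supposed to be decoded?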
Unfortunately, the inference example in the documentation stops with the return of "h_output", and the samples that should be at /usr/src/tensorrt/samples/python/introductory_parser_samples are not there.
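For completeness, this is how I currently wire everything together (the file names are placeholders):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with build_engine('deploy.prototxt', 'model.caffemodel', TRT_LOGGER,
                  trt.float32, batch_size=1) as engine, \
        engine.create_execution_context() as context:
    h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
    pre_process_img('test.jpg', h_input)
    output = do_inference(context, h_input, d_input, h_output, d_output, stream)
    print(len(output))  # 8450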
If you need more information, let me know.
Thanks a lot for your help!