case one:

for …
    result_list = yolov4_wrapper.check_person(frame)
    time_cur_detect = time.time() - st_time
    print('num = {} --------------------------------------detect_time(ms): {}'.format(num, time_cur_detect * 1000.0))

results:
num = 0 --------------------------------------detect_time(ms): 86.98105812072754
…
num = 1210 --------------------------------------detect_time(ms): 40.47203063964844
case two:

for …
    result_list = yolov4_wrapper.check_person(frame)
    time_cur_detect = time.time() - st_time
    print('num = {} --------------------------------------detect_time(ms): {}'.format(num, time_cur_detect * 1000.0))
    # time-consuming post-processing stand-in
    for i in range(10000000):
        pass

results:
num = 0 --------------------------------------detect_time(ms): 85.25681495666504
…
num = 60 --------------------------------------detect_time(ms): 76.00593566894531
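To compare the two cases fairly, it helps to time only the detection call with a monotonic clock. A minimal sketch (the `timed` helper and its usage are mine, not from the original code; `yolov4_wrapper` and `frame` are assumed from the post):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once; return (result, elapsed milliseconds).

    time.perf_counter is monotonic and higher-resolution than
    time.time, so it is better suited to per-call latency measurement.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Hypothetical usage inside the loops above:
# result_list, detect_ms = timed(yolov4_wrapper.check_person, frame)
```

This keeps the post-processing busy loop entirely outside the measured span, so the printed number reflects only the detection call itself.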
question

If the post-processing is time-consuming, the measured TensorRT inference time increases (roughly 76 ms with the busy loop vs. 40 ms without, in the runs above). How can this problem be solved?
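One direction worth trying (my sketch, not part of the original post) is to decouple the heavy post-processing from the inference loop by handing results to a worker thread through a queue, so the post-processing no longer sits between consecutive inference calls:

```python
import queue
import threading

def start_postprocess_worker(process_item):
    """Start a daemon thread that drains a queue of detection results.

    process_item is the (hypothetical) heavy post-processing function;
    the inference loop only enqueues results and keeps running.
    """
    work_q = queue.Queue()

    def worker():
        while True:
            item = work_q.get()
            if item is None:          # sentinel: shut down the worker
                break
            process_item(item)
            work_q.task_done()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return work_q, t

# Hypothetical usage in the detection loop:
# work_q, t = start_postprocess_worker(heavy_postprocess)
# work_q.put(result_list)   # enqueue instead of post-processing inline
```

Note that for pure-Python CPU-bound post-processing the GIL limits how much this overlaps with other Python work; a `multiprocessing` worker may be needed instead.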
Environment
TensorRT Version: TensorRT-7.1.3.4
GPU Type: Tesla T4
Nvidia Driver Version: 450.80.02
CUDA Version: 11.0
CUDNN Version:
Operating System + Version: Ubuntu 16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
I have tested TensorRT-7.1.3.4/samples/python/yolov3_onnx/; code snippet:
with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
    total_time = 0
    exec_times = 100
    for _ in range(exec_times):
        ############################################################################################################
        print('Running inference on image {}...'.format(input_image_path))
        start0 = time.time()
        image_raw, image = preprocessor.process(input_image_path)
        # Store the shape of the original input image in WH format; we will need it later
        shape_orig_WH = image_raw.size
        # Output shapes expected by the post-processor
        output_shapes = [(1, 255, 19, 19), (1, 255, 38, 38), (1, 255, 76, 76)]
        print("===> preprocessor time(TRT): %.5f(ms)" % ((time.time() - start0) * 1000.0))
        ############################################################################################################
        start1 = time.time()
        # Do inference with TensorRT
        trt_outputs = []
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)
        # Set host input to the image. common.do_inference_v2 will copy the input to the GPU before executing.
        inputs[0].host = image
        trt_outputs = common.do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
        print("===> inference time(TRT): %.5f(ms)" % ((time.time() - start1) * 1000.0))
        ############################################################################################################
        total_time += (time.time() - start0) * 1000.0
        print("===> total inference time(TRT): %.5f(ms)" % ((time.time() - start0) * 1000.0))
        # time-consuming post-processing stand-in:
        # for i in range(10000000):
        #     pass
    print('average processing time: %.5f(ms)' % (total_time / exec_times))
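When averaging latency as in the loop above, it also helps to discard a few warm-up iterations and to hoist one-time costs (such as `common.allocate_buffers`) out of the timed region, so they do not skew the steady-state mean. A generic sketch of that pattern (the `benchmark` helper and the stand-in workload are mine; substitute the real `do_inference_v2` call):

```python
import time

def benchmark(run_once, warmup=10, iters=100):
    """Return average latency in ms over iters runs, after warmup runs.

    Warm-up runs absorb one-time costs (buffer allocation, cache and
    clock ramp-up) so they do not distort the steady-state average.
    """
    for _ in range(warmup):
        run_once()
    total = 0.0
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        total += time.perf_counter() - start
    return total / iters * 1000.0

# Stand-in workload; in the sample above you would instead pass:
#   lambda: common.do_inference_v2(context, bindings=bindings,
#                                  inputs=inputs, outputs=outputs, stream=stream)
avg_ms = benchmark(lambda: sum(range(100000)), warmup=2, iters=10)
```

With back-to-back runs like this, only the inference call is inside the timed span, which makes it easier to see whether work done between iterations (such as the commented-out busy loop) is really changing the inference time or only the measurement around it.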