Description
How can I measure TensorRT forward-inference time through the Python API?
The following is TensorRT inference code, based on the Python API, that I found on the official website. How can I accurately measure the model's forward-inference time with it?
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context

def do_inference(batch, context, bindings, inputs, outputs, stream):
    assert len(inputs) == 1
    inputs[0].host = np.ascontiguousarray(batch, dtype=np.float32)
    # Copy inputs to the device, enqueue inference, copy outputs back
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Wait until all queued work on the stream has finished
    stream.synchronize()
    outputs_dict = {}
    outputs_shape = {}
    for out in outputs:
        outputs_dict[out.binding_name] = np.reshape(out.host, out.shape)
        outputs_shape[out.binding_name] = out.shape
    return outputs_shape, outputs_dict
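For concreteness, here is the kind of timing loop I have in mind (a minimal sketch; benchmark_inference, the warm-up count, and the iteration count are my own placeholders, not from the official sample). It discards warm-up runs, averages many iterations of the do_inference helper above for end-to-end latency, and also records CUDA events around execute_async_v2 for GPU-only execution time. Is this the right way to do it with the Python API?

import time
import pycuda.driver as cuda

def benchmark_inference(batch, context, bindings, inputs, outputs, stream,
                        warmup=10, iterations=100):
    """Hypothetical helper: average latency of do_inference, in milliseconds."""
    # Warm-up: the first runs pay one-time CUDA/TensorRT initialization
    # costs and would skew the average.
    for _ in range(warmup):
        do_inference(batch, context, bindings, inputs, outputs, stream)

    # End-to-end host timing (copies + execution). do_inference already
    # calls stream.synchronize(), so perf_counter stops only after the
    # GPU has finished.
    start = time.perf_counter()
    for _ in range(iterations):
        do_inference(batch, context, bindings, inputs, outputs, stream)
    wall_ms = (time.perf_counter() - start) * 1000.0 / iterations

    # GPU-only timing of the engine execution (no host<->device copies),
    # using CUDA events recorded on the same stream. The device input
    # buffers still hold the data copied in by the runs above.
    start_evt, end_evt = cuda.Event(), cuda.Event()
    start_evt.record(stream)
    for _ in range(iterations):
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    end_evt.record(stream)
    end_evt.synchronize()
    gpu_ms = start_evt.time_till(end_evt) / iterations

    return wall_ms, gpu_ms

# Usage (shapes illustrative):
# batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
# wall_ms, gpu_ms = benchmark_inference(batch, context, bindings,
#                                       inputs, outputs, stream)

I also understand trtexec can report latency statistics for an engine, which could cross-check numbers measured in Python, but here I specifically want to time it from Python.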
Reference link:
https://developer.nvidia.com/zh-cn/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/
Looking forward to your reply. Thank you very much.
Environment
TensorRT Version: 8.2.1
GPU Type: GeForce RTX 3060 Ti
Nvidia Driver Version: 510.60.02
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.7.9
TensorFlow Version (if applicable): not used
PyTorch Version (if applicable): 1.10.1+cu113
Baremetal or Container (if container which image + tag):