How to measure TensorRT forward-inference time through the Python API?

Description

How to measure TensorRT forward-inference time through the Python API?

The following is TensorRT forward-inference code, based on the Python API, that I found on the official website. How can I accurately measure the forward-inference time of the model with it?

import numpy as np
import pycuda.driver as cuda

def do_inference(batch, context, bindings, inputs, outputs, stream):
    assert len(inputs) == 1
    # Copy the input batch into the pinned host buffer.
    inputs[0].host = np.ascontiguousarray(batch, dtype=np.float32)
    # Transfer inputs to the device, run inference, and copy outputs back,
    # all asynchronously on the same CUDA stream.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Block until all work queued on the stream has finished.
    stream.synchronize()

    # Reshape the flat host buffers into named output arrays.
    outputs_dict = {}
    outputs_shape = {}
    for out in outputs:
        outputs_dict[out.binding_name] = np.reshape(out.host, out.shape)
        outputs_shape[out.binding_name] = out.shape

    return outputs_shape, outputs_dict
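
For illustration, a minimal wall-clock timing sketch around this helper might look like the following (it assumes the do_inference function above plus an already-built batch, context, bindings, inputs, outputs, and stream; the warm-up and iteration counts are arbitrary). Is this kind of measurement accurate, or is there a recommended way?

import time

# Warm-up runs so one-time CUDA/TensorRT initialization costs do not
# distort the measurement (the counts here are arbitrary).
for _ in range(10):
    do_inference(batch, context, bindings, inputs, outputs, stream)

# do_inference() ends with stream.synchronize(), so wall-clock time around it
# covers the H2D copy, the forward pass, and the D2H copy.
n_iters = 100
t0 = time.perf_counter()
for _ in range(n_iters):
    do_inference(batch, context, bindings, inputs, outputs, stream)
t1 = time.perf_counter()
print(f"mean latency: {(t1 - t0) / n_iters * 1000.0:.3f} ms")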

Reference link:
https://developer.nvidia.com/zh-cn/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/

Looking forward to your reply. Thank you very much.

Environment

TensorRT Version: 8.2.1
GPU Type: RTX 3060 Ti
Nvidia Driver Version: 510.60.02
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.7.9
TensorFlow Version (if applicable): not used
PyTorch Version (if applicable): 1.10.1+cu113
Baremetal or Container (if container which image + tag):

Hi,
Please check the link below, as it may answer your question.
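
In the meantime, a common alternative (a rough sketch, assuming the same pycuda-based context, bindings, inputs, and stream from your snippet) is to bracket only execute_async_v2() with CUDA events, which measures time on the GPU timeline and excludes data transfers and Python overhead:

import pycuda.driver as cuda

# Copy inputs outside the timed region so only the forward pass is measured.
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]

start, end = cuda.Event(), cuda.Event()
start.record(stream)                # enqueue the start marker on the stream
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
end.record(stream)                  # enqueue the end marker on the stream
end.synchronize()                   # wait until the end marker is reached

print(f"GPU execution time: {start.time_till(end):.3f} ms")

As a cross-check, trtexec also reports per-inference latency statistics (including GPU compute time) when run against the same engine.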

Thanks!