I am running model inference with ONNX Runtime using the TensorrtExecutionProvider.
First it converts the ONNX model to a TensorRT engine.
Then it runs inference with that engine.
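
For reference, the setup described above can be sketched roughly as follows. This is a minimal illustration, not the exact script: the function name `run_once` and the `model_path` and `feeds` arguments are hypothetical, and the CUDA provider is listed only as an assumed fallback.

```python
# Providers are tried in order; CUDA is an assumed fallback if TensorRT cannot take a node.
PROVIDERS = ["TensorrtExecutionProvider", "CUDAExecutionProvider"]


def run_once(model_path, feeds):
    """Create a session with the TensorRT provider and run one inference.

    `model_path` is a path to an ONNX file and `feeds` maps input names to
    numpy arrays (both are placeholders for illustration).
    """
    # Imported lazily so the module can be loaded without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=PROVIDERS)
    # The TensorRT engine is built around the first run, which is the slow step.
    return session.run(None, feeds)
```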

I have 3 questions:
Question 1: The first inference run is very slow. How can I fix it? As far as I know, I may be able to save a .profile file and load it before inference.
Question 2: How can I save and load the .engine file instead of reconverting the model to an engine for every inference run?
Question 3: I cannot set trt_profile_max_shapes for inference (it says the option is not supported). Because of this, when I feed a large input from the dataloader, it raises a Myelin error about the shape limit.
Also, is there any support for ONNX Runtime IOBinding with the TensorrtExecutionProvider?
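
For Questions 1 and 2, the TensorRT execution provider's engine-cache options (`trt_engine_cache_enable` and `trt_engine_cache_path`) look relevant. The sketch below is a minimal illustration under assumptions: the helper names, the `trt_cache` directory, and the fallback providers are hypothetical, and whether the cache behaves as expected on ONNX Runtime 1.11.0 with TensorRT 8.2.3 has not been verified here.

```python
import os


def trt_providers_with_cache(cache_dir):
    """Build a provider list that asks the TensorRT provider to cache engines.

    `cache_dir` is a hypothetical directory where engine (and, for dynamic
    shapes, profile) files would be written and later reused.
    """
    os.makedirs(cache_dir, exist_ok=True)
    trt_options = {
        "trt_engine_cache_enable": True,     # reuse a previously built engine if present
        "trt_engine_cache_path": cache_dir,  # where the cached files are stored
    }
    # CUDA and CPU are assumed fallbacks for nodes TensorRT does not handle.
    return [
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]


def create_cached_session(model_path, cache_dir="trt_cache"):
    """Create an InferenceSession that uses the cache directory above."""
    # Imported lazily so the helper above can be used without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path, providers=trt_providers_with_cache(cache_dir)
    )
```

With this configuration, the first run would still build the engine, but later sessions pointing at the same cache directory should be able to load it, provided the model, input shapes, and TensorRT version are unchanged.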

Thanks a lot for your help <3


TensorRT Version: 8.2.3
GPU Type: A10
Nvidia Driver Version: 11.4
CUDA Version: 11.6
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.8
ONNX Runtime GPU: 1.11.0

