I run model inference with the TensorrtExecutionProvider of ONNX Runtime.
It first converts the ONNX model to a TensorRT engine, then runs inference with that engine.
I have 3 questions:
Question 1: The first inference is very slow. How can I fix this? As far as I know, I may be able to save a .profile file and load it before inference.
Question 2: How can I save and load the .engine file instead of rebuilding the engine for every inference session?
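To make Question 2 concrete, this is roughly the setup I mean, a minimal sketch assuming the TensorRT EP provider options of onnxruntime-gpu 1.11 (the model path and cache directory here are placeholders, not my real paths):

```python
import onnxruntime as ort

# TensorRT EP options: cache the serialized engine on disk so that
# later sessions can load it instead of rebuilding from ONNX.
trt_options = {
    "trt_engine_cache_enable": True,      # write/reuse .engine files
    "trt_engine_cache_path": "./trt_cache",  # placeholder cache directory
}

sess = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",  # fallback for unsupported nodes
    ],
)
```

This needs a GPU with the TensorRT EP available, so I could not fully verify the behavior; is this the intended way to persist the engine?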
Question 3: I cannot set trt_profile_max_shapes for inference (it says this option is not supported). When I feed a large input from the dataloader, it raises a Myelin error about the shape limit.
Also, is there any support for ONNX Runtime IOBinding with the TensorrtExecutionProvider?
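For the IOBinding question, this is the kind of usage I have in mind, a sketch assuming the standard onnxruntime Python IOBinding API (the model path, input/output names, and shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=["TensorrtExecutionProvider"],
)

# Bind input/output buffers once, up front, instead of copying
# host<->device on every sess.run() call.
binding = sess.io_binding()
x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input
binding.bind_cpu_input("input", x)   # "input" is a placeholder name
binding.bind_output("output")        # let ORT allocate the output on device

sess.run_with_iobinding(binding)
result = binding.copy_outputs_to_cpu()[0]
```

Does this pattern work with the TensorRT EP, or is IOBinding only supported for the CUDA EP? I could not verify it myself without the EP loading.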
Thanks a lot for your help <3
TensorRT Version: 8.2.3
GPU Type: A10
Nvidia Driver Version: 11.4
CUDA Version: 11.6
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.8
ONNX Runtime GPU: 1.11.0