I am running model inference with the TensorrtExecutionProvider of ONNX Runtime.
First it converts the ONNX model to a TensorRT engine,
then it executes inference on the model.
I have 3 questions:
Question 1: The first inference is very slow. How can I fix this? As far as I know, I might be able to save a .profile file and load it before inference.
Question 2: How can I save and load the .engine file instead of rebuilding the engine before each inference? (A sketch of the kind of setup I mean follows these questions.)
Question 3: I cannot set trt_profile_max_shapes for inference (it says it is not supported). When I feed a large input from my dataloader, it raises a Myelin error about the shape limit.
Besides that, is there any support for ONNX Runtime IOBinding with the TensorrtExecutionProvider?
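For context, here is a minimal sketch of the setup I am asking about, based on my reading of the ONNX Runtime docs. The model path, input/output names, and input shape are placeholders, and I am assuming the trt_engine_cache_enable / trt_engine_cache_path provider options are the right way to persist the engine:

```python
# Minimal sketch: TensorRT EP engine caching + IOBinding in ONNX Runtime (Python).
# "model.onnx", "input", "output" and the input shape are placeholders for my real model.
import numpy as np
import onnxruntime as ort

trt_options = {
    "trt_engine_cache_enable": True,        # cache the built engine on disk
    "trt_engine_cache_path": "./trt_cache",  # folder where the cached engine/profile files go
}
sess = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",  # fallback for nodes the TRT EP does not handle
        "CPUExecutionProvider",
    ],
)

# Warm-up run: the first call triggers the (slow) engine build; later runs should reuse the cache.
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
sess.run(None, {"input": dummy})

# IOBinding: bind the input from CPU and keep the output on the GPU device.
binding = sess.io_binding()
binding.bind_cpu_input("input", dummy)
binding.bind_output("output", "cuda")
sess.run_with_iobinding(binding)
result = binding.copy_outputs_to_cpu()[0]
print(result.shape)
```

With this, is the cached engine in ./trt_cache reused across runs, and does IOBinding actually avoid extra host/device copies when used with the TensorrtExecutionProvider?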
Thanks a lot for your help <3
Environment
TensorRT Version: 8.2.3
GPU Type: A10
Nvidia Driver Version: 11.4
CUDA Version: 11.6
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.8
ONNX Runtime GPU: 1.11.0
Hi,
Could you try running your model with the trtexec command and share the --verbose log if the issue persists?
You can refer to the link below for the list of supported operators. If any operator is not supported, you will need to create a custom plugin to support that operation.
Also, please share your model and script if you have not already, so that we can help you better.
Meanwhile, for some common errors and queries, please refer to the link below: