Description
I’m trying to convert a HuggingFace pegasus model to ONNX and then to a TensorRT engine. During the trtexec conversion of the decoder part, I see the following warning:
“Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are:
(# 1 (SHAPE encoder_hidden_states))
(# 1 (SHAPE input_ids))”
This warning also appears frequently when I run ONNX Runtime with the TensorRT Execution Provider. With a model of similar architecture, we currently see no performance gain from TensorRT over the CUDA Execution Provider alone, so I wonder whether this warning is related. (We see this warning quite often during inference.)
The pegasus decoder has two inputs (input_ids, encoder_hidden_states), both of which need dynamic batch and sequence_length dimensions. Could you help take a look at what I should do about this warning? Is there a way to use multiple dynamic values with TensorRT? Many thanks.
The instructions to reproduce this are below.
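One thing worth noting: in the shape profiles below, input_ids and encoder_hidden_states always share the same batch and sequence length, so the two dynamic values the warning mentions are in fact equal at runtime. A small helper like the following (the function name is hypothetical, just for illustration) makes that coupling explicit when generating shape strings for trtexec or similar tooling:

```python
# Hypothetical helper: build trtexec-style shape strings so that input_ids and
# encoder_hidden_states always share the same batch and sequence length.
# Keeping the two dynamic values equal at runtime is one way to limit the
# impact of the Myelin warning, which concerns independent dynamic dimensions
# that may differ from each other.
def profile_shapes(batch: int, seq_len: int, hidden: int = 1024) -> str:
    """Return a shape spec usable with --minShapes/--optShapes/--maxShapes."""
    return (f"input_ids:{batch}x{seq_len},"
            f"encoder_hidden_states:{batch}x{seq_len}x{hidden}")

min_shapes = profile_shapes(1, 1)     # input_ids:1x1,encoder_hidden_states:1x1x1024
opt_shapes = profile_shapes(1, 256)
max_shapes = profile_shapes(1, 1024)
```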
Environment
TensorRT Version: 8.2
GPU Type: V100
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if container which image + tag): Container nvcr.io/nvidia/tensorrt:21.12-py3
Relevant Files
Steps To Reproduce
- Export the ONNX models from the Hugging Face pegasus-xsum model. I used the notebook here
- Inside a container started from the image nvcr.io/nvidia/tensorrt:21.12-py3, run the trtexec command:
trtexec --onnx=./onnx_models/decoder_lm_xsum_0129.onnx --saveEngine=./onnx_models/trt/decoder_xsum_0129.trt --minShapes=input_ids:1x1,encoder_hidden_states:1x1x1024 --optShapes=input_ids:1x256,encoder_hidden_states:1x256x1024 --maxShapes=input_ids:1x1024,encoder_hidden_states:1x1024x1024 --workspace=4000
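For the ONNX Runtime path mentioned above, a rough sketch of how the same shape profiles could be passed to the TensorRT Execution Provider is shown below. This is only an assumption-laden illustration: the trt_profile_min_shapes / trt_profile_opt_shapes / trt_profile_max_shapes options exist only in newer onnxruntime-gpu releases (the 21.12-era ORT that matches TensorRT 8.2 may not support them), so please verify the option names against your installed version.

```python
# Sketch (not verified against ORT matching TensorRT 8.2): pin explicit shape
# profiles on the TensorRT Execution Provider so the EP does not treat the two
# dynamic inputs as unrelated. The trt_profile_*_shapes option names are an
# assumption valid only for recent onnxruntime-gpu releases.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_max_workspace_size": 4 * 1024 ** 3,  # 4 GB, mirrors --workspace=4000
        "trt_profile_min_shapes": "input_ids:1x1,encoder_hidden_states:1x1x1024",
        "trt_profile_opt_shapes": "input_ids:1x256,encoder_hidden_states:1x256x1024",
        "trt_profile_max_shapes": "input_ids:1x1024,encoder_hidden_states:1x1024x1024",
    }),
    ("CUDAExecutionProvider", {}),  # fallback for nodes TensorRT cannot take
]

# Usage (requires onnxruntime-gpu installed):
# import onnxruntime
# session = onnxruntime.InferenceSession(
#     "./onnx_models/decoder_lm_xsum_0129.onnx", providers=providers)
```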