Myelin graph warning when converting to TRT engine / during inference

Description

I’m trying to convert a HuggingFace Pegasus model to ONNX and then to a TensorRT engine. I see the following warning during the trtexec conversion of the decoder:
“Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are:
(# 1 (SHAPE encoder_hidden_states))
(# 1 (SHAPE input_ids))”

This warning also appears frequently when I run ONNX Runtime with the TensorRT Execution Provider. For a model with a similar architecture, we currently see no performance benefit from TensorRT over the CUDA execution provider alone, so I wonder whether this warning is related. (We see this warning quite a lot during inference.)
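For context, the two setups being compared look roughly like this (a sketch; session options are omitted and the model path is a placeholder):

import onnxruntime as ort

# TensorRT EP first (falls back to the CUDA EP for unsupported nodes)
sess_trt = ort.InferenceSession(
    "decoder_lm_xsum_0129.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)

# CUDA EP only, for comparison
sess_cuda = ort.InferenceSession(
    "decoder_lm_xsum_0129.onnx",
    providers=["CUDAExecutionProvider"],
)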

The Pegasus decoder has two inputs (input_ids, encoder_hidden_states), and both need dynamic batch and sequence_length dimensions (see the export sketch below). Could you take a look at what I should do about this warning? Is there a way to use multiple dynamic values with TensorRT? Many thanks.
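For reference, the dynamic axes are declared at export time roughly as follows. This is only a sketch: the DecoderWrapper and dummy shapes are illustrative stand-ins, and the actual notebook code may differ.

import torch
from transformers import PegasusForConditionalGeneration

# Hypothetical wrapper so the exported decoder takes exactly the two inputs above
class DecoderWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.decoder = model.model.decoder
        self.lm_head = model.lm_head

    def forward(self, input_ids, encoder_hidden_states):
        outputs = self.decoder(input_ids=input_ids,
                               encoder_hidden_states=encoder_hidden_states,
                               use_cache=False, return_dict=False)
        return self.lm_head(outputs[0])

model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
wrapper = DecoderWrapper(model).eval()

dummy_ids = torch.ones(1, 8, dtype=torch.long)
dummy_hs = torch.randn(1, 8, 1024)
torch.onnx.export(
    wrapper, (dummy_ids, dummy_hs), "decoder_lm_xsum_0129.onnx",
    input_names=["input_ids", "encoder_hidden_states"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "encoder_hidden_states": {0: "batch", 1: "encoder_sequence"},
    },
    opset_version=13,
)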

The instructions to reproduce this are below.

Environment

TensorRT Version: 8.2
GPU Type: V100
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if container which image + tag): Container nvcr.io/nvidia/tensorrt:21.12-py3

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

  1. Export the ONNX models from the HuggingFace pegasus-xsum model. I used the notebook here.
  2. Inside a Docker container with the image nvcr.io/nvidia/tensorrt:21.12-py3, run the trtexec command:
    trtexec --onnx=./onnx_models/decoder_lm_xsum_0129.onnx --saveEngine=./onnx_models/trt/decoder_xsum_0129.trt --minShapes=input_ids:1x1,encoder_hidden_states:1x1x1024 --optShapes=input_ids:1x256,encoder_hidden_states:1x256x1024 --maxShapes=input_ids:1x1024,encoder_hidden_states:1x1024x1024 --workspace=4000
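For completeness, concrete shapes within the profile above can be set at inference time with the TensorRT Python API roughly as follows (a sketch for TRT 8.2; binding names are assumed to match the export, and buffer allocation/execution are omitted):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("./onnx_models/trt/decoder_xsum_0129.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# Pick any shapes between the --minShapes and --maxShapes used at build time
context.set_binding_shape(engine.get_binding_index("input_ids"), (1, 256))
context.set_binding_shape(engine.get_binding_index("encoder_hidden_states"), (1, 256, 1024))
assert context.all_binding_shapes_specified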

Hi,
We request you to share the ONNX model and the script, if not already shared, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys

import onnx

# Usage: python check_model.py <path/to/model.onnx>
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, we request you to share the trtexec --verbose log for further debugging.
Thanks!

Hi, this model is quite large (the decoder part is 3.5 GB), so it’s not easy for me to share it via a file link. It can be generated using the export notebook mentioned above.

onnx.checker.check_model produces no error.
The trtexec command is shared above and runs successfully. My question is about the warning “Myelin graph with multiple dynamic values may have poor performance if they differ.” I would like to understand why it happens, since it may explain the high latency we see compared to using only the CUDA execution provider.

I don’t see any warning other than the Myelin graph one. I’ve attached the verbose trtexec output.
xsum_verbose_trtexec_output.txt (14.6 KB)

Hi,

Myelin (one of TRT’s backends, which is good at Transformers) does not support dynamic shapes, so TRT needs to build multiple Myelin graphs to support them, and a lot of padding is needed as a result. It may still work, but performance is not always optimal.
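For example, one possible client-side workaround (a sketch, not an official recommendation) is to pad both inputs to a single fixed length so the two dynamic values always coincide:

import numpy as np

# target_len and pad_token_id are illustrative assumptions
def pad_inputs(input_ids, encoder_hidden_states, target_len=256, pad_token_id=0):
    batch, seq = input_ids.shape
    padded_ids = np.full((batch, target_len), pad_token_id, dtype=input_ids.dtype)
    padded_ids[:, :seq] = input_ids

    batch, enc_seq, hidden = encoder_hidden_states.shape
    padded_hs = np.zeros((batch, target_len, hidden),
                         dtype=encoder_hidden_states.dtype)
    padded_hs[:, :enc_seq, :] = encoder_hidden_states
    return padded_ids, padded_hs

Note that the model must ignore the padded positions (e.g., via attention masking) for the outputs to stay correct.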

Thank you.

Thanks a lot for the reply! That explains the suboptimal inference performance we have seen. So my understanding for now is that TRT does not handle multiple dynamic values well, and if we want to use TRT to accelerate such models, we need to find some way to get rid of the multiple dynamic inputs?

Yes.