Myelin graph warning when converting to TRT engine / during inference

Description

I’m trying to convert a HuggingFace Pegasus model to ONNX and then to a TensorRT engine. I see the following warning during the trtexec conversion of the decoder:
“Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are:
(# 1 (SHAPE encoder_hidden_states))
(# 1 (SHAPE input_ids))”

This warning also appears frequently when I run ONNX Runtime with the TensorRT Execution Provider. For a model with a similar architecture, we currently see no performance benefit from TensorRT over the CUDA execution provider alone, so I wonder whether this warning is related. (We see this warning quite a lot during inference.)
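For context, the two setups being compared look roughly like this (a sketch; session options are omitted and the model path is a placeholder):

import onnxruntime as ort

# TensorRT EP first (falls back to the CUDA EP for unsupported nodes)
sess_trt = ort.InferenceSession(
    "decoder_lm_xsum_0129.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)

# CUDA EP only, for comparison
sess_cuda = ort.InferenceSession(
    "decoder_lm_xsum_0129.onnx",
    providers=["CUDAExecutionProvider"],
)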

The Pegasus decoder has two inputs (input_ids, encoder_hidden_states), and both need dynamic batch and sequence_length dimensions (see the export sketch below). Could you take a look at what I should do about this warning? Is there a way to use multiple dynamic values with TensorRT? Many thanks.
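For reference, the dynamic axes are declared at export time roughly as follows. This is only a sketch: the DecoderWrapper and dummy shapes are illustrative stand-ins, and the actual notebook code may differ.

import torch
from transformers import PegasusForConditionalGeneration

# Hypothetical wrapper so the exported decoder takes exactly the two inputs above
class DecoderWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.decoder = model.model.decoder
        self.lm_head = model.lm_head

    def forward(self, input_ids, encoder_hidden_states):
        outputs = self.decoder(input_ids=input_ids,
                               encoder_hidden_states=encoder_hidden_states,
                               use_cache=False, return_dict=False)
        return self.lm_head(outputs[0])

model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
wrapper = DecoderWrapper(model).eval()

dummy_ids = torch.ones(1, 8, dtype=torch.long)
dummy_hs = torch.randn(1, 8, 1024)
torch.onnx.export(
    wrapper, (dummy_ids, dummy_hs), "decoder_lm_xsum_0129.onnx",
    input_names=["input_ids", "encoder_hidden_states"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "encoder_hidden_states": {0: "batch", 1: "encoder_sequence"},
    },
    opset_version=13,
)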

The instructions to reproduce this are below.

Environment

TensorRT Version: 8.2
GPU Type: V100
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if container which image + tag): Container nvcr.io/nvidia/tensorrt:21.12-py3

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

  1. Export the ONNX models from the HuggingFace pegasus-xsum model. I used the notebook here.
  2. Inside a Docker container with the image nvcr.io/nvidia/tensorrt:21.12-py3, run the trtexec command:
    trtexec --onnx=./onnx_models/decoder_lm_xsum_0129.onnx --saveEngine=./onnx_models/trt/decoder_xsum_0129.trt --minShapes=input_ids:1x1,encoder_hidden_states:1x1x1024 --optShapes=input_ids:1x256,encoder_hidden_states:1x256x1024 --maxShapes=input_ids:1x1024,encoder_hidden_states:1x1024x1024 --workspace=4000
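For completeness, concrete shapes within the profile above can be set at inference time with the TensorRT Python API roughly as follows (a sketch for TRT 8.2; binding names are assumed to match the export, and buffer allocation/execution are omitted):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("./onnx_models/trt/decoder_xsum_0129.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# Pick any shapes between the --minShapes and --maxShapes used at build time
context.set_binding_shape(engine.get_binding_index("input_ids"), (1, 256))
context.set_binding_shape(engine.get_binding_index("encoder_hidden_states"), (1, 256, 1024))
assert context.all_binding_shapes_specified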

Hi,
We request you to share the ONNX model and the script, if not already shared, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys

import onnx

# Usage: python check_model.py <path/to/model.onnx>
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, we request you to share the trtexec --verbose log for further debugging.
Thanks!

Hi, this model is quite large (the decoder part is 3.5 GB), so it’s not easy for me to share it via a file link. It can be generated using the export notebook mentioned above.

onnx.checker.check_model produces no error.
The trtexec command is shared above and runs successfully. My question is about the warning “Myelin graph with multiple dynamic values may have poor performance if they differ.” I would like to understand why it happens, since it may explain the high latency we see compared to using only the CUDA execution provider.

I don’t see any warning other than the Myelin graph one. I’ve attached the verbose trtexec output.
xsum_verbose_trtexec_output.txt (14.6 KB)

Hi,

Myelin (one of TRT’s backends, which is good at Transformers) does not support dynamic shapes, so TRT needs to build multiple Myelin graphs to support them, and a lot of padding is needed as a result. It may still work, but performance is not always optimal.
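For example, one possible client-side workaround (a sketch, not an official recommendation) is to pad both inputs to a single fixed length so the two dynamic values always coincide:

import numpy as np

# target_len and pad_token_id are illustrative assumptions
def pad_inputs(input_ids, encoder_hidden_states, target_len=256, pad_token_id=0):
    batch, seq = input_ids.shape
    padded_ids = np.full((batch, target_len), pad_token_id, dtype=input_ids.dtype)
    padded_ids[:, :seq] = input_ids

    batch, enc_seq, hidden = encoder_hidden_states.shape
    padded_hs = np.zeros((batch, target_len, hidden),
                         dtype=encoder_hidden_states.dtype)
    padded_hs[:, :enc_seq, :] = encoder_hidden_states
    return padded_ids, padded_hs

Note that the model must ignore the padded positions (e.g., via attention masking) for the outputs to stay correct.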

Thank you.

Thanks a lot for the reply! That explains the suboptimal inference performance we have seen. So my understanding for now is that TRT does not handle multiple dynamic values well, and if we want to use TRT to accelerate such models, we need to find some way to get rid of the multiple dynamic inputs?

Yes.