TRT8 engine creation from onnx fails due to AssertionError


I’m trying to convert a HuggingFace Transformers model to a TRT engine but this fails with the error:

[07/21/2021-20:51:52] [V] [TRT] --------------- Timing Runner: {ForeignNode[945 + (Unnamed Layer* 74) [Shuffle]…Add_564]} (Myelin)
trtexec: /dvs/p4/build/sw/rel/gpgpu/MachineLearning/myelin_trt8/src/compiler/…/./compiler/kernel_gen/kernel_gen_utils.hpp:190: myelin::kgen::dag_vertex_t::operand_t myelin::kgen::{anonymous}::red_partial_operand(const myelin::kgen::dag_vertex_t*): Assertion `rvtx->is_lowered_op()' failed.

Transformers version: 4.8.2

I’m using the TRT OSS build container for converting the ONNX model to TRT. I built the 8.0.1 tag of this repo using the instructions provided in the file.

NOTE: TRT engine creation for the same model using TRT succeeds.


TensorRT Version: 8.0.1 (OSS build)
GPU Type: RTX 3080 Laptop
Nvidia Driver Version: 465.31
CUDA Version: 11.3
CUDNN Version: 8.2.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9.0+cu111
Baremetal or Container (if container which image + tag):

Relevant Files

Model: sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco · Hugging Face

Steps To Reproduce

One line in one transformers file needs to be modified, because TRT doesn’t support the bool data type in the Expand operator.
In <PYTHON_LIBS>/site-packages/transformers/models/distilbert/modeling_distilbert.py, change line 183 from:

mask = (mask == 0).view(mask_reshp).expand_as(scores) # (bs, n_heads, q_length, k_length)

to:

mask = (mask == 0).int().view(mask_reshp).expand_as(scores).bool() # (bs, n_heads, q_length, k_length)
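The effect of the patch can be sketched on toy tensors (the shapes below are made up for illustration; the real model uses (bs, n_heads, q_length, k_length) as in the comment):

```python
import torch

# Sketch of the patch on toy shapes (shapes are illustrative, not the model's).
# The original line is valid PyTorch, but it exports an ONNX Expand node with a
# bool input, which the TRT parser rejects; routing through int avoids that.
mask = torch.tensor([[1, 1, 0, 0]])        # (bs=1, k_length=4) attention mask
scores = torch.zeros(1, 2, 4, 4)           # (bs, n_heads, q_length, k_length)
mask_reshp = (1, 1, 1, 4)

# Patched version: expand an int tensor, then cast back to bool.
mask = (mask == 0).int().view(mask_reshp).expand_as(scores).bool()
print(mask.shape, mask.dtype)              # torch.Size([1, 2, 4, 4]) torch.bool
```

The result is numerically identical to the original line; only the intermediate dtype seen by the exported Expand node changes.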

Then run the following code to download the model and convert it to ONNX:

import torch.onnx
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco").cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco")
sample = {k: v.cuda() for k, v in tokenizer(["Example1", "Slightly longer example2!"] * 16, return_tensors='pt', truncation=True, padding='max_length', max_length=512).items()}
torch.onnx.export(model, (sample['input_ids'], sample['attention_mask']), "distilbert_dot_tas_b.onnx", verbose=True, opset_version=13)

Run trtexec as follows:

./trtexec --onnx=/workspace/models/distilbert_dot_tas_b.onnx --saveEngine=engine.trt --explicitBatch --verbose --workspace=4096

Could you share the ONNX model and the script, if not shared already, so that we can assist you better?
Meanwhile, you can try a few things:

  1. Validate your model with the below snippet:

import onnx
filename = "yourONNXmodel"  # path to your .onnx file
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec "--verbose" log for further debugging.

Thanks for getting back!
Here’s the model:
The script is mentioned in the original post, in the ONNX export step under "Steps To Reproduce".

In the meantime, I’ll give the onnx checker a try and revert with what I find.

Hi @lttazz99,

We are unable to reproduce this issue on a Tesla V100 GPU. We will try to run it on an RTX GPU.
Meanwhile, could you please share the trtexec --verbose logs with us for better debugging?

Thank you.

Thanks for getting back @spolisetty. I apologize, I provided the wrong model. Here’s the correct link:

I’ll attach the verbose logs soon.

@spolisetty PFA the verbose logs. I also ran onnx checker as suggested in the first reply and it didn’t throw any errors.
TRT8_verbose_logs.txt (449.1 KB)

Hi @lttazz99,

Thank you for sharing the model, we could reproduce the error. Please allow us some time to work on this.


This is a known issue, which will be resolved in a future release.

Thank you.

Do you mind sharing more details about this known issue, please?
It’d be helpful for others who are debugging similar issues.
