TRT8 engine creation from onnx fails due to AssertionError


I’m trying to convert a HuggingFace Transformers model to a TRT engine but this fails with the error:

[07/21/2021-20:51:52] [V] [TRT] --------------- Timing Runner: {ForeignNode[945 + (Unnamed Layer* 74) [Shuffle]…Add_564]} (Myelin)
trtexec: /dvs/p4/build/sw/rel/gpgpu/MachineLearning/myelin_trt8/src/compiler/…/./compiler/kernel_gen/kernel_gen_utils.hpp:190: myelin::kgen::dag_vertex_t::operand_t myelin::kgen::{anonymous}::red_partial_operand(const myelin::kgen::dag_vertex_t*): Assertion `rvtx->is_lowered_op()' failed.

Transformers version: 4.8.2

I’m using the TRT OSS build container for converting the ONNX model to TRT. I built the 8.0.1 tag of this repo using the instructions provided in the file.

NOTE: TRT engine creation for the same model using TRT succeeds.


TensorRT Version: 8.0.1 (OSS build)
GPU Type: RTX 3080 Laptop
Nvidia Driver Version: 465.31
CUDA Version: 11.3
CUDNN Version: 8.2.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9.0+cu111
Baremetal or Container (if container which image + tag):

Relevant Files

Model: sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco · Hugging Face

Steps To Reproduce

One line in one transformers file needs to be modified, because TRT doesn’t support the bool data type in the Expand operator.
In <PYTHON_LIBS>/site-packages/transformers/models/distilbert/modeling_distilbert.py, change line 183 from:

mask = (mask == 0).view(mask_reshp).expand_as(scores) # (bs, n_heads, q_length, k_length)

to:

mask = (mask == 0).int().view(mask_reshp).expand_as(scores).bool() # (bs, n_heads, q_length, k_length)
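The effect of the patch can be sketched on toy tensors (the shapes below are made up for illustration; the real model uses (bs, n_heads, q_length, k_length) as in the comment):

```python
import torch

# Sketch of the patch on toy shapes (shapes are illustrative, not the model's).
# The original line is valid PyTorch, but it exports an ONNX Expand node with a
# bool input, which the TRT parser rejects; routing through int avoids that.
mask = torch.tensor([[1, 1, 0, 0]])        # (bs=1, k_length=4) attention mask
scores = torch.zeros(1, 2, 4, 4)           # (bs, n_heads, q_length, k_length)
mask_reshp = (1, 1, 1, 4)

# Patched version: expand an int tensor, then cast back to bool.
mask = (mask == 0).int().view(mask_reshp).expand_as(scores).bool()
print(mask.shape, mask.dtype)              # torch.Size([1, 2, 4, 4]) torch.bool
```

The result is numerically identical to the original line; only the intermediate dtype seen by the exported Expand node changes.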

Then run the following code to download the model and convert it to ONNX:

import torch.onnx
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco").cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco")
sample = {k: v.cuda() for k, v in tokenizer(["Example1", "Slightly longer example2!"] * 16, return_tensors='pt', truncation=True, padding='max_length', max_length=512).items()}
torch.onnx.export(model, (sample['input_ids'], sample['attention_mask']), "distilbert_dot_tas_b.onnx", verbose=True, opset_version=13)

Run trtexec as follows:

./trtexec --onnx=/workspace/models/distilbert_dot_tas_b.onnx --saveEngine=engine.trt --explicitBatch --verbose --workspace=4096

Could you share the ONNX model and the script, if not shared already, so that we can assist you better?
Meanwhile, you can try a few things:

  1. Validate your model with the below snippet:

import onnx
filename = "yourONNXmodel"  # path to your .onnx file
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec "--verbose" log for further debugging.

Thanks for getting back!
Here’s the model:
The script is mentioned in the original post, in the ONNX export step under "Steps To Reproduce".

In the meantime, I’ll give the onnx checker a try and revert with what I find.

Hi @lttazz99,

We are unable to reproduce this issue on a Tesla V100 GPU. We will try to run it on an RTX GPU.
Meanwhile, could you please share the trtexec --verbose logs with us for better debugging?

Thank you.

Thanks for getting back @spolisetty. I apologize, I provided the wrong model. Here’s the correct link:

I’ll attach the verbose logs soon.

@spolisetty PFA the verbose logs. I also ran onnx checker as suggested in the first reply and it didn’t throw any errors.
TRT8_verbose_logs.txt (449.1 KB)

Hi @lttazz99,

Thank you for sharing the model, we could reproduce the error. Please allow us some time to work on this.


This is a known issue, which will be resolved in a future release.

Thank you.

Do you mind sharing more details about this known issue, please?
It’d be helpful for others who are debugging similar issues.
