Running inference in Python on a TensorRT engine generated from the Hugging Face 'bert-base-cased' model for a token classification task

Hi, I am trying to solve a token classification problem using BERT ('bert-base-cased') from Hugging Face Transformers. I am able to convert the BERT model to ONNX format and then to a TensorRT engine.
Can somebody share sample Python code to run inference using the TensorRT engine?

When I run predictions on BERT (without TensorRT), I pass the inputs as a dictionary to the 'predict' method: dict_keys(['labels', 'input_ids', 'token_type_ids', 'attention_mask'])

I am confused about how to pass these inputs to the TensorRT engine, as it does not accept them in this form.
What input and output shapes do I have to provide to run predictions?
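For context, a fixed-shape BERT engine typically takes three int32 tensors of shape (batch_size, seq_len); 'labels' is only used for training/evaluation and is not an engine input. A minimal sketch of preparing the arrays (the batch size of 1, sequence length of 128, and zero-padding scheme are assumptions, not taken from the original export):

```python
import numpy as np

# Assumed fixed shapes for the exported engine (hypothetical values).
BATCH_SIZE = 1
SEQ_LEN = 128

def prepare_inputs(token_ids):
    """Pad/truncate a list of token ids into the three int32 arrays a
    fixed-shape BERT TensorRT engine typically expects."""
    ids = (token_ids[:SEQ_LEN] + [0] * (SEQ_LEN - len(token_ids)))[:SEQ_LEN]
    input_ids = np.array([ids], dtype=np.int32)         # shape (1, 128)
    attention_mask = (input_ids != 0).astype(np.int32)  # 1 for real tokens, 0 for padding
    token_type_ids = np.zeros_like(input_ids)           # all zeros for single-segment input
    return input_ids, token_type_ids, attention_mask

# Example: a 4-token sequence, e.g. [CLS] hello world [SEP]
input_ids, token_type_ids, attention_mask = prepare_inputs([101, 7592, 2088, 102])
print(input_ids.shape, attention_mask.sum())  # (1, 128) 4
```

The output of the token classification head will then have shape (batch_size, seq_len, num_labels).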

I am using the code given in the GitHub link below:

TensorRT Version:
GPU Type: Tesla V100-SXM2
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2.2
CUDNN Version:
Operating System + Version: ubuntu-20.04.1
Python Version (if applicable): 3.7
TensorFlow Version (if applicable): 2.7
PyTorch Version (if applicable): n/a
Baremetal or Container (if container which image + tag): container

We recommend that you check the sample links below in case of TF-TRT integration issues.

If the issue persists, we recommend that you reach out to the TensorFlow forum.

Hi, I am not using TF-TRT.
I am following the below path:

Hugging Face transformer (TensorFlow) => ONNX model => TensorRT engine.
Now I want to run inference using the Python API on the TensorRT engine.
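For the ONNX => TensorRT step in the path above, a common route is the trtexec tool shipped with TensorRT. The tensor names and shapes below are assumptions (a fixed batch of 1 and sequence length of 128); check the actual names in the exported ONNX model first, e.g. with Netron:

```shell
# Build a TensorRT engine from the exported ONNX model.
# Input tensor names and shapes are assumptions; verify them
# against your ONNX export before running.
trtexec --onnx=model.onnx \
        --saveEngine=bert.engine \
        --minShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 \
        --optShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 \
        --maxShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128
```

Setting min/opt/max shapes to the same values effectively fixes the engine's input shapes, which simplifies the inference code.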


Please refer to the following samples on BERT inference, which may help you.
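To make the samples concrete, here is a minimal sketch of running one batch through a BERT engine with the TensorRT Python API (pre-8.5 bindings style) and PyCUDA. The engine filename, tensor names, and fixed shapes are assumptions carried over from the conversion step; this is a sketch, not a tested implementation, and it requires a GPU:

```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built from the ONNX export (assumed filename).
with open("bert.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

def infer(input_ids, token_type_ids, attention_mask):
    """Run one batch. Inputs are int32 numpy arrays of shape
    (batch, seq_len); note 'labels' is NOT an engine input."""
    host_inputs = {
        "input_ids": np.ascontiguousarray(input_ids),
        "token_type_ids": np.ascontiguousarray(token_type_ids),
        "attention_mask": np.ascontiguousarray(attention_mask),
    }
    bindings = [None] * engine.num_bindings
    buffers = []      # (device_mem, host_array, is_input) per binding
    output_host = None
    for i in range(engine.num_bindings):
        name = engine.get_binding_name(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        if engine.binding_is_input(i):
            arr = host_inputs[name].astype(dtype)
        else:
            arr = np.empty(tuple(context.get_binding_shape(i)), dtype=dtype)
            output_host = arr
        mem = cuda.mem_alloc(arr.nbytes)
        buffers.append((mem, arr, engine.binding_is_input(i)))
        bindings[i] = int(mem)
    # Copy inputs to the device, execute, copy outputs back.
    for mem, arr, is_input in buffers:
        if is_input:
            cuda.memcpy_htod(mem, arr)
    context.execute_v2(bindings)
    for mem, arr, is_input in buffers:
        if not is_input:
            cuda.memcpy_dtoh(arr, mem)
    return output_host  # (batch, seq_len, num_labels) logits
```

Per-token label predictions can then be obtained by taking the argmax of the returned logits over the last axis.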

Thank you.

Thank you, I will try them…

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.