Real-Time Natural Language Understanding with BERT Using TensorRT

Large-scale language models (LSLMs) such as BERT, GPT-2, and XLNet have brought about exciting leaps in state-of-the-art accuracy for many natural language understanding (NLU) tasks. Since its release in October 2018, BERT (Bidirectional Encoder Representations from Transformers) remains one of the most popular language models and still delivers state-of-the-art accuracy at…

Great work! Does this also work on other GPUs like V100 and K80?
Also, what if I have a PyTorch model?

I got an error when pulling the docker container:

Any ideas?

The instructions include:

python python/ -m /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/model.ckpt-8144 -o bert_base_384.engine -b 1 -s 384 -c /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2

However, the defaults appear to be configured for BERT large. The following change allows all the steps to complete without error:

+++ b/demo/BERT/python/
@@ -16,9 +16,9 @@

# Setup default parameters (if no command-line parameters given)

SCRIPT=$(readlink -f "$0")
SCRIPT_DIR=$(dirname ${SCRIPT})

Great work. I ran into the following problem, running the fourth step above:

FileNotFoundError: [Errno 2] No such file or directory: '/workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/bert_config.json'
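A quick pre-flight check can show whether the model directory is incomplete before the engine build starts. This is only a sketch: `missing_model_files` is a hypothetical helper, not part of the demo, and it assumes the directory should hold `bert_config.json` plus TensorFlow checkpoint files sharing the `model.ckpt-8144` prefix from the command above.

```python
from pathlib import Path


def missing_model_files(model_dir, ckpt_prefix="model.ckpt-8144"):
    """Return the expected entries that are absent from model_dir.

    Hypothetical helper: the expected file set is an assumption based on
    the command line and the error message quoted in this thread.
    """
    root = Path(model_dir)
    missing = []
    # bert_config.json is the file named in the FileNotFoundError.
    if not (root / "bert_config.json").is_file():
        missing.append("bert_config.json")
    # A checkpoint prefix usually expands to several files
    # (.index, .meta, .data-*), so match on the prefix only.
    if not root.is_dir() or not any(root.glob(ckpt_prefix + ".*")):
        missing.append(ckpt_prefix + ".*")
    return missing


if __name__ == "__main__":
    model_dir = "/workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2"
    for name in missing_model_files(model_dir):
        print(f"missing: {Path(model_dir) / name}")
```

If this reports anything missing, the download step most likely fetched a different model or wrote to a different directory.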

Ted, hi! It seems you got past my issue of the "/workspace/.." directory not being found. How did you get past that?

OK, I solved my own problem. It works great now!

I had two issues: 1) the example script downloads a different model, so you might need to adjust it; 2) it can take a while to create the "engine" file, at least it did for me :)

I solved it by downloading the right file and fixing the example.

Get your API key from and then try `docker login`.

Really nice work, guys. The explanation states that the input and output of the fully connected layers are B x S x (N * H). However, I have the PyTorch implementation of BERT from NVIDIA, and there the input and output of the fully connected layers seem to be just B x S x H. Below is part of the output of print(model).

(encoder): BertEncoder(
  (layer): ModuleList(
    (0): BertLayer(
      (attention): BertAttention(
        (self): BertSelfAttention(
          (query): Linear(in_features=1024, out_features=1024, bias=True)
          (key): Linear(in_features=1024, out_features=1024, bias=True)
          (value): Linear(in_features=1024, out_features=1024, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (softmax): Softmax(dim=-1)

Also, the BERT config file is:
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "max_position_embeddings": 512,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
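One possible way to reconcile the two shapes, assuming H in the article means the size of a single attention head rather than the full hidden size: with the config above, N * H then equals `hidden_size`, and the 1024 features of each Linear layer are the same quantity written differently. The sketch below only does the arithmetic; `split_heads` is a hypothetical helper that works on shape tuples, not on real tensors.

```python
# Values copied from the config above.
config = {"hidden_size": 1024, "num_attention_heads": 16}

N = config["num_attention_heads"]
# Assumption: H is the per-head size, so hidden_size = N * H.
H = config["hidden_size"] // N


def split_heads(shape, num_heads):
    """Map a (B, S, N*H) shape to the per-head layout (B, N, S, H)."""
    batch, seq_len, hidden = shape
    if hidden % num_heads != 0:
        raise ValueError("hidden size must be divisible by the head count")
    return (batch, num_heads, seq_len, hidden // num_heads)


print(H)                                # 64
print(N * H == config["hidden_size"])   # True
print(split_heads((1, 384, 1024), N))   # (1, 16, 384, 64)
```

Under that reading, the query, key, and value layers printed above map B x S x 1024 to B x S x 1024, which is B x S x (16 * 64).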

When I run “cd TensorRT/demo/BERT && sh python/”, I get this error: "Error: ‘nvidia/bert_tf_v2_base_fp16_384:2’ could not be found."

And on NGC I couldn’t find a model with this name.

Can anyone help? Where can I download the fine-tuned weights for this now? Thanks in advance.