Real-Time Natural Language Understanding with BERT Using TensorRT

jwitsoe · August 13, 2019, 1:03pm

Originally published at: Real-Time Natural Language Understanding with BERT Using TensorRT | NVIDIA Technical Blog

Large-scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought about exciting leaps in state-of-the-art accuracy for many natural language understanding (NLU) tasks. Since its release in Oct 2018, BERT (Bidirectional Encoder Representations from Transformers) remains one of the most popular language models and still delivers state-of-the-art accuracy at…

anon9409504 · August 15, 2019, 3:30pm

Great work! Does this also work on other GPUs like V100 and K80?
Also, what if I have a PyTorch model?

anon13745583 · August 20, 2019, 9:58am

I got an error when pulling the docker container:

root@ubuntu-gpu-7-200gb:/home/ubuntu/TensorRT/demo/BERT# sh python/create_docker_container.sh
Sending build context to Docker daemon 265.7kB
Step 1/17 : FROM nvcr.io/nvidia/tensorrt:19....
19.05-py3: Pulling from nvidia/tensorrt
7e6591854262: Pulling fs layer
089d60cb4e0a: Pull complete
9c461696bc09: Pull complete
45085432511a: Pull complete
6ca460804a89: Pull complete
2631f04ebf64: Pull complete
86f56e03e071: Pull complete
234646620160: Downloading [====================================> ] 447.9MB/615.2MB
7f717cd17058: Download complete
e69a2ba99832: Download complete
bc9bca17b13c: Download complete
1870788e477f: Download complete
603e0d586945: Downloading [=============================================> ] 452.2MB/492.7MB
717dfedf079c: Download complete
1035ef613bc7: Download complete
c5bd7559c3ad: Download complete
d82c679b8708: Download complete
059d4f560014: Download complete
f3f14cff44df: Download complete
96502bde320c: Download complete
bc5bb9379810: Download complete
e4d8bb046bc2: Download complete
4e2187010a7c: Download complete
9d62684b94c3: Download complete
e70e61e48991: Download complete
adecb91612fe: Download complete
ba27dafb70e8: Download complete
16bde716c9b2: Download complete
476faeed0740: Download complete
5af7c8a6b101: Download complete
960591fee98d: Download complete
0dd138c184ff: Download complete
7ef953567062: Downloading
bd9a54f5a193: Waiting
144852c40661: Waiting
171a26eec2d4: Waiting
999acb71c4df: Waiting
3f301e4ba386: Waiting
3fc30e0f9cba: Waiting
38d1459042f4: Waiting
aafa1a9d16eb: Waiting
unauthorized: authentication required
Unable to find image 'bert-tensorrt:latest' locally
docker: Error response from daemon: pull access denied for bert-tensorrt, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
root@ubuntu-gpu-7-200gb:/home/ubuntu/TensorRT/demo/BERT#

Any ideas?

anon23378554 · August 29, 2019, 7:55am

Good one, very helpful.

anon33020517 · September 30, 2019, 11:19pm

The instructions include:

python python/bert_builder.py -m /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/model.ckpt-8144 -o bert_base_384.engine -b 1 -s 384 -c /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2

However, the defaults appear to be configured for BERT large. The following change lets all the steps complete without error:

+++ b/demo/BERT/python/build_examples.sh
@@ -16,9 +16,9 @@

# Setup default parameters (if no command-line parameters given)
-MODEL='large'
+MODEL='base'
FT_PRECISION='fp16'
-SEQ_LEN='128'
+SEQ_LEN='384'

SCRIPT=$(readlink -f "$0")
SCRIPT_DIR=$(dirname ${SCRIPT})
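The directory in the bert_builder.py command appears to be assembled from the model size, precision, and sequence length. A minimal Python sketch, assuming that naming pattern (it is inferred only from the path quoted in this post, not from the script itself, and `fine_tuned_dir` is a hypothetical helper), shows why defaults of large/128 would point at a different directory than the one the instructions use:

```python
from pathlib import PurePosixPath

# Assumption: fine-tuned models are unpacked under this root, as in the
# bert_builder.py command quoted above.
MODELS_ROOT = PurePosixPath("/workspace/models/fine-tuned")


def fine_tuned_dir(model, precision, seq_len, version=2):
    """Hypothetical helper: rebuild the directory name from the script parameters.

    The pattern bert_tf_v2_<model>_<precision>_<seq_len>_v<version> is inferred
    from the path in the instructions; the real script may differ.
    """
    return MODELS_ROOT / f"bert_tf_v2_{model}_{precision}_{seq_len}_v{version}"


# Directory the bert_builder.py command in the instructions expects.
expected = fine_tuned_dir("base", "fp16", 384)
# Directory implied by the script defaults before the change.
default = fine_tuned_dir("large", "fp16", 128)

print(expected)
print(default)
print("match" if expected == default else "mismatch")
```

If the two paths differ, the builder command from the instructions will look for files that the script never downloaded.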

anon59412256 · October 9, 2019, 1:59am

Great work. I ran into the following problem when running the fourth step above:

FileNotFoundError: [Errno 2] No such file or directory: '/workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/bert_config.json'

anon59412256 · October 9, 2019, 2:06am

Hi Ted! It seems you got past my issue of the "/workspace/.." directory not being found. How did you get past that?

anon59412256 · October 9, 2019, 2:56am

OK, I solved my own problem. It works great now!

I had two issues: 1) The example script downloads a different model, so you might need to adjust it. 2) It can take a while to create the "engine" file; at least it did for me :)
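Since the engine build is slow, it may help to confirm that the expected model files exist before starting it. The following is only a sketch: `missing_model_files` is a hypothetical helper, and the paths are the ones quoted earlier in this thread, so adjust them to whatever the example script actually downloaded.

```python
from pathlib import Path
from typing import List


def missing_model_files(config_dir: str, ckpt_prefix: str) -> List[str]:
    """Return the inputs the builder would not find, as readable strings."""
    missing = []
    config = Path(config_dir) / "bert_config.json"
    if not config.is_file():
        missing.append(str(config))
    # A TensorFlow checkpoint prefix such as model.ckpt-8144 names several
    # files (for example model.ckpt-8144.index), so match on the prefix.
    prefix = Path(ckpt_prefix)
    if not prefix.parent.is_dir() or not any(prefix.parent.glob(prefix.name + "*")):
        missing.append(str(prefix) + "*")
    return missing


if __name__ == "__main__":
    model_dir = "/workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2"
    problems = missing_model_files(model_dir, model_dir + "/model.ckpt-8144")
    for path in problems:
        print("missing:", path)
    if not problems:
        print("model files found; safe to start the engine build")
```

An empty result means both the config file and at least one checkpoint file are in place.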

anon59412256 · October 9, 2019, 2:56am

I solved it by downloading the right file and fixing the example.

anon85287616 · January 9, 2020, 5:55am

Get your API key from ngc.nvidia.com and then try `docker login nvcr.io`.
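For scripted setups, the login can be wrapped in a few lines of Python. This is a sketch under two assumptions: the API key has been exported in an environment variable named NGC_API_KEY (a name chosen here for illustration), and the registry accepts the literal user name `$oauthtoken` with the API key as the password, which is the usual NGC convention.

```python
import os
import subprocess

NGC_REGISTRY = "nvcr.io"
# NGC expects this literal string as the user name; the API key is the password.
NGC_USERNAME = "$oauthtoken"


def docker_login_command(registry=NGC_REGISTRY):
    """Build the docker login argv; the key goes through stdin, not argv."""
    return ["docker", "login", "--username", NGC_USERNAME, "--password-stdin", registry]


def docker_login(api_key):
    """Run docker login, feeding the API key through stdin."""
    subprocess.run(docker_login_command(), input=api_key.encode(), check=True)


if __name__ == "__main__":
    # Assumption: the key was exported as NGC_API_KEY beforehand.
    key = os.environ.get("NGC_API_KEY")
    if key:
        docker_login(key)
    else:
        print("set NGC_API_KEY first, then rerun")
```

Passing the key on stdin keeps it out of the process list and shell history.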

anon56153429 · April 4, 2020, 8:07am

Hi,
Really nice work, guys. The explanation states that the input and output of the fully connected layers are B x S x (N * H). However, I have the PyTorch implementation of BERT from NVIDIA, and there the input and output of the fully connected layers seem to be just B x S x H. Below is part of the output of print(model).

(encoder): BertEncoder(
  (layer): ModuleList(
    (0): BertLayer(
      (attention): BertAttention(
        (self): BertSelfAttention(
          (query): Linear(in_features=1024, out_features=1024, bias=True)
          (key): Linear(in_features=1024, out_features=1024, bias=True)
          (value): Linear(in_features=1024, out_features=1024, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (softmax): Softmax(dim=-1)
        )

Also, the BERT config file is:
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "max_position_embeddings": 512,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
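One possible way to reconcile the two shapes, assuming that H in the article denotes the size of a single attention head rather than the full hidden size (an assumption, since the article text is not quoted in this thread): with the config above, N * H equals hidden_size, so B x S x (N * H) and the 1024-wide Linear layers in the printout describe the same shape. A small sketch of the arithmetic, with illustrative values for B and S:

```python
# Values copied from the config in the post above.
config = {"hidden_size": 1024, "num_attention_heads": 16}

num_heads = config["num_attention_heads"]        # N
# Assumption: H is the per-head size, i.e. hidden_size split evenly over heads.
head_size = config["hidden_size"] // num_heads   # H

batch_size, seq_len = 1, 384                     # B, S (illustrative values)

shape_article_notation = (batch_size, seq_len, num_heads * head_size)
shape_pytorch_printout = (batch_size, seq_len, config["hidden_size"])

print(head_size)
print(shape_article_notation == shape_pytorch_printout)
```

Under that reading, the PyTorch config's hidden_size of 1024 corresponds to N * H = 16 * 64, not to H alone.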

jack.xy.zhang · December 29, 2020, 6:15am

When I run “cd TensorRT/demo/BERT && sh python/build_examples.sh”, I get this error: "Error: ‘nvidia/bert_tf_v2_base_fp16_384:2’ could not be found."

I also could not find a model with this name on NGC: AI Models - Computer Vision, Conversational AI, and More | NVIDIA NGC

Can anyone help? Where can I download the fine-tuned weights for this now? Thanks in advance.
