DeepLearningExamples: running BERT SQuAD on a P100 GPU

I want to use the TensorRT BERT SQuAD model provided by NVIDIA on a P100 GPU. Specifically, this is my GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P0    30W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I started by using TensorRT 7.0 for the optimization; specifically, I am following the instructions given here:


These should be using compute capability 7.0.

I am running the following commands as per instructions:

bash scripts/download_model.sh base fp16 384
mkdir -p /workspace/bert/engines7
python3 builder.py -m /workspace/bert/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/model.ckpt-8144 -o /workspace/bert/engines7/bert_base_384.engine -b 1 -s 384 --fp16 -c /workspace/bert/models/fine-tuned/bert_tf_v2_base_fp16_384_v2

The above works fine and the engine is created.
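To sanity-check what shape ranges the engine was actually built with, the serialized engine can be deserialized and inspected. A minimal sketch, assuming TensorRT 7's Python API and the engine path used above (the demo's custom BERT plugin library also has to be loadable, as inference.py arranges):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")
# NOTE: the BERT engine uses custom plugins; their library must be loaded
# before deserialization (see how inference.py does it) or this will fail.

with open("/workspace/bert/engines7/bert_base_384.engine", "rb") as f, \
        trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))
    if engine.binding_is_input(i):
        # min/opt/max shapes baked into optimization profile 0 at build time
        print("  profile 0 range:", engine.get_profile_shape(0, i))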

Now if I run the inference command I get this:

python3 inference.py -e /workspace/bert/engines7/bert_base_384.engine -p "TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps." -q "What is TensorRT?" -v /workspace/bert/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/vocab.txt
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::949, condition: profileMinDims.d[i] <= dimensions.d[i]
[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::949, condition: profileMinDims.d[i] <= dimensions.d[i]
[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::949, condition: profileMinDims.d[i] <= dimensions.d[i]

Passage: TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps.

Question: What is TensorRT?
[06/23/2020-21:41:48] [W] TensorCore support was not selected
[06/23/2020-21:41:48] [W] TensorCore support was not selected
[06/23/2020-21:41:48] [W] TensorCore support was not selected
... (the same warning repeats in groups of three, every four seconds, until 21:42:32) ...
Cuda failure: 209
Aborted (core dumped)

The above works smoothly on a machine with a V100 GPU; I am running exactly the same commands.

Could this be related to the fact that the P100 only supports compute capability 6.0, as per this link:

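For reference, CUDA failure 209 corresponds to cudaErrorNoKernelImageForDevice / CUDA_ERROR_NO_BINARY_FOR_GPU, i.e. no kernel image was built for this device's architecture, which would be consistent with a compute-capability mismatch. The card's compute capability can be confirmed directly; a minimal sketch using pycuda, which the demo already depends on:

import pycuda.driver as cuda

cuda.init()
dev = cuda.Device(0)                     # GPU 0, the P100 shown above
major, minor = dev.compute_capability()  # expected (6, 0) for a P100
print(dev.name(), "-> compute capability %d.%d" % (major, minor))
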
I am also facing exactly the same issue with the P100. Please help me resolve it.

UPDATE: I ended up using T4 GPUs, which are supported on v7.0. The P100 didn't work for me on 7.0, but it worked smoothly on 6.x. So you can either use an earlier version of TensorRT or use a different GPU.

Hi @francesco.ciannella,
I am following exactly this material to run on my P100 GPUs: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT/trt. I am building an engine and running inference through the container. Building the engine works fine, but running inference does not (the 4th step in https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT/trt#quick-start-guide). The TRT version available in that container is 7.0.0.1.
You suggested using an earlier version of TRT. Should I downgrade the TRT inside the container? And which version worked for you on the P100?

Thanks for your help

Yes, run it on 6.0. In that case, to generate the engine, use this document:

You can check out release 6.0 instead of 5.1. After 6.0 they moved the example to the DeepLearningExamples repository.

The above will work on a P100. Beware that there are some hiccups when you download the model, but I guess you already figured that out.

@francesco.ciannella,
I also tried the above reference material, but it uses TRT 5.1, which throws an error while building the engine. But how do I download a TRT 6.0 container?
nvcr.io/nvidia/tensorrt:19.05-py3: container with TRT v5.1 and CUDA 10.1
nvcr.io/nvidia/tensorrt:20.05-py3: container with TRT v7.0.0.1 and CUDA 10.2
But what about a TRT v6 container?

For the reference of those who will search for

[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <= dimensions.d[i]
[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::1045, condition: profileMaxDims.d[i] >= dimensions.d[i]

What this error means is that the shape you are specifying via context.set_binding_shape is outside the range that was specified for the engine when it was built. The original range can be retrieved via:

# Look up the binding index and the shape range baked into the active optimization profile
inp_idx = engine.get_binding_index("input_name")
min_shape, opt_shape, max_shape = engine.get_profile_shape(context.active_optimization_profile, inp_idx)
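
As a minimal sketch of how this can be used to guard the call (the helper name check_and_set and the binding name/shape in the example are placeholders, not part of the demo):

def check_and_set(context, engine, name, shape):
    # Compare the requested shape against the active profile's range
    # before actually binding it, so the failure is explicit.
    idx = engine.get_binding_index(name)
    profile = context.active_optimization_profile
    min_shape, opt_shape, max_shape = engine.get_profile_shape(profile, idx)
    for want, lo, hi in zip(shape, min_shape, max_shape):
        if not lo <= want <= hi:
            raise ValueError("shape %s for %s is outside profile range %s..%s"
                             % (shape, name, tuple(min_shape), tuple(max_shape)))
    context.set_binding_shape(idx, shape)

# e.g. check_and_set(context, engine, "input_ids", (1, 384))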