Cuda failure: 209 Aborted (core dumped)

I am following this material (Real-Time Natural Language Understanding with BERT Using TensorRT | NVIDIA Technical Blog) to convert the fine-tuned model to a TensorRT engine and run it on our GPU system. However, while building the engine (python python/bert_builder.py -m /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/model.ckpt-8144 -o bert_base_384.engine -b 1 -s 384 -c /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2), I am getting the error "Cuda failure: 209 Aborted (core dumped)". I am running it on 2 P100 GPUs. CUDA version: 10.2. Driver version: 440.

Please help to resolve this.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
| N/A   39C    P0    33W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:81:00.0 Off |                    0 |
| N/A   38C    P0    29W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Hi @darshancganji12,
The BERT sample is set up to build for SM 70 and SM 75 by default.
Try adjusting CMAKE_CUDA_FLAGS in CMakeLists.txt to target your device; in your case (P100) that is SM 60.
It is also recommended to use the latest TRT release (7.1).

Thanks!

Dear @AakankshaS,
Thank you for your response.

May I know what to change in this for P100?
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS}
  --expt-relaxed-constexpr
  --expt-extended-lambda
  -gencode arch=compute_70,code=sm_70
  -gencode arch=compute_75,code=sm_75
  -O3")

Thank you

Hi @darshancganji12

It should be something like

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS}
  --expt-relaxed-constexpr
  --expt-extended-lambda
  -gencode arch=compute_60,code=sm_60
  -O3")
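
If you also need the same build to run on Volta/Turing GPUs, you can keep several -gencode entries side by side. A rough sketch (adjust the list to the GPUs you actually have):

# Pascal (P100), Volta, and Turing in one build
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS}
  --expt-relaxed-constexpr
  --expt-extended-lambda
  -gencode arch=compute_60,code=sm_60
  -gencode arch=compute_70,code=sm_70
  -gencode arch=compute_75,code=sm_75
  -O3")

Either way, please do a clean rebuild of the sample after editing CMakeLists.txt. Error 209 usually maps to cudaErrorNoKernelImageForDevice ("no kernel image is available for execution on the device"), so if the previously built SM 70/75-only binaries are still picked up, the same failure will appear again.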

Thanks!

Dear @AakankshaS,
Thank you so much for your response.
I am still facing the same issue after modifying it. Any other suggestions would be a great help.
Also, the prerequisites mentioned here (TensorRT/demo/BERT at release/5.1 · NVIDIA/TensorRT · GitHub) are all satisfied.
Thank you

Hi,
Apologies for the delayed response.
Are you still facing the issue?