Cuda failure: 209 Aborted (core dumped)

I am following this material (Real-Time Natural Language Understanding with BERT Using TensorRT | NVIDIA Technical Blog) to convert the fine-tuned model to a TensorRT engine and run it on our GPU system. However, while building the engine (python python/bert_builder.py -m /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/model.ckpt-8144 -o bert_base_384.engine -b 1 -s 384 -c /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2), I am getting the error "Cuda failure: 209 Aborted (core dumped)". I am running it on 2 P100 GPUs. CUDA version: 10.2. Driver version: 440.

Please help to resolve this.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
| N/A   39C    P0    33W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:81:00.0 Off |                    0 |
| N/A   38C    P0    29W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Hi @darshancganji12,
The BERT sample is set up to build for SM 70 and SM 75 by default.
Try adjusting CMAKE_CUDA_FLAGS in CMakeLists.txt to target your device; in your case (P100) that is SM 60.
It is also recommended to use the latest TRT release (7.1).

Thanks!

Dear @AakankshaS,
Thank you for your response.

May I know what to change in this for P100?
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS}
  --expt-relaxed-constexpr
  --expt-extended-lambda
  -gencode arch=compute_70,code=sm_70
  -gencode arch=compute_75,code=sm_75
  -O3")

Thank you

Hi @darshancganji12

It should be something like

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS}
  --expt-relaxed-constexpr
  --expt-extended-lambda
  -gencode arch=compute_60,code=sm_60
  -O3")
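
If you also need the same build to run on Volta/Turing GPUs, you can keep several -gencode entries side by side. A rough sketch (adjust the list to the GPUs you actually have):

# Pascal (P100), Volta, and Turing in one build
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS}
  --expt-relaxed-constexpr
  --expt-extended-lambda
  -gencode arch=compute_60,code=sm_60
  -gencode arch=compute_70,code=sm_70
  -gencode arch=compute_75,code=sm_75
  -O3")

Either way, please do a clean rebuild of the sample after editing CMakeLists.txt. Error 209 usually maps to cudaErrorNoKernelImageForDevice ("no kernel image is available for execution on the device"), so if the previously built SM 70/75-only binaries are still picked up, the same failure will appear again.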

Thanks!

Dear @AakankshaS,
Thank you so much for your response.
I am still facing the same issue after modifying it. Any other suggestions would be a great help.
Also, the prerequisites mentioned here (TensorRT/demo/BERT at release/5.1 · NVIDIA/TensorRT · GitHub) are all satisfied.
Thank you

Hi,
Apologies for the delayed response.
Are you still facing the issue?