Segmentation fault when running inference of the BERT example on Jetson Orin Nano

Description

I am trying to run the BERT QA sample from GitHub - NVIDIA/TensorRT at release/10.3.
I built the TensorRT engine as described in the steps, and the build completes successfully:
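For reference, the build step was along these lines (reconstructed from the demo README; the exact checkpoint path and flags are assumptions on my part and may differ slightly from what I ran):

python3 builder.py -m models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/model.ckpt -o engines/bert_large_128.engine -b 1 -s 128 --fp16 -c models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1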

[10/22/2024-16:27:20] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +14, now: CPU 2155, GPU 5079 (MiB)
[10/22/2024-16:27:20] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/22/2024-16:28:05] [TRT] [I] Detected 3 inputs and 1 output network tensors.
[10/22/2024-16:28:08] [TRT] [I] Total Host Persistent Memory: 330688
[10/22/2024-16:28:08] [TRT] [I] Total Device Persistent Memory: 0
[10/22/2024-16:28:08] [TRT] [I] Total Scratch Memory: 0
[10/22/2024-16:28:08] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 165 steps to complete.
[10/22/2024-16:28:08] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 3.33ms to assign 5 blocks to 165 nodes requiring 1378304 bytes.
[10/22/2024-16:28:08] [TRT] [I] Total Activation Memory: 1378304
[10/22/2024-16:28:08] [TRT] [I] Total Weights Memory: 170069008
[10/22/2024-16:28:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2367, GPU 5797 (MiB)
[10/22/2024-16:28:08] [TRT] [I] Engine generation completed in 50.3528 seconds.
[10/22/2024-16:28:09] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 4 MiB, GPU 384 MiB
[10/22/2024-16:28:09] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3159 MiB
[10/22/2024-16:28:09] [TRT] [I] build engine in 52.477 Sec
[10/22/2024-16:28:09] [TRT] [I] Saving Engine to engines/bert_large_128.engine
[10/22/2024-16:28:10] [TRT] [I] Done.

When running inference, it throws a segmentation fault:

python3 inference.py -e engines/bert_large_128.engine -p "TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps." -q "What is TensorRT?" -v models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/vocab.txt
[10/22/2024-16:28:40] [TRT] [I] Loaded engine size: 208 MiB
[10/22/2024-16:28:40] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +74, now: CPU 317, GPU 3733 (MiB)
[10/22/2024-16:28:40] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +8, GPU +68, now: CPU 110, GPU 3525 (MiB)
[10/22/2024-16:28:40] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 163 (MiB)

Passage: TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps.

Question: What is TensorRT?

Segmentation fault (core dumped)

Environment

TensorRT Version: 10.3
GPU Type: Jetson Orin Nano
Nvidia Driver Version:
CUDA Version: 12.6
CUDNN Version:
Operating System + Version: JetPack 6.1, Ubuntu 22.04
Python Version (if applicable): 3.10

Hi @krishnarajnair2015,
Could you please share your ONNX model?
This could be a Jetson-specific issue, however.
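In the meantime, a backtrace would help narrow down where the crash occurs. Assuming gdb is available on the device (this is just a debugging sketch, not an official procedure), something like:

gdb -ex run -ex bt --args python3 inference.py <same arguments as in the failing command above>

should show whether the fault originates inside TensorRT, a plugin, or the Python bindings. Running python3 -X faulthandler inference.py with the same arguments would also print a Python-level traceback when the segfault happens.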

Thanks