Segmentation fault when running inference of the BERT example on Jetson Orin Nano

Description

I am trying to run the BERT QA sample from GitHub - NVIDIA/TensorRT at release/10.3.
I built the TensorRT engine as described in the steps, and the build completes successfully:
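For reference, the build step was along these lines (reconstructed from the demo README; the exact checkpoint path and flags are assumptions on my part and may differ slightly from what I ran):

python3 builder.py -m models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/model.ckpt -o engines/bert_large_128.engine -b 1 -s 128 --fp16 -c models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1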

[10/22/2024-16:27:20] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +14, now: CPU 2155, GPU 5079 (MiB)
[10/22/2024-16:27:20] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/22/2024-16:28:05] [TRT] [I] Detected 3 inputs and 1 output network tensors.
[10/22/2024-16:28:08] [TRT] [I] Total Host Persistent Memory: 330688
[10/22/2024-16:28:08] [TRT] [I] Total Device Persistent Memory: 0
[10/22/2024-16:28:08] [TRT] [I] Total Scratch Memory: 0
[10/22/2024-16:28:08] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 165 steps to complete.
[10/22/2024-16:28:08] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 3.33ms to assign 5 blocks to 165 nodes requiring 1378304 bytes.
[10/22/2024-16:28:08] [TRT] [I] Total Activation Memory: 1378304
[10/22/2024-16:28:08] [TRT] [I] Total Weights Memory: 170069008
[10/22/2024-16:28:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2367, GPU 5797 (MiB)
[10/22/2024-16:28:08] [TRT] [I] Engine generation completed in 50.3528 seconds.
[10/22/2024-16:28:09] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 4 MiB, GPU 384 MiB
[10/22/2024-16:28:09] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3159 MiB
[10/22/2024-16:28:09] [TRT] [I] build engine in 52.477 Sec
[10/22/2024-16:28:09] [TRT] [I] Saving Engine to engines/bert_large_128.engine
[10/22/2024-16:28:10] [TRT] [I] Done.

When running inference, it throws a segmentation fault:

python3 inference.py -e engines/bert_large_128.engine -p "TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps." -q "What is TensorRT?" -v models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/vocab.txt
[10/22/2024-16:28:40] [TRT] [I] Loaded engine size: 208 MiB
[10/22/2024-16:28:40] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +74, now: CPU 317, GPU 3733 (MiB)
[10/22/2024-16:28:40] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +8, GPU +68, now: CPU 110, GPU 3525 (MiB)
[10/22/2024-16:28:40] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 163 (MiB)

Passage: TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps.

Question: What is TensorRT?

Segmentation fault (core dumped)

Environment

TensorRT Version: 10.3
GPU Type: Jetson Orin Nano
Nvidia Driver Version:
CUDA Version: 12.6
CUDNN Version:
Operating System + Version: JetPack 6.1, Ubuntu 22.04
Python Version (if applicable): 3.10

Hi @krishnarajnair2015,
Could you please share your ONNX model?
This could be a Jetson-specific issue, however.
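In the meantime, a backtrace would help narrow down where the crash occurs. Assuming gdb is available on the device (this is just a debugging sketch, not an official procedure), something like:

gdb -ex run -ex bt --args python3 inference.py <same arguments as in the failing command above>

should show whether the fault originates inside TensorRT, a plugin, or the Python bindings. Running python3 -X faulthandler inference.py with the same arguments would also print a Python-level traceback when the segfault happens.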

Thanks