Segmentation fault due to strlen-avx2.S missing

lannyyip1 · September 14, 2021, 4:45am

Description

When I try to convert an ONNX model to a TensorRT engine with TensorRT 8.0.3.4, a segmentation fault is reported.

Environment

TensorRT Version: 8.0.3.4
GPU Type: RTX 3070
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.2.1, but TensorRT warns that 8.1.1 is loaded at runtime
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag):
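
The cuDNN entry above reflects a warning that TensorRT prints in the form "TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1". A minimal sketch for listing every such warning in a saved trtexec log follows. The function name list_version_mismatches and the log file name are assumptions made for illustration; they are not part of TensorRT.

```shell
# Hypothetical helper: print each "linked against X but loaded Y" warning
# from a trtexec log. Reads the files given as arguments, or stdin if none.
list_version_mismatches() {
    cat "$@" |
        grep -E 'was linked against .* but loaded ' |
        sed -E 's/^.*TensorRT was linked against (.*) but loaded (.*)$/built for: \1 | loaded: \2/'
}

# Example (assumes the trtexec output was saved to trtexec.log):
#   list_version_mismatches trtexec.log
```

A mismatch reported this way does not by itself prove it causes the crash, but it is usually worth ruling out first.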

Relevant Files

ONNX model: share/model.onnx at master · lannyyip/share · GitHub

Steps To Reproduce

Build trtexec from the samples directory to get trtexec_debug.
Load trtexec_debug in gdb with the following command:

gdb --args trtexec_debug --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --workspace=5000 --saveEngine=model.bin

Run it and get the following result:

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from trtexec_debug...
(gdb) run
Starting program: /home/lanny/tools/TensorRT-8.0.3.4/targets/x86_64-linux-gnu/bin/trtexec_debug --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --workspace=5000 --saveEngine=model.bin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
&&&& RUNNING TensorRT.trtexec [TensorRT v8003] # /home/lanny/tools/TensorRT-8.0.3.4/targets/x86_64-linux-gnu/bin/trtexec_debug --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --workspace=5000 --saveEngine=model.bin
[09/14/2021-12:39:41] [I] === Model Options ===
[09/14/2021-12:39:41] [I] Format: ONNX
[09/14/2021-12:39:41] [I] Model: model.onnx
[09/14/2021-12:39:41] [I] Output:
[09/14/2021-12:39:41] [I] === Build Options ===
[09/14/2021-12:39:41] [I] Max batch: explicit
[09/14/2021-12:39:41] [I] Workspace: 5000 MiB
[09/14/2021-12:39:41] [I] minTiming: 1
[09/14/2021-12:39:41] [I] avgTiming: 8
[09/14/2021-12:39:41] [I] Precision: FP32+FP16
[09/14/2021-12:39:41] [I] Calibration: 
[09/14/2021-12:39:41] [I] Refit: Disabled
[09/14/2021-12:39:41] [I] Sparsity: Disabled
[09/14/2021-12:39:41] [I] Safe mode: Disabled
[09/14/2021-12:39:41] [I] Restricted mode: Disabled
[09/14/2021-12:39:41] [I] Save engine: model.bin
[09/14/2021-12:39:41] [I] Load engine: 
[09/14/2021-12:39:41] [I] NVTX verbosity: 0
[09/14/2021-12:39:41] [I] Tactic sources: Using default tactic sources
[09/14/2021-12:39:41] [I] timingCacheMode: local
[09/14/2021-12:39:41] [I] timingCacheFile: 
[09/14/2021-12:39:41] [I] Input(s)s format: fp32:CHW
[09/14/2021-12:39:41] [I] Output(s)s format: fp32:CHW
[09/14/2021-12:39:41] [I] Input build shape: input0=1x1x1024x500+1x1x1024x500+1x1x1024x500
[09/14/2021-12:39:41] [I] Input calibration shapes: model
[09/14/2021-12:39:41] [I] === System Options ===
[09/14/2021-12:39:41] [I] Device: 0
[09/14/2021-12:39:41] [I] DLACore: 
[09/14/2021-12:39:41] [I] Plugins:
[09/14/2021-12:39:41] [I] === Inference Options ===
[09/14/2021-12:39:41] [I] Batch: Explicit
[09/14/2021-12:39:41] [I] Input inference shape: input0=1x1x1024x500
[09/14/2021-12:39:41] [I] Iterations: 10
[09/14/2021-12:39:41] [I] Duration: 3s (+ 200ms warm up)
[09/14/2021-12:39:41] [I] Sleep time: 0ms
[09/14/2021-12:39:41] [I] Streams: 1
[09/14/2021-12:39:41] [I] ExposeDMA: Disabled
[09/14/2021-12:39:41] [I] Data transfers: Enabled
[09/14/2021-12:39:41] [I] Spin-wait: Disabled
[09/14/2021-12:39:41] [I] Multithreading: Disabled
[09/14/2021-12:39:41] [I] CUDA Graph: Disabled
[09/14/2021-12:39:41] [I] Separate profiling: Disabled
[09/14/2021-12:39:41] [I] Time Deserialize: Disabled
[09/14/2021-12:39:41] [I] Time Refit: Disabled
[09/14/2021-12:39:41] [I] Skip inference: Disabled
[09/14/2021-12:39:41] [I] Inputs:
[09/14/2021-12:39:41] [I] === Reporting Options ===
[09/14/2021-12:39:41] [I] Verbose: Disabled
[09/14/2021-12:39:41] [I] Averages: 10 inferences
[09/14/2021-12:39:41] [I] Percentile: 99
[09/14/2021-12:39:41] [I] Dump refittable layers:Disabled
[09/14/2021-12:39:41] [I] Dump output: Disabled
[09/14/2021-12:39:41] [I] Profile: Disabled
[09/14/2021-12:39:41] [I] Export timing to JSON file: 
[09/14/2021-12:39:41] [I] Export output to JSON file: 
[09/14/2021-12:39:41] [I] Export profile to JSON file: 
[09/14/2021-12:39:41] [I] 
[New Thread 0x7fffbf7e7000 (LWP 3333308)]
[09/14/2021-12:39:41] [I] === Device Information ===
[09/14/2021-12:39:41] [I] Selected Device: NVIDIA GeForce RTX 3070
[09/14/2021-12:39:41] [I] Compute Capability: 8.6
[09/14/2021-12:39:41] [I] SMs: 46
[09/14/2021-12:39:41] [I] Compute Clock Rate: 1.755 GHz
[09/14/2021-12:39:41] [I] Device Global Memory: 7973 MiB
[09/14/2021-12:39:41] [I] Shared Memory per SM: 100 KiB
[09/14/2021-12:39:41] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/14/2021-12:39:41] [I] Memory Clock Rate: 7.001 GHz
[09/14/2021-12:39:41] [I] 
[09/14/2021-12:39:41] [I] TensorRT version: 8003
[New Thread 0x7fffbefe6000 (LWP 3333309)]
[New Thread 0x7fffbe61d000 (LWP 3333310)]
[09/14/2021-12:39:41] [I] [TRT] [MemUsageChange] Init CUDA: CPU +532, GPU +0, now: CPU 539, GPU 408 (MiB)
[09/14/2021-12:39:41] [I] Start parsing network model
[09/14/2021-12:39:41] [I] [TRT] ----------------------------------------------------------------
[09/14/2021-12:39:41] [I] [TRT] Input filename:   model.onnx
[09/14/2021-12:39:41] [I] [TRT] ONNX IR version:  0.0.6
[09/14/2021-12:39:41] [I] [TRT] Opset version:    11
[09/14/2021-12:39:41] [I] [TRT] Producer name:    pytorch
[09/14/2021-12:39:41] [I] [TRT] Producer version: 1.7
[09/14/2021-12:39:41] [I] [TRT] Domain:           
[09/14/2021-12:39:41] [I] [TRT] Model version:    0
[09/14/2021-12:39:41] [I] [TRT] Doc string:       
[09/14/2021-12:39:41] [I] [TRT] ----------------------------------------------------------------
[09/14/2021-12:39:41] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/14/2021-12:39:42] [I] Finish parsing network model
[09/14/2021-12:39:42] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1223, GPU 831 (MiB)
[09/14/2021-12:39:42] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 1223 MiB, GPU 831 MiB
[09/14/2021-12:39:42] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[09/14/2021-12:39:42] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +352, GPU +270, now: CPU 1578, GPU 1255 (MiB)
[09/14/2021-12:39:42] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1578, GPU 1265 (MiB)
[09/14/2021-12:39:42] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[09/14/2021-12:39:42] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[New Thread 0x7fffbdb11000 (LWP 3333311)]

Thread 1 "trtexec_debug" received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65      ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
#1  0x00007fffdcbf65b4 in __GI__IO_puts (str=0x0) at ioputs.c:35
#2  0x00007fffe13ccc40 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#3  0x00007fffe1cf7eac in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#4  0x00007fffe05545d9 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#5  0x00007fffe055e3ce in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#6  0x00007fffe061ad8b in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#7  0x00007fffe05b1b21 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#8  0x00007fffe03e444c in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#9  0x00007fffe0600309 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#10 0x00007fffe044acec in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#11 0x00007fffe044c8b6 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#12 0x00007fffe03fc252 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#13 0x00007fffe0401ec4 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#14 0x00007fffe0670f42 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#15 0x00007fffe0673028 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#16 0x00007fffe06707c6 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#17 0x00005555555b8254 in nvinfer1::IBuilder::buildSerializedNetwork (this=0x555556154438, network=..., config=...) at ../../include/NvInfer.h:8092
#18 0x00005555555b3297 in sample::networkToEngine (build=..., sys=..., builder=..., network=..., err=...) at ../common/sampleEngines.cpp:826
#19 0x00005555555b3836 in sample::modelToEngineNetworkParserTuple (model=..., build=..., sys=..., err=...) at ../common/sampleEngines.cpp:876
#20 0x00005555555b46db in sample::getEngineNetworkParserTuple (model=..., build=..., sys=..., err=...) at ../common/sampleEngines.cpp:1003
#21 0x000055555555d792 in main (argc=6, argv=0x7fffffffdb68) at trtexec.cpp:152
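
In the backtrace above, frame #1 shows puts being called with str=0x0 from inside libnvinfer.so.8, so the crash looks like a null string being printed. The "No such file or directory" message on the strlen-avx2.S line only means gdb could not find the glibc source file to display. A minimal sketch for picking frames with a null pointer argument out of a saved backtrace follows; the function name null_arg_frames and the file name are hypothetical.

```shell
# Hypothetical helper: print gdb backtrace frames in which some argument
# is shown as a null pointer (name=0x0). Reads files given as arguments,
# or stdin if none.
null_arg_frames() {
    cat "$@" | grep -E '^#[0-9]+ .*[(, ][A-Za-z_][A-Za-z0-9_]*=0x0[,)]'
}

# Example (assumes the output of "bt" was saved to backtrace.txt):
#   null_arg_frames backtrace.txt
```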

Please give me a hand with this. Thank you.

Lanny

spolisetty · September 15, 2021, 2:47pm

Hi @lannyyip1,

We could reproduce a similar error using the trtexec command. Please allow us some time to get back to you on this.

Thank you.

spolisetty · September 16, 2021, 4:53am

Hi,

It looks like we are facing a different error when we try the trtexec command. Is the input size of Resize_151 along axis 3 zero in your ONNX model?

Error:

&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/my_data/model_189181.onnx --verbose

[09/15/2021-16:28:49] [E] Error[2]: [graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 2: Internal Error (Resize_151: IResizeLayer requires that if input dimension is zero, output dimension must be zero too (axis = 3 input dimension = 0 output dimension = 1)
)
[09/15/2021-16:28:49] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)

Thank you.

lannyyip1 · September 17, 2021, 1:39am

Thank you for your response.
Please add the option --optShapes=input0:1x1x1024x500, as there were some restrictions when I converted the model to ONNX.
When I run the following command, no segmentation fault occurs:

./trtexec --onnx=model.onnx --optShapes=input0:1x1x1024x500 --verbose

But when I add the --fp16 argument, as in the following command, the segmentation fault appears:

./trtexec --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --verbose

I ran it under gdb and confirmed that it is the same segmentation fault as in my first post.
It seems to be related to the --fp16 option.
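
The two runs above can be compared side by side with something like the following sketch. It runs the same command with and without --fp16, saves each verbose log, and prints the exit status; a process killed by SIGSEGV is typically reported by the shell as status 139 (128 + 11). The function name compare_precisions, the TRTEXEC variable, and the log file names are assumptions for illustration.

```shell
# Hypothetical helper: run trtexec once in FP32 and once with --fp16,
# saving each verbose log to trtexec_<mode>.log in the current directory.
# TRTEXEC can point at another binary; it defaults to ./trtexec.
compare_precisions() {
    trtexec_bin="${TRTEXEC:-./trtexec}"
    for mode in fp32 fp16; do
        extra=""
        [ "$mode" = fp16 ] && extra="--fp16"
        # $extra is left unquoted on purpose so the fp32 run gets no extra argument.
        "$trtexec_bin" --onnx=model.onnx --optShapes=input0:1x1x1024x500 $extra --verbose \
            > "trtexec_${mode}.log" 2>&1
        echo "$mode: exit status $?"
    done
}

# Example:
#   compare_precisions
```

The saved trtexec_fp16.log is also the complete verbose log that is useful to attach when reporting the problem.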

Lanny

spolisetty · September 19, 2021, 6:03pm

Hi,

When we tried the following command, we could build the engine successfully:
./trtexec --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --verbose

Could you please share the complete verbose error logs from the trtexec command?

Thank you.

1182719648 · March 6, 2023, 7:39am

Have you solved it? I'm facing the same problem. Thanks a lot for any advice.
