Segmentation Fault due to strlen-avx2.S missing

Description

When I try to convert an ONNX model to a TensorRT engine with TensorRT 8.0.3.4, a segmentation fault is reported.

Environment

TensorRT Version: 8.0.3.4
GPU Type: RTX 3070
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.2.1 (but TensorRT warns that cuDNN 8.1.1 is loaded)
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag):

Relevant Files

ONNX Model: share/model.onnx at master · lannyyip/share · GitHub

Steps To Reproduce

  1. Build trtexec from the sample directory to get trtexec_debug.
  2. Load trtexec_debug in gdb with the following command:
gdb --args trtexec_debug --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --workspace=5000 --saveEngine=model.bin
  3. Run it and get the result:
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Reading symbols from trtexec_debug...
(gdb) run
Starting program: /home/lanny/tools/TensorRT-8.0.3.4/targets/x86_64-linux-gnu/bin/trtexec_debug --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --workspace=5000 --saveEngine=model.bin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
&&&& RUNNING TensorRT.trtexec [TensorRT v8003] # /home/lanny/tools/TensorRT-8.0.3.4/targets/x86_64-linux-gnu/bin/trtexec_debug --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --workspace=5000 --saveEngine=model.bin
[09/14/2021-12:39:41] [I] === Model Options ===
[09/14/2021-12:39:41] [I] Format: ONNX
[09/14/2021-12:39:41] [I] Model: model.onnx
[09/14/2021-12:39:41] [I] Output:
[09/14/2021-12:39:41] [I] === Build Options ===
[09/14/2021-12:39:41] [I] Max batch: explicit
[09/14/2021-12:39:41] [I] Workspace: 5000 MiB
[09/14/2021-12:39:41] [I] minTiming: 1
[09/14/2021-12:39:41] [I] avgTiming: 8
[09/14/2021-12:39:41] [I] Precision: FP32+FP16
[09/14/2021-12:39:41] [I] Calibration: 
[09/14/2021-12:39:41] [I] Refit: Disabled
[09/14/2021-12:39:41] [I] Sparsity: Disabled
[09/14/2021-12:39:41] [I] Safe mode: Disabled
[09/14/2021-12:39:41] [I] Restricted mode: Disabled
[09/14/2021-12:39:41] [I] Save engine: model.bin
[09/14/2021-12:39:41] [I] Load engine: 
[09/14/2021-12:39:41] [I] NVTX verbosity: 0
[09/14/2021-12:39:41] [I] Tactic sources: Using default tactic sources
[09/14/2021-12:39:41] [I] timingCacheMode: local
[09/14/2021-12:39:41] [I] timingCacheFile: 
[09/14/2021-12:39:41] [I] Input(s)s format: fp32:CHW
[09/14/2021-12:39:41] [I] Output(s)s format: fp32:CHW
[09/14/2021-12:39:41] [I] Input build shape: input0=1x1x1024x500+1x1x1024x500+1x1x1024x500
[09/14/2021-12:39:41] [I] Input calibration shapes: model
[09/14/2021-12:39:41] [I] === System Options ===
[09/14/2021-12:39:41] [I] Device: 0
[09/14/2021-12:39:41] [I] DLACore: 
[09/14/2021-12:39:41] [I] Plugins:
[09/14/2021-12:39:41] [I] === Inference Options ===
[09/14/2021-12:39:41] [I] Batch: Explicit
[09/14/2021-12:39:41] [I] Input inference shape: input0=1x1x1024x500
[09/14/2021-12:39:41] [I] Iterations: 10
[09/14/2021-12:39:41] [I] Duration: 3s (+ 200ms warm up)
[09/14/2021-12:39:41] [I] Sleep time: 0ms
[09/14/2021-12:39:41] [I] Streams: 1
[09/14/2021-12:39:41] [I] ExposeDMA: Disabled
[09/14/2021-12:39:41] [I] Data transfers: Enabled
[09/14/2021-12:39:41] [I] Spin-wait: Disabled
[09/14/2021-12:39:41] [I] Multithreading: Disabled
[09/14/2021-12:39:41] [I] CUDA Graph: Disabled
[09/14/2021-12:39:41] [I] Separate profiling: Disabled
[09/14/2021-12:39:41] [I] Time Deserialize: Disabled
[09/14/2021-12:39:41] [I] Time Refit: Disabled
[09/14/2021-12:39:41] [I] Skip inference: Disabled
[09/14/2021-12:39:41] [I] Inputs:
[09/14/2021-12:39:41] [I] === Reporting Options ===
[09/14/2021-12:39:41] [I] Verbose: Disabled
[09/14/2021-12:39:41] [I] Averages: 10 inferences
[09/14/2021-12:39:41] [I] Percentile: 99
[09/14/2021-12:39:41] [I] Dump refittable layers:Disabled
[09/14/2021-12:39:41] [I] Dump output: Disabled
[09/14/2021-12:39:41] [I] Profile: Disabled
[09/14/2021-12:39:41] [I] Export timing to JSON file: 
[09/14/2021-12:39:41] [I] Export output to JSON file: 
[09/14/2021-12:39:41] [I] Export profile to JSON file: 
[09/14/2021-12:39:41] [I] 
[New Thread 0x7fffbf7e7000 (LWP 3333308)]
[09/14/2021-12:39:41] [I] === Device Information ===
[09/14/2021-12:39:41] [I] Selected Device: NVIDIA GeForce RTX 3070
[09/14/2021-12:39:41] [I] Compute Capability: 8.6
[09/14/2021-12:39:41] [I] SMs: 46
[09/14/2021-12:39:41] [I] Compute Clock Rate: 1.755 GHz
[09/14/2021-12:39:41] [I] Device Global Memory: 7973 MiB
[09/14/2021-12:39:41] [I] Shared Memory per SM: 100 KiB
[09/14/2021-12:39:41] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/14/2021-12:39:41] [I] Memory Clock Rate: 7.001 GHz
[09/14/2021-12:39:41] [I] 
[09/14/2021-12:39:41] [I] TensorRT version: 8003
[New Thread 0x7fffbefe6000 (LWP 3333309)]
[New Thread 0x7fffbe61d000 (LWP 3333310)]
[09/14/2021-12:39:41] [I] [TRT] [MemUsageChange] Init CUDA: CPU +532, GPU +0, now: CPU 539, GPU 408 (MiB)
[09/14/2021-12:39:41] [I] Start parsing network model
[09/14/2021-12:39:41] [I] [TRT] ----------------------------------------------------------------
[09/14/2021-12:39:41] [I] [TRT] Input filename:   model.onnx
[09/14/2021-12:39:41] [I] [TRT] ONNX IR version:  0.0.6
[09/14/2021-12:39:41] [I] [TRT] Opset version:    11
[09/14/2021-12:39:41] [I] [TRT] Producer name:    pytorch
[09/14/2021-12:39:41] [I] [TRT] Producer version: 1.7
[09/14/2021-12:39:41] [I] [TRT] Domain:           
[09/14/2021-12:39:41] [I] [TRT] Model version:    0
[09/14/2021-12:39:41] [I] [TRT] Doc string:       
[09/14/2021-12:39:41] [I] [TRT] ----------------------------------------------------------------
[09/14/2021-12:39:41] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/14/2021-12:39:42] [I] Finish parsing network model
[09/14/2021-12:39:42] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1223, GPU 831 (MiB)
[09/14/2021-12:39:42] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 1223 MiB, GPU 831 MiB
[09/14/2021-12:39:42] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[09/14/2021-12:39:42] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +352, GPU +270, now: CPU 1578, GPU 1255 (MiB)
[09/14/2021-12:39:42] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1578, GPU 1265 (MiB)
[09/14/2021-12:39:42] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[09/14/2021-12:39:42] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[New Thread 0x7fffbdb11000 (LWP 3333311)]

Thread 1 "trtexec_debug" received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65      ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
#1  0x00007fffdcbf65b4 in __GI__IO_puts (str=0x0) at ioputs.c:35
#2  0x00007fffe13ccc40 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#3  0x00007fffe1cf7eac in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#4  0x00007fffe05545d9 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#5  0x00007fffe055e3ce in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#6  0x00007fffe061ad8b in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#7  0x00007fffe05b1b21 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#8  0x00007fffe03e444c in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#9  0x00007fffe0600309 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#10 0x00007fffe044acec in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#11 0x00007fffe044c8b6 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#12 0x00007fffe03fc252 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#13 0x00007fffe0401ec4 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#14 0x00007fffe0670f42 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#15 0x00007fffe0673028 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#16 0x00007fffe06707c6 in ?? () from /home/lanny/tools/TensorRT-8.0.3.4//lib/libnvinfer.so.8
#17 0x00005555555b8254 in nvinfer1::IBuilder::buildSerializedNetwork (this=0x555556154438, network=..., config=...) at ../../include/NvInfer.h:8092
#18 0x00005555555b3297 in sample::networkToEngine (build=..., sys=..., builder=..., network=..., err=...) at ../common/sampleEngines.cpp:826
#19 0x00005555555b3836 in sample::modelToEngineNetworkParserTuple (model=..., build=..., sys=..., err=...) at ../common/sampleEngines.cpp:876
#20 0x00005555555b46db in sample::getEngineNetworkParserTuple (model=..., build=..., sys=..., err=...) at ../common/sampleEngines.cpp:1003
#21 0x000055555555d792 in main (argc=6, argv=0x7fffffffdb68) at trtexec.cpp:152

Could someone please help me with this? Thank you.

Lanny

Hi @lannyyip1,

We could reproduce a similar error using the trtexec command. Please allow us some time to get back to you on this.

Thank you.


Hi,

It looks like we are hitting a different error when we run the trtexec command. Is the input size of Resize_151 along axis 3 zero in your ONNX model?

Error:

&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/my_data/model_189181.onnx --verbose

[09/15/2021-16:28:49] [E] Error[2]: [graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 2: Internal Error (Resize_151: IResizeLayer requires that if input dimension is zero, output dimension must be zero too (axis = 3 input dimension = 0 output dimension = 1)
)
[09/15/2021-16:28:49] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)

Thank you.

Thank you for your response.
Please add the option "--optShapes=input0:1x1x1024x500", because of a restriction I ran into when converting the model to ONNX.
When I run the following command, no segmentation fault occurs:

./trtexec --onnx=model.onnx --optShapes=input0:1x1x1024x500 --verbose

But when I add the --fp16 argument, as shown below, the segmentation fault appears.

./trtexec --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --verbose

I ran it under gdb and confirmed that it is the same segmentation fault as in my first post.
It seems to be related to the --fp16 option.

Lanny

Hi,

When we tried the following command, we could successfully build the engine.
./trtexec --onnx=model.onnx --optShapes=input0:1x1x1024x500 --fp16 --verbose

Could you please share the complete verbose error log of the trtexec command?

Thank you.