Segmentation fault when loading an FP16 engine with the static library (nvinfer_static)

Description

When I link against the static library (libnvinfer_static.a), a segmentation fault occurs while loading the FP16 engine. Loading the FP32 or INT8 engine works fine and gives correct inference results.
When I link against the dynamic library (libnvinfer.so), loading the FP32, FP16, and INT8 engines all works fine and inference is correct.

Here’s the stack when it crashes:

#0  0x00000000012da349 in nvinfer1::rt::task::CaskConvolutionRunner::getShader(nvinfer1::rt::CommonContext const&) const ()
#1  0x00000000012dad16 in nvinfer1::rt::task::CaskConvolutionRunner::isCaskGroupConvShader(nvinfer1::rt::CommonContext const&) const ()
#2  0x00000000012de6ad in nvinfer1::rt::task::CaskConvolutionRunner::allocateResources(nvinfer1::rt::CommonContext const&) ()
#3  0x0000000000cd413d in nvinfer1::rt::Engine::initialize() ()
#4  0x0000000000cd639a in nvinfer1::rt::Engine::deserialize(void const*, unsigned long, nvinfer1::IGpuAllocator&) ()
#5  0x0000000000cc8dac in nvinfer1::Runtime::deserializeCudaEngine(void const*, unsigned long, nvinfer1::IPluginFactory*) ()

Environment

TensorRT Version: 8.2.1.8
GPU Type: Tesla T4
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 8.2.4
Operating System + Version: CentOS Linux release 7.6.1810
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

onnx model:espcn_2x_mytrain256.onnx - Google Drive
tensorrt fp16 engine:espcn_2x_ori_trt8.2_fp16.engine - Google Drive

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
Could you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the link below for the list of supported operators. If any operator is not supported, you need to create a custom plugin to support that operation.

Also, please share your model and script, if you have not shared them already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!

When I use the trtexec command, it works fine, because trtexec links against the dynamic libraries, as follows:

The error occurs only when my executable is linked against the static library (libnvinfer_static.a). When I use the dynamic library (libnvinfer.so), the problem goes away. However, I have to use the static libraries. I hope you can help.
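
To confirm which flavor of TensorRT a given process is actually using, one option is to list the shared objects mapped into it at run time: a binary linked against libnvinfer_static.a should not show libnvinfer.so in that list, while trtexec should. The following is only a minimal sketch, assuming Linux with glibc (it relies on dl_iterate_phdr); the function names loadedSharedObjects and anyPathContains are hypothetical helpers, not part of TensorRT.

```cpp
#include <link.h>   // dl_iterate_phdr (glibc, Linux only)
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: records the path of each loaded object.
// The main executable is usually reported with an empty name and is skipped.
static int collectName(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    assert(data != nullptr);
    auto* names = static_cast<std::vector<std::string>*>(data);
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        names->emplace_back(info->dlpi_name);
    }
    return 0;  // returning 0 tells dl_iterate_phdr to keep iterating
}

// Hypothetical helper: paths of all shared objects mapped into this process.
std::vector<std::string> loadedSharedObjects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collectName, &names);
    return names;
}

// Hypothetical helper: true if any path contains the given substring.
bool anyPathContains(const std::vector<std::string>& paths, const std::string& needle) {
    for (const auto& path : paths) {
        if (path.find(needle) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

Calling anyPathContains(loadedSharedObjects(), "libnvinfer.so") from inside the application should return false for the statically linked build and true for the dynamically linked one. Running ldd on the executable gives the same information from outside the process.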

Hi,

Please check the following similar issues, which may help you.

Please make sure you have followed the suggestions mentioned above.

Thank you.

Hello baike93,

I am also facing the same issue when linking against the static libraries. Have you found a solution?

Thanks

@spolisetty Do you have any update on this?

I solved the problem by converting the model with my own executable, built against the static library, instead of using trtexec, and enabling FP16 through the builder config in the code.
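
For anyone hitting the same crash, that workaround might look roughly like the sketch below. It is not the poster's actual code: it assumes TensorRT 8.x with the ONNX parser, an ONNX model with static input shapes (a dynamic-shape model would also need an optimization profile), and an arbitrary 1 GiB workspace. The names saveBlob, loadBlob, StderrLogger and buildFp16Engine are hypothetical. The TensorRT part is wrapped in a __has_include guard so the file helpers still compile on a machine without the TensorRT headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: write a serialized engine to disk; false on I/O failure.
bool saveBlob(const std::string& path, const void* data, std::size_t size) {
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    out.flush();
    return out.good();
}

// Hypothetical helper: read a whole file into memory, e.g. to pass the
// bytes to IRuntime::deserializeCudaEngine() later.
std::vector<char> loadBlob(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

#if __has_include(<NvInfer.h>) && __has_include(<NvOnnxParser.h>)
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>
#include <memory>

// Minimal logger that forwards warnings and errors to stderr.
class StderrLogger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) {
            std::cerr << msg << '\n';
        }
    }
};

// Build an FP16 engine from an ONNX file inside this (statically linked)
// process, so the engine is produced by the same library that will load it.
bool buildFp16Engine(const std::string& onnxPath, const std::string& enginePath) {
    StderrLogger logger;
    std::unique_ptr<nvinfer1::IBuilder> builder(nvinfer1::createInferBuilder(logger));
    if (!builder) return false;

    const auto flags = 1U << static_cast<std::uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    std::unique_ptr<nvinfer1::INetworkDefinition> network(builder->createNetworkV2(flags));
    if (!network) return false;

    std::unique_ptr<nvonnxparser::IParser> parser(
        nvonnxparser::createParser(*network, logger));
    if (!parser || !parser->parseFromFile(
            onnxPath.c_str(),
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        return false;
    }

    std::unique_ptr<nvinfer1::IBuilderConfig> config(builder->createBuilderConfig());
    if (!config) return false;
    config->setMaxWorkspaceSize(1ULL << 30);        // assumption: 1 GiB is enough
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // allow FP16 kernels

    std::unique_ptr<nvinfer1::IHostMemory> plan(
        builder->buildSerializedNetwork(*network, *config));
    return plan && saveBlob(enginePath, plan->data(), plan->size());
}
#endif
```

The resulting file can then be read back with loadBlob and passed to deserializeCudaEngine in the same statically linked application. The point of the workaround is that the engine is built and loaded by the same build of the library, rather than built by the dynamically linked trtexec and loaded by the static one.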