When I link against the static library (libnvinfer_static.a), a segmentation fault occurs while loading the FP16 engine. Loading the FP32 or INT8 engine works fine and produces correct inference results.
When I link against the dynamic library (libnvinfer.so), FP32/FP16/INT8 engines all load and run inference correctly.
Here’s the stack when it crashes:
#0 0x00000000012da349 in nvinfer1::rt::task::CaskConvolutionRunner::getShader(nvinfer1::rt::CommonContext const&) const ()
#1 0x00000000012dad16 in nvinfer1::rt::task::CaskConvolutionRunner::isCaskGroupConvShader(nvinfer1::rt::CommonContext const&) const ()
#2 0x00000000012de6ad in nvinfer1::rt::task::CaskConvolutionRunner::allocateResources(nvinfer1::rt::CommonContext const&) ()
#3 0x0000000000cd413d in nvinfer1::rt::Engine::initialize() ()
#4 0x0000000000cd639a in nvinfer1::rt::Engine::deserialize(void const*, unsigned long, nvinfer1::IGpuAllocator&) ()
#5 0x0000000000cc8dac in nvinfer1::Runtime::deserializeCudaEngine(void const*, unsigned long, nvinfer1::IPluginFactory*) ()
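For context, this is the engine-loading path the stack trace points into. A minimal sketch (the engine filename and logger are illustrative, not from the original post; requires TensorRT 8.x headers and a GPU to actually run):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <vector>

// Minimal logger required by createInferRuntime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main() {
    // Read the serialized engine from disk (filename is hypothetical).
    std::ifstream file("model_fp16.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);

    // The reported crash happens inside this call, during
    // Engine::initialize() -> CaskConvolutionRunner::getShader().
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    if (!engine) {
        std::cerr << "engine deserialization failed" << std::endl;
        return 1;
    }
    return 0;
}
```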
Environment
TensorRT Version: 8.2.1.8
GPU Type: Tesla T4
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 8.2.4
Operating System + Version: CentOS Linux release 7.6.1810
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
You can refer to the link below for the full list of supported operators; if an operator is not supported, you need to create a custom plugin to support that operation.
Also, please share your model and script if you have not already, so that we can help you better.
Meanwhile, for some common errors and queries, please refer to the link below:
The error occurs only when my executable is linked against the static library (libnvinfer_static.a); when using the dynamic library (libnvinfer.so), the problem goes away. However, I have to use static libraries. I hope for help.
I solved the problem by converting the model with an executable built against the static library instead of using trtexec, and setting the FP16 flag in the builder config in code.
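The workaround above amounts to building the engine in code with the FP16 flag set, rather than with trtexec. A sketch of the relevant build configuration (assumes a logger and an already-populated network; not the poster's actual code):

```cpp
#include <NvInfer.h>

// Build a serialized FP16 engine in code; `logger` is assumed to be an
// nvinfer1::ILogger implementation defined elsewhere.
nvinfer1::IHostMemory* buildFp16Engine(nvinfer1::ILogger& logger) {
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    // ... parse/define the model into `network` here ...

    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // enable FP16 kernels

    // Available in TensorRT 8.x: builds and serializes in one step.
    return builder->buildSerializedNetwork(*network, *config);
}
```

Because the engine is now built by the same statically linked binary that later deserializes it, the kernel/shader tables it was built against match the ones available at load time.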