TensorRT 7.1.0 DP segfault when deserializing the "PriorBox" plugin

jkjung13 · May 21, 2020, 2:55am

I previously reported this same issue here, as well as on GitHub. Our company uses this “PriorBox” TensorRT plugin in currently shipping products, so we need this fixed in JetPack-4.4 GA.

Description

I tested TensorRT 7.1.0 DP on a Jetson Nano DevKit. I used “trtexec” to build and save a TensorRT engine from the original Caffe Single-Shot Multibox Detector (SSD_300x300) model, then used “trtexec” again to load the engine. “trtexec” crashed with a segmentation fault. Backtrace analysis in gdb shows the crash is caused by deserialization of the “PriorBox” plugin.

The same test worked when using TensorRT 6 (JetPack-4.3). The segfault problem is only reproduced with TensorRT 7.1.0 DP (JetPack-4.4).

Environment

TensorRT Version : 7.1.0 [Developer Preview]
GPU Type : Jetson Nano
Nvidia Driver Version : JetPack-4.4 DP (L4T R32.4.2)
CUDA Version : 10.2
CUDNN Version : 8.0.0 [Developer Preview]
Operating System + Version : Ubuntu 18.04, Linux kernel 4.9.140
Python Version (if applicable) : 3.6.9
Baremetal or Container (if container which image + tag) : Baremetal

Steps To Reproduce

Download the COCO SSD300 model from the original (weiliu89) SSD Caffe repository. More specifically, download this models_VGGNet_coco_SSD_300x300.tar.gz file. After decompressing it, you should be able to find these 2 files: “deploy.prototxt” and “VGG_coco_SSD_300x300_iter_400000.caffemodel”.
In “deploy.prototxt”, replace the unsupported layers with layers that the TensorRT Caffe parser can handle.
- Replace all “Flatten” layers with “Reshape” layers that use the following parameters.
```
  reshape_param {
    shape {
      dim: 0
      dim: -1
      dim: 1
      dim: 1
    }
  }
```
- In the final “DetectionOutput” layer, add one more output and name it “keep_count”.
```
layer {
  name: "detection_out"
  type: "DetectionOutput"
  ......
  top: "detection_out"
+ top: "keep_count"
  ......
```
Here is a copy of the “deploy.prototxt” after the above-mentioned modifications: deploy.prototxt.txt
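
To see why this “Reshape” can stand in for “Flatten”, recall Caffe's Reshape conventions: "dim: 0" copies the corresponding input dimension, and "dim: -1" is inferred from the remaining element count. The sketch below resolves such a spec on the host. The function name resolveReshape and the example input shape are made up for illustration, and the code models only Caffe's documented semantics, not how the TensorRT Caffe parser implements the layer.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Resolve a Caffe-style reshape spec against an input shape.
//   0  -> copy the input dimension at the same index
//  -1  -> infer this dimension from the remaining element count (at most one)
// resolveReshape is a hypothetical helper, not part of Caffe or TensorRT.
std::vector<std::int64_t> resolveReshape(
    const std::vector<std::int64_t>& input, const std::vector<std::int64_t>& spec)
{
    std::int64_t total = 1;
    for (std::int64_t d : input)
    {
        total *= d;
    }

    std::vector<std::int64_t> out(spec.size(), 1);
    std::int64_t known = 1;       // product of all dimensions except the inferred one
    std::ptrdiff_t inferred = -1; // index of the -1 entry, if any
    for (std::size_t i = 0; i < spec.size(); ++i)
    {
        if (spec[i] == -1)
        {
            if (inferred != -1)
            {
                throw std::invalid_argument("only one dim may be -1");
            }
            inferred = static_cast<std::ptrdiff_t>(i);
            continue;
        }
        if (spec[i] == 0)
        {
            if (i >= input.size())
            {
                throw std::invalid_argument("dim: 0 needs a matching input axis");
            }
            out[i] = input[i];
        }
        else
        {
            out[i] = spec[i];
        }
        known *= out[i];
    }

    if (inferred != -1)
    {
        if (known == 0 || total % known != 0)
        {
            throw std::invalid_argument("cannot infer the -1 dim");
        }
        out[static_cast<std::size_t>(inferred)] = total / known;
    }
    else if (known != total)
    {
        throw std::invalid_argument("element count mismatch");
    }
    return out;
}
```

For an illustrative input of shape {1, 16, 38, 38}, the spec {0, -1, 1, 1} resolves to {1, 23104, 1, 1}: the same element count a “Flatten” would produce, followed by two unit dimensions.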

Use “trtexec” to generate the TensorRT engine. The engine is built and profiled (inference runs) without problems.

```
$ cd SSD_300x300
$ /usr/src/tensorrt/bin/trtexec \
    --deploy=deploy.prototxt \
    --model=VGG_coco_SSD_300x300_iter_400000.caffemodel \
    --output=detection_out \
    --workspace=256 \
    --fp16 \
    --saveEngine=deploy.engine \
    --dumpProfile
```

Next, use “trtexec” to load the TensorRT engine (using the “--loadEngine” option as shown below). It crashes when trying to deserialize the engine.

```
$ /usr/src/tensorrt/bin/trtexec \
    --deploy=deploy.prototxt \
    --model=VGG_coco_SSD_300x300_iter_400000.caffemodel \
    --output=detection_out \
    --workspace=256 \
    --fp16 \
    --loadEngine=deploy.engine \
    --dumpProfile
```

Results:

```
[05/19/2020-17:40:53] [V] [TRT] Deserialize required 5471870 microseconds.
Segmentation fault (core dumped)
```

Use gdb to analyze the core dump. The crash occurs in a constructor of the “PriorBox” plugin.

```
$ gdb trtexec core
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from trtexec...done.
[New LWP 17555]
[New LWP 17566]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `./trtexec --deploy=SSD_300x300/deploy.prototxt --model=SSD_300x300/VGG_coco_SSD'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000007fa29fd690 in nvinfer1::plugin::PriorBox::PriorBox(nvinfer1::plugin::PriorBoxParameters, int, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
[Current thread is 1 (Thread 0x7fb069a910 (LWP 17555))]
(gdb) bt
#0  0x0000007fa29fd690 in nvinfer1::plugin::PriorBox::PriorBox(nvinfer1::plugin::PriorBoxParameters, int, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
#1  0x0000007fa29fdc10 in nvinfer1::plugin::PriorBox::clone() const () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
#2  0x0000007fa37e6138 in nvinfer1::rt::SafeExecutionContext::SafeExecutionContext(nvinfer1::rt::SafeEngine const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#3  0x0000007fa3574fac in nvinfer1::rt::ExecutionContext::ExecutionContext(nvinfer1::rt::Engine const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#4  0x0000007fa35758d8 in nvinfer1::rt::Engine::createExecutionContext() () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#5  0x00000055635e28d4 in sample::setUpInference (iEnv=..., inference=...) at ../common/sampleInference.cpp:44
#6  0x00000055635dbff8 in main ()
(gdb)
```

AastaLLL · May 21, 2020, 6:24am

Hi,

Thanks for your question.
We are trying to reproduce this issue. Will update later.

Thanks.

AastaLLL · May 22, 2020, 10:53am

Hi,

We can reproduce this issue internally.
We found that it is actually caused by the deserializer in normalizePlugin.cpp:

https://github.com/NVIDIA/TensorRT/blob/release/7.0/plugin/normalizePlugin/normalizePlugin.cpp#L76

```
    const char *d = reinterpret_cast<const char*>(buffer), *a = d;
    C = read<int>(d);
    H = read<int>(d);
    W = read<int>(d);
    acrossSpatial = read<bool>(d);
    channelShared = read<bool>(d);
    eps = read<float>(d);

    mNbWeights = read<int>(d);
    mWeights = deserializeToDevice(d, mNbWeights);
    cublasCreate(&mCublas);
    ASSERT(d == a + length);
}

int Normalize::getNbOutputs() const
{
    // Plugin layer has 1 output
    return 1;
}

Dims Normalize::getOutputDimensions(int index, const Dims* inputs, int nbInputDims)
```

We are checking this issue with our internal team.
Will share more information with you later.

Thanks.

jkjung13 · May 22, 2020, 11:12am

@AastaLLL This is a very encouraging update for us. Thanks.

AastaLLL · June 3, 2020, 5:03am

Hi,

We are still checking this issue.
Will keep you updated.

Thanks.

jkjung13 · June 3, 2020, 12:27pm

Noted and thanks.

ersheng · June 4, 2020, 10:47am

@AastaLLL

The piece of code you mentioned is from

Normalize::Normalize(const void* data, size_t length)

But the customer’s core dump is actually from

PriorBox::PriorBox(PriorBoxParameters param, int H, int W)

@jkjung13

See the attached priorBoxPlugin.cpp (17.8 KB) for the solution.

Replace the old priorBoxPlugin.cpp with the new one.
You also have to add an extra member named bool mOwnsParamMemory; in the header file.

ersheng · June 5, 2020, 10:10am

@AastaLLL

I now agree with you that this core dump may come from a separate bug in the Normalize plugin, in addition to the PriorBox bug.

So, there are 2 bugs that need to be fixed.

The normalize core dump is directly caused by ASSERT(nbWeights == 1) when Normalize tries to clone() itself:

```
Normalize::Normalize(
    const Weights* weights, int nbWeights, bool acrossSpatial, bool channelShared, float eps, int C, int H, int W)
    : acrossSpatial(acrossSpatial)
    , channelShared(channelShared)
    , eps(eps)
    , C(C)
    , H(H)
    , W(W)
{
    mNbWeights = nbWeights;
    ASSERT(nbWeights == 1);
    ASSERT(weights[0].count >= 1);
    mWeights = copyToDevice(weights[0].values, weights[0].count);
    cublasCreate(&mCublas);
}
```

This assertion failure is probably caused by another constructor:

```
Normalize::Normalize(const void* buffer, size_t length)
{
    const char *d = reinterpret_cast<const char*>(buffer), *a = d;
    C = read<int>(d);
    H = read<int>(d);
    W = read<int>(d);
    acrossSpatial = read<bool>(d);
    channelShared = read<bool>(d);
    eps = read<float>(d);

    mNbWeights = read<int>(d);
    mWeights = deserializeToDevice(d, mNbWeights);
    cublasCreate(&mCublas);
    ASSERT(d == a + length);
}
```

I think the implementation of this constructor is wrong.
The core dump may disappear if you modify the constructor as follows:

```
Normalize::Normalize(const void* buffer, size_t length)
{
    const char *d = reinterpret_cast<const char*>(buffer), *a = d;
    C = read<int>(d);
    H = read<int>(d);
    W = read<int>(d);
    acrossSpatial = read<bool>(d);
    channelShared = read<bool>(d);
    eps = read<float>(d);

    // Always one Weights struct, matching ASSERT(nbWeights == 1) in the other constructor.
    mNbWeights = 1;
    // mNbWeights = read<int>(d);
    // Treat the stored integer as the element count of that single weight tensor.
    int count = read<int>(d);
    // mWeights = deserializeToDevice(d, mNbWeights);
    mWeights = deserializeToDevice(d, count);
    cublasCreate(&mCublas);
    ASSERT(d == a + length);
}
```
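
The mismatch described above can be reproduced on the host with a small, self-contained sketch. It assumes that the serializer writes the element count of the single weight tensor just before the values. That assumption is inferred from the proposed fix; the serializer itself is not shown in this thread. NormalizeState, serializeState, deserializeState and cloneWouldPass are hypothetical names, and plain host vectors stand in for the device buffers. Under that assumption both variants consume the same bytes and differ only in the value they leave in nbWeights.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Minimal stand-ins for the plugin's buffer helpers: copy a value, advance the cursor.
template <typename T>
void write(char*& cursor, const T& value)
{
    std::memcpy(cursor, &value, sizeof(T));
    cursor += sizeof(T);
}

template <typename T>
T read(const char*& cursor)
{
    T value;
    std::memcpy(&value, cursor, sizeof(T));
    cursor += sizeof(T);
    return value;
}

// Host-side stand-in for the deserialized plugin state (hypothetical type).
struct NormalizeState
{
    int nbWeights;              // number of Weights structs; the clone path asserts it is 1
    std::vector<float> weights; // the scale values
};

// ASSUMPTION: the integer stored before the values is the element count of the
// single weight tensor. Layout: C, H, W, acrossSpatial, channelShared, eps, count, values.
std::vector<char> serializeState(int C, int H, int W, bool acrossSpatial, bool channelShared, float eps,
    const std::vector<float>& weights)
{
    std::vector<char> buffer(
        4 * sizeof(int) + 2 * sizeof(bool) + sizeof(float) + weights.size() * sizeof(float));
    char* d = buffer.data();
    write(d, C);
    write(d, H);
    write(d, W);
    write(d, acrossSpatial);
    write(d, channelShared);
    write(d, eps);
    write(d, static_cast<int>(weights.size()));
    for (float w : weights)
    {
        write(d, w);
    }
    assert(d == buffer.data() + buffer.size());
    return buffer;
}

// treatCountAsNbWeights == true mimics the original constructor (stored int becomes nbWeights);
// false mimics the proposed fix (nbWeights is fixed at 1, stored int is the element count).
NormalizeState deserializeState(const std::vector<char>& buffer, bool treatCountAsNbWeights)
{
    const char* d = buffer.data();
    const char* a = d;
    read<int>(d);   // C
    read<int>(d);   // H
    read<int>(d);   // W
    read<bool>(d);  // acrossSpatial
    read<bool>(d);  // channelShared
    read<float>(d); // eps

    const int stored = read<int>(d);
    NormalizeState state;
    state.nbWeights = treatCountAsNbWeights ? stored : 1;
    state.weights.resize(static_cast<std::size_t>(stored));
    for (float& w : state.weights)
    {
        w = read<float>(d);
    }
    assert(d == a + buffer.size());
    return state;
}

// Mirrors ASSERT(nbWeights == 1) in the constructor that clone() calls.
bool cloneWouldPass(const NormalizeState& state)
{
    return state.nbWeights == 1;
}
```

With a 512-element scale vector (an illustrative size), deserializeState(buffer, true) leaves nbWeights at 512, so cloneWouldPass returns false. deserializeState(buffer, false) leaves it at 1 and the check passes, while both calls recover the same weight values.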

jkjung13 · June 8, 2020, 12:04pm

@AastaLLL & @ersheng, thanks a lot. I have verified that the modified priorBoxPlugin and normalizePlugin code does solve the segfault problem.

Will this fix make it into the JetPack-4.4 (TensorRT 7.1.0) GA release?

ersheng · June 9, 2020, 2:24am

@jkjung13

JP-4.4 GA, with the fixes, is expected around the end of this month.
