TensorRT 7.1.0 DP segfault when deserializing the "PriorBox" plugin

I have previously reported this same issue here, as well as on GitHub. Our company does use this “PriorBox” TensorRT plugin in currently shipping products, so we would need this fixed in JetPack-4.4 GA.

Description

Test with TensorRT 7.1.0 DP on Jetson Nano DevKit. Use “trtexec” to save a TensorRT engine from the original Caffe Single-Shot Multibox Detector (SSD_300x300) model. Then use “trtexec” again to load the engine. “trtexec” crashes with segmentation fault. Backtrace analysis in gdb shows the crash is caused by deserialization of the “PriorBox” plugin.

The same test worked when using TensorRT 6 (JetPack-4.3). The segfault problem is only reproduced with TensorRT 7.1.0 DP (JetPack-4.4).

Environment

TensorRT Version : 7.1.0 [Developer Preview]
GPU Type : Jetson Nano
Nvidia Driver Version : JetPack-4.4 DP (L4T R32.4.2)
CUDA Version : 10.2
CUDNN Version : 8.0.0 [Developer Preview]
Operating System + Version : Ubuntu 18.04, Linux kernel 4.9.140
Python Version (if applicable) : 3.6.9
Baremetal or Container (if container which image + tag) : Baremetal

Steps To Reproduce

  1. Download the COCO SSD300 model from the original (weiliu89) SSD Caffe repository. More specifically, download this models_VGGNet_coco_SSD_300x300.tar.gz file. After decompressing it, you should be able to find these 2 files: “deploy.prototxt” and “VGG_coco_SSD_300x300_iter_400000.caffemodel”.

  2. In “deploy.prototxt”, replace the unsupported layers with ones that the TensorRT Caffe parser can handle.

    • Replace all “Flatten” layers by “Reshape” layers with the following parameters.
      reshape_param {
        shape {
          dim: 0
          dim: -1
          dim: 1
          dim: 1
        }
      }
    
    • In the final “DetectionOutput” layer, add one more output and name it “keep_count”.
    layer {
      name: "detection_out"
      type: "DetectionOutput"
      ......
      top: "detection_out"
    + top: "keep_count"
      ......
    

    Here is a copy of the “deploy.prototxt” after the above-mentioned modifications: deploy.prototxt.txt

  3. Use “trtexec” to generate the TensorRT engine. The engine is built and profiled (inference runs) without any problem.

    $ cd SSD_300x300
    $ /usr/src/tensorrt/bin/trtexec \
        --deploy=deploy.prototxt \
        --model=VGG_coco_SSD_300x300_iter_400000.caffemodel \
        --output=detection_out \
        --workspace=256 \
        --fp16 \
        --saveEngine=deploy.engine \
        --dumpProfile
    
  4. Next, use “trtexec” to load the TensorRT engine (using the “--loadEngine” option as shown below). It crashes right after deserializing the engine.

    $ /usr/src/tensorrt/bin/trtexec \
        --deploy=deploy.prototxt \
        --model=VGG_coco_SSD_300x300_iter_400000.caffemodel \
        --output=detection_out \
        --workspace=256 \
        --fp16 \
        --loadEngine=deploy.engine \
        --dumpProfile
    

    Results:

    [05/19/2020-17:40:53] [V] [TRT] Deserialize required 5471870 microseconds.
    Segmentation fault (core dumped)
    
  5. Use gdb to analyze the core dump. The backtrace shows that the crash happens in a constructor of the “PriorBox” plugin, called from clone() while the execution context is being created.

    $ gdb trtexec core
    GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
    Copyright (C) 2018 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "aarch64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from trtexec...done.
    [New LWP 17555]
    [New LWP 17566]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
    Core was generated by `./trtexec --deploy=SSD_300x300/deploy.prototxt --model=SSD_300x300/VGG_coco_SSD'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x0000007fa29fd690 in nvinfer1::plugin::PriorBox::PriorBox(nvinfer1::plugin::PriorBoxParameters, int, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    [Current thread is 1 (Thread 0x7fb069a910 (LWP 17555))]
    (gdb) bt
    #0  0x0000007fa29fd690 in nvinfer1::plugin::PriorBox::PriorBox(nvinfer1::plugin::PriorBoxParameters, int, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    #1  0x0000007fa29fdc10 in nvinfer1::plugin::PriorBox::clone() const () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    #2  0x0000007fa37e6138 in nvinfer1::rt::SafeExecutionContext::SafeExecutionContext(nvinfer1::rt::SafeEngine const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #3  0x0000007fa3574fac in nvinfer1::rt::ExecutionContext::ExecutionContext(nvinfer1::rt::Engine const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #4  0x0000007fa35758d8 in nvinfer1::rt::Engine::createExecutionContext() () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #5  0x00000055635e28d4 in sample::setUpInference (iEnv=..., inference=...) at ../common/sampleInference.cpp:44
    #6  0x00000055635dbff8 in main ()
    (gdb)
    

Hi,

Thanks for your question.
We are trying to reproduce this issue. Will update later.

Thanks.

Hi,

We can reproduce this issue internally.
We found that it is actually caused by the deserializer in normalizePlugin.cpp.

We are checking this issue with our internal team.
Will share more information with you later.

Thanks.

@AastaLLL This is a very encouraging update for us. Thanks.

Hi,

We are still checking this issue.
Will keep you updated.

Thanks.

Noted and thanks.

@AastaLLL

The piece of code you mentioned is from

Normalize::Normalize(const void* data, size_t length)

But the customer’s core dump is actually from

PriorBox::PriorBox(PriorBoxParameters param, int H, int W)

@jkjung13

See the attached priorBoxPlugin.cpp (17.8 KB) for the solution.

Replace the old priorBoxPlugin.cpp with the new one.
You also have to add an extra member named bool mOwnsParamMemory; in the header file.
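
The attached file is not reproduced in this thread, so the following is not the actual fix. It is only a generic sketch of why an ownership flag such as mOwnsParamMemory can matter when a parameter struct carries raw pointers and the plugin gets cloned. BoxParams and BoxPlugin are hypothetical stand-ins, not TensorRT types.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a parameter struct holding a raw array pointer.
struct BoxParams {
    float* minSize;
    int numMinSize;
};

class BoxPlugin {
public:
    // deepCopy == true : allocate and own a private copy of the array.
    // deepCopy == false: borrow the caller's pointer (caller must outlive us).
    BoxPlugin(const BoxParams& p, bool deepCopy)
        : mParam(p), mOwnsParamMemory(deepCopy)
    {
        if (deepCopy) {
            mParam.minSize = new float[p.numMinSize];
            std::copy(p.minSize, p.minSize + p.numMinSize, mParam.minSize);
        }
    }

    // Copying would duplicate the raw pointer, so forbid it outright.
    BoxPlugin(const BoxPlugin&) = delete;
    BoxPlugin& operator=(const BoxPlugin&) = delete;

    // Only free the array if this instance allocated it.
    ~BoxPlugin() { if (mOwnsParamMemory) delete[] mParam.minSize; }

    // A clone always takes its own copy, so it stays valid on its own.
    BoxPlugin* clone() const { return new BoxPlugin(mParam, true); }

    float minSize(int i) const { return mParam.minSize[i]; }
    bool ownsParamMemory() const { return mOwnsParamMemory; }

private:
    BoxParams mParam;
    bool mOwnsParamMemory;
};
```

With this pattern, a clone keeps its values even if the array behind the original parameters is later changed or freed, and the destructor never frees memory it did not allocate.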

@AastaLLL

I now agree with you that this core dump may come from a separate bug in the Normalize plugin, in addition to the PriorBox bug.

So, there are 2 bugs that need to be fixed.

The normalize core dump is directly caused by ASSERT(nbWeights == 1) when Normalize tries to clone() itself:

Normalize::Normalize(
    const Weights* weights, int nbWeights, bool acrossSpatial, bool channelShared, float eps, int C, int H, int W)
    : acrossSpatial(acrossSpatial)
    , channelShared(channelShared)
    , eps(eps)
    , C(C)
    , H(H)
    , W(W)
{
    mNbWeights = nbWeights;
    ASSERT(nbWeights == 1);
    ASSERT(weights[0].count >= 1);
    mWeights = copyToDevice(weights[0].values, weights[0].count);
    cublasCreate(&mCublas);
}

This assertion failure is probably caused by another constructor:

Normalize::Normalize(const void* buffer, size_t length)
{
    const char *d = reinterpret_cast<const char*>(buffer), *a = d;
    C = read<int>(d);
    H = read<int>(d);
    W = read<int>(d);
    acrossSpatial = read<bool>(d);
    channelShared = read<bool>(d);
    eps = read<float>(d);

    mNbWeights = read<int>(d);
    mWeights = deserializeToDevice(d, mNbWeights);
    cublasCreate(&mCublas);
    ASSERT(d == a + length);
}

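I think this constructor is implemented incorrectly: it stores the integer it reads from the buffer in mNbWeights, while the same integer is also used as the element count for deserializeToDevice().
The core dump may disappear if you modify the constructor as follows: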

Normalize::Normalize(const void* buffer, size_t length)
{
    const char *d = reinterpret_cast<const char*>(buffer), *a = d;
    C = read<int>(d);
    H = read<int>(d);
    W = read<int>(d);
    acrossSpatial = read<bool>(d);
    channelShared = read<bool>(d);
    eps = read<float>(d);

    mNbWeights = 1;
    // mNbWeights = read<int>(d);
    int count = read<int>(d);
    // mWeights = deserializeToDevice(d, mNbWeights);
    mWeights = deserializeToDevice(d, count);
    cublasCreate(&mCublas);
    ASSERT(d == a + length);
}
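
To make the mismatch easier to see, here is a host-only sketch of the round trip. It assumes that serialize() writes the weight element count right before the weight values, which is what the modified constructor above implies; I have not shown the real serialize() here. NormalizeState, serialize, deserialize and buggyNbWeights are hypothetical names for illustration, and no CUDA or TensorRT calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in for the plugin's serialized fields.
struct NormalizeState {
    int C, H, W;
    bool acrossSpatial, channelShared;
    float eps;
    std::vector<float> weights;
};

template <typename T>
void write(std::vector<char>& buf, const T& v) {
    const char* p = reinterpret_cast<const char*>(&v);
    buf.insert(buf.end(), p, p + sizeof(T));
}

template <typename T>
T read(const char*& d) {
    T v;
    std::memcpy(&v, d, sizeof(T));
    d += sizeof(T);
    return v;
}

// Assumed layout: C, H, W, two flags, eps, element count, then the values.
std::vector<char> serialize(const NormalizeState& s) {
    std::vector<char> buf;
    write(buf, s.C); write(buf, s.H); write(buf, s.W);
    write(buf, s.acrossSpatial); write(buf, s.channelShared);
    write(buf, s.eps);
    write(buf, static_cast<int>(s.weights.size()));
    for (float w : s.weights) write(buf, w);
    return buf;
}

// Reads the fixed-size fields and leaves d pointing at the count.
void readHeader(const char*& d, NormalizeState& s) {
    s.C = read<int>(d); s.H = read<int>(d); s.W = read<int>(d);
    s.acrossSpatial = read<bool>(d);
    s.channelShared = read<bool>(d);
    s.eps = read<float>(d);
}

// What the old constructor would store in mNbWeights: the element count.
int buggyNbWeights(const std::vector<char>& buf) {
    const char* d = buf.data();
    NormalizeState s;
    readHeader(d, s);
    return read<int>(d);
}

struct Decoded {
    NormalizeState state;
    int nbWeights;
    bool consumedAll;
};

// Mirrors the modified constructor: nbWeights is fixed at 1 and the
// integer in the buffer is treated as the element count.
Decoded deserialize(const std::vector<char>& buf) {
    const char* d = buf.data();
    const char* a = d;
    Decoded out{};
    readHeader(d, out.state);
    out.nbWeights = 1;
    int count = read<int>(d);
    out.state.weights.resize(static_cast<std::size_t>(count));
    for (float& w : out.state.weights) w = read<float>(d);
    out.consumedAll = (d == a + buf.size());
    return out;
}
```

Under that assumption, a layer with 512 weight values would leave the old constructor with mNbWeights equal to 512. Deserialization itself still consumes the whole buffer, but the later clone() passes that value to the other constructor and trips ASSERT(nbWeights == 1).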

@AastaLLL & @ersheng, thanks a lot. I have verified that the modified priorBoxPlugin and normalizePlugin code does solve the segfault problem.

I’m wondering if this fix would make it into JetPack-4.4 (TensorRT 7.1.0) GA release?

@jkjung13

JP-4.4 GA, which includes these fixes, is expected around the end of this month.