I have previously reported this same issue here, as well as on GitHub. Our company does use this “PriorBox” TensorRT plugin in currently shipping products, so we would need this fixed in JetPack-4.4 GA.
Description
Test with TensorRT 7.1.0 DP on Jetson Nano DevKit. Use “trtexec” to save a TensorRT engine from the original Caffe Single-Shot Multibox Detector (SSD_300x300) model. Then use “trtexec” again to load the engine. “trtexec” crashes with segmentation fault. Backtrace analysis in gdb shows the crash is caused by deserialization of the “PriorBox” plugin.
The same test worked when using TensorRT 6 (JetPack-4.3). The segfault problem is only reproduced with TensorRT 7.1.0 DP (JetPack-4.4).
Environment
TensorRT Version : 7.1.0 [Developer Preview]
GPU Type : Jetson Nano
Nvidia Driver Version : JetPack-4.4 DP (L4T R32.4.2)
CUDA Version : 10.2
CUDNN Version : 8.0.0 [Developer Preview]
Operating System + Version : Ubuntu 18.04, Linux kernel 4.9.140
Python Version (if applicable) : 3.6.9
Baremetal or Container (if container which image + tag) : Baremetal
Steps To Reproduce
- Download the COCO SSD300 model from the original (weiliu89) SSD Caffe repository. More specifically, download the models_VGGNet_coco_SSD_300x300.tar.gz file. After decompressing it, you should find these 2 files: “deploy.prototxt” and “VGG_coco_SSD_300x300_iter_400000.caffemodel”.
- In “deploy.prototxt”, replace the unsupported layers with ones that the TensorRT Caffe parser can handle.
  - Replace all “Flatten” layers with “Reshape” layers, using the following parameters.

        reshape_param { shape { dim: 0 dim: -1 dim: 1 dim: 1 } }

  - In the final “DetectionOutput” layer, add one more output and name it “keep_count”.

        layer {
          name: "detection_out"
          type: "DetectionOutput"
          ......
          top: "detection_out"
        + top: "keep_count"
          ......

Here is a copy of the “deploy.prototxt” after the above-mentioned modifications: deploy.prototxt.txt
- Use “trtexec” to generate the TensorRT engine. The engine is generated and profiled (inference) without problems.

    $ cd SSD_300x300
    $ /usr/src/tensorrt/bin/trtexec \
          --deploy=deploy.prototxt \
          --model=VGG_coco_SSD_300x300_iter_400000.caffemodel \
          --output=detection_out \
          --workspace=256 \
          --fp16 \
          --saveEngine=deploy.engine \
          --dumpProfile

- Next, use “trtexec” to load the TensorRT engine (using the “--loadEngine” option as shown below). It crashes with a segmentation fault right after the engine is deserialized.

    $ /usr/src/tensorrt/bin/trtexec \
          --deploy=deploy.prototxt \
          --model=VGG_coco_SSD_300x300_iter_400000.caffemodel \
          --output=detection_out \
          --workspace=256 \
          --fp16 \
          --loadEngine=deploy.engine \
          --dumpProfile

Results:
    [05/19/2020-17:40:53] [V] [TRT] Deserialize required 5471870 microseconds.
    Segmentation fault (core dumped)

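
The save-then-load sequence can also be driven from a small script, which makes it easier to compare JetPack releases. The following is only a sketch. The helper names (`build_command`, `describe_exit`, `run_step`) are hypothetical, the trtexec flags are the ones used in the commands above, and it relies on Python's `subprocess` reporting a negative return code when a child is killed by a signal on POSIX systems.

```python
import signal
import subprocess

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"  # same path as in the commands above

# Flags shared by the save and load runs (copied from the steps above).
COMMON_ARGS = [
    "--deploy=deploy.prototxt",
    "--model=VGG_coco_SSD_300x300_iter_400000.caffemodel",
    "--output=detection_out",
    "--workspace=256",
    "--fp16",
    "--dumpProfile",
]


def build_command(engine_flag: str, engine_path: str = "deploy.engine") -> list:
    """Build the trtexec argument list for a save or load run."""
    if engine_flag not in ("--saveEngine", "--loadEngine"):
        raise ValueError("engine_flag must be --saveEngine or --loadEngine")
    return [TRTEXEC, *COMMON_ARGS, f"{engine_flag}={engine_path}"]


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a short description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return f"exit code {returncode}"


def run_step(engine_flag: str, cwd: str = "SSD_300x300") -> str:
    """Run trtexec once in the model directory and describe how it exited."""
    proc = subprocess.run(build_command(engine_flag), cwd=cwd)
    return describe_exit(proc.returncode)
```

Calling `run_step("--saveEngine")` followed by `run_step("--loadEngine")` should report "ok" and then "killed by SIGSEGV" on a setup that shows the problem described here.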
- Use gdb to analyze the core dump. The backtrace shows that the crash occurs in a constructor of the “PriorBox” plugin, called from PriorBox::clone() while the execution context is being created.

    $ gdb trtexec core
    GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
    Copyright (C) 2018 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "aarch64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from trtexec...done.
    [New LWP 17555]
    [New LWP 17566]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
    Core was generated by `./trtexec --deploy=SSD_300x300/deploy.prototxt --model=SSD_300x300/VGG_coco_SSD'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x0000007fa29fd690 in nvinfer1::plugin::PriorBox::PriorBox(nvinfer1::plugin::PriorBoxParameters, int, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    [Current thread is 1 (Thread 0x7fb069a910 (LWP 17555))]
    (gdb) bt
    #0  0x0000007fa29fd690 in nvinfer1::plugin::PriorBox::PriorBox(nvinfer1::plugin::PriorBoxParameters, int, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    #1  0x0000007fa29fdc10 in nvinfer1::plugin::PriorBox::clone() const () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    #2  0x0000007fa37e6138 in nvinfer1::rt::SafeExecutionContext::SafeExecutionContext(nvinfer1::rt::SafeEngine const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #3  0x0000007fa3574fac in nvinfer1::rt::ExecutionContext::ExecutionContext(nvinfer1::rt::Engine const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #4  0x0000007fa35758d8 in nvinfer1::rt::Engine::createExecutionContext() () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #5  0x00000055635e28d4 in sample::setUpInference (iEnv=..., inference=...) at ../common/sampleInference.cpp:44
    #6  0x00000055635dbff8 in main ()
    (gdb)
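
Since frames #1 to #4 point at createExecutionContext() rather than at the deserialize call itself, it may help to separate the two stages outside of “trtexec”. Below is a rough sketch using the TensorRT Python API. I have not verified it on the affected build, the function name and the “deploy.engine” default are just placeholders, and it assumes the `tensorrt` Python package that ships with JetPack is importable.

```python
def load_engine_and_create_context(engine_path="deploy.engine"):
    """Deserialize a saved engine, then create an execution context.

    Progress is printed before each stage so that, if the process dies with
    a segfault, the last printed line shows which stage was running.
    """
    # Read the serialized engine first so a wrong path fails early.
    with open(engine_path, "rb") as f:
        engine_bytes = f.read()

    # Imported lazily: the package is normally only present on the device.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.VERBOSE)
    # Register the plugins shipped in libnvinfer_plugin before deserializing.
    trt.init_libnvinfer_plugins(logger, "")

    runtime = trt.Runtime(logger)
    print("stage 1: deserializing engine", flush=True)
    engine = runtime.deserialize_cuda_engine(engine_bytes)
    if engine is None:
        raise RuntimeError("deserialize_cuda_engine returned None")

    print("stage 2: creating execution context", flush=True)
    context = engine.create_execution_context()

    print("stage 3: execution context created", flush=True)
    return engine, context
```

If the last line printed before the crash is "stage 2", that would match the backtrace above, where the PriorBox constructor is reached through clone() during context creation.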