Comparison of the trtexec tool with plugins option and putting plugins in lib

Chieh · April 19, 2021, 3:31am

Description

I have a custom plugin that I want to add to the plugin library, so I built the plugins from source to get the .so file (libnvinfer_plugin.so). I followed the TensorRT-OSS build steps from the TensorRT GitHub repository, so the build produced many files, including libnvonnxparser.so, libnvcaffeparser.so, libnvinfer_plugin.so, and the trtexec tool.

I ran the following two experiments, each in its own Docker container:

First experiment: I put the built library files into /usr/lib/x86_64-linux-gnu/ so that trtexec could find them.
This setup successfully converted the ONNX model to a TRT engine and ran inference without any error.
Here is part of the output:

 [04/19/2021-03:02:08] [I] GPU Compute
 [04/19/2021-03:02:08] [I] min: 32.7649 ms
 [04/19/2021-03:02:08] [I] max: 33.9702 ms
 [04/19/2021-03:02:08] [I] mean: 33.0306 ms
 [04/19/2021-03:02:08] [I] median: 32.9963 ms
 [04/19/2021-03:02:08] [I] percentile: 33.9702 ms at 99%
 [04/19/2021-03:02:08] [I] total compute time: 3.07185 s
 &&&& PASSED TensorRT.trtexec # trtexec --onnx=model.onnx --saveEngine=test.trt --explicitBatch --fp16

Second experiment: I did not put the libraries into /usr/lib/x86_64-linux-gnu/ and instead passed the plugin library through the --plugins option. The conversion succeeded (the engine was saved to the target folder), but trtexec then aborted during the inference step with the error malloc_consolidate(): invalid chunk size, Aborted (core dumped).

 [04/19/2021-02:29:13] [I] GPU Compute
 [04/19/2021-02:29:13] [I] min: 32.6328 ms
 [04/19/2021-02:29:13] [I] max: 33.0833 ms
 [04/19/2021-02:29:13] [I] mean: 32.795 ms
 [04/19/2021-02:29:13] [I] median: 32.7783 ms
 [04/19/2021-02:29:13] [I] percentile: 33.0833 ms at 99%
 [04/19/2021-02:29:13] [I] total compute time: 3.08273 s
 &&&& PASSED TensorRT.trtexec # trtexec --onnx=model.onnx --saveEngine=test.trt --explicitBatch --fp16 --plugins=libnvinfer_plugin.so
 malloc_consolidate(): invalid chunk size
 Aborted (core dumped)

Copying these libraries into /usr/lib/x86_64-linux-gnu/ every time is quite inconvenient.
The first experiment shows that the plugin itself works correctly.
How can I use trtexec with the --plugins option to convert the model?

Environment

TensorRT Version: 7.2
GPU Type: GeForce RTX 3060
Nvidia Driver Version: 460.56
CUDA Version: 11.2
Baremetal or Container (if container which image + tag): Image was using NVIDIA Release 21.03 (build 20572684)
docker pull nvcr.io/nvidia/tensorrt:21.03-py3

If you need any further information, please let me know.
Thank you!!

Best regards,
Chieh

NVES · April 19, 2021, 3:37am

Hi,
Please refer to the link below for a custom plugin implementation and sample:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleOnnxMnistCoordConvAC

Thanks!

Chieh · April 19, 2021, 8:02am

Dear NVES,

Thanks for your information and sharing!!

I suspected that my problem was caused by the library path, i.e., trtexec was not loading the right library correctly and completely, because, as I mentioned at the beginning, the plugin itself worked well.
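One way to check this suspicion is to ask the dynamic loader which copy of a library it would pick for a binary. The snippet below is only a sketch: resolved_lib is a hypothetical helper name, and it assumes a glibc-style ldd and that trtexec is dynamically linked against libnvinfer_plugin.so.

```shell
# Print the path the dynamic loader resolves for a library needed by a binary.
# $1: path to the binary, $2: library name prefix (e.g. libnvinfer_plugin.so)
# ldd honors the current LD_LIBRARY_PATH, so the result reflects the active environment.
resolved_lib() {
    ldd "$1" | awk -v name="$2" 'index($1, name) == 1 && $2 == "=>" { print $3 }'
}

# Example (assumes trtexec is on PATH):
# resolved_lib "$(command -v trtexec)" libnvinfer_plugin.so
```

If the printed path points at the system copy rather than the freshly built one, the custom plugin library is not the one being used.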

Here are my brief conclusions after experimenting with the libraries in different locations:

The critical files in my case are libnvinfer_plugin.so and libnvcaffeparser.so, so if you want to use a custom plugin, you have to make sure trtexec loads the right libnvinfer_plugin.so file.
These files do not have to be in /usr/lib/x86_64-linux-gnu/ as long as you set the proper path before running. The advantage of putting them there is that the tool searches that folder under the default environment settings.
If I only pass libnvinfer_plugin.so through the --plugins option of trtexec, it hits the problem I mentioned (i.e., the abort).
The loader uses the first matching library it finds, so appending your library path to the end of LD_LIBRARY_PATH still raises the same error. Hence, you have to add your library path at the beginning of LD_LIBRARY_PATH.
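The last point can be sketched as a small shell helper. This is a minimal illustration, not a tested recipe for every setup: prepend_lib_path is a hypothetical function name, and /path/to/TensorRT/build/out is a placeholder for wherever the built libraries actually live.

```shell
# Put a directory at the FRONT of LD_LIBRARY_PATH so the dynamic loader
# finds the libraries there before any system copies.
# $1: directory containing the built libraries (placeholder path in the example below)
prepend_lib_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        ":$1:"*) ;;  # already the first entry, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example (the directory is an assumption, adjust it to your build output):
# prepend_lib_path /path/to/TensorRT/build/out
# trtexec --onnx=model.onnx --saveEngine=test.trt --explicitBatch --fp16
```

Appending the directory instead (LD_LIBRARY_PATH="$LD_LIBRARY_PATH:dir") would leave any earlier entries ahead of it in the search order, which matches the failure described above.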

It doesn’t matter where you put these libraries. The point is “How” to set the library path.

I will try more and share here if I get further results!

Thanks!