QLinearConv implementation in TensorRT and ONNX model conversion


Hello, I am in the process of writing custom QLinearConv and QLinearMatMul plugin layers in TensorRT so that I can export an already-quantized ONNX model to TensorRT.
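For context, the arithmetic such a QLinearConv plugin has to reproduce follows the ONNX QLinearConv operator: subtract the zero points, accumulate the integer products in a wide accumulator, then requantize with the combined scale (x_scale * w_scale / y_scale) and saturate. Below is a minimal 1-D sketch in plain Python to illustrate the semantics only; the function name and the uint8 saturation range are my illustration, not anything from the TensorRT or ONNX APIs:

```python
def qlinear_conv1d(x_q, x_scale, x_zp, w_q, w_scale, w_zp, y_scale, y_zp):
    """Toy 1-D QLinearConv: quantized input x_q, quantized weights w_q,
    per-tensor scales/zero points, uint8 output. No padding, stride 1."""
    k = len(w_q)
    out = []
    for i in range(len(x_q) - k + 1):
        # Accumulate zero-point-shifted products in a wide integer accumulator.
        acc = sum((x_q[i + j] - x_zp) * (w_q[j] - w_zp) for j in range(k))
        # Requantize: rescale the accumulator to the output scale, add the
        # output zero point, and saturate to the uint8 range.
        y = round(acc * (x_scale * w_scale / y_scale)) + y_zp
        out.append(max(0, min(255, y)))
    return out

# Example: x = [2.0, 4.0, 6.0] quantized with scale 1.0, zp 0;
# w = [0.5, 0.5] quantized with scale 0.5, zp 0; output scale 1.0, zp 0.
# The float conv would give [3.0, 5.0], and the quantized path agrees:
print(qlinear_conv1d([2, 4, 6], 1.0, 0, [1, 1], 0.5, 0, 1.0, 0))  # [3, 5]
```

A real plugin would of course do this per output channel (per-channel weight scales are allowed in ONNX) and on 2-D tensors, but the dequantize/accumulate/requantize structure is the same.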

Something I am not clear about: once I finish writing the plugins and registering them with the REGISTER_TENSORRT_PLUGIN API, as per the documentation, can I then convert the quantized ONNX model to TensorRT like any model with supported layers?

Or is it more than that, and do I need to build the network in TensorRT myself with these additional plugins? Basically, if my graph has 20 QLinearConv layers, do I need to instantiate the QLinearConv plugin 20 times, one for each, add each instance at the correct place in the network, and then build the network?
If so, is there a more straightforward way of supporting these layers, such as modifying the onnx-tensorrt parser?


TensorRT Version :
GPU Type : T4
Nvidia Driver Version : 440.33.01
CUDA Version : 10.2
CUDNN Version : 7605 (7.6.5)
Operating System + Version : Ubuntu 18.04.5 LTS
Python Version (if applicable) : 3.6.9
PyTorch Version (if applicable) : 1.6.0

Hi @neda,
Please check the link below for reference.