Question about converting onnx quantized model to tensorrt


I am trying to convert an already-quantized ONNX model to TensorRT.
When I try to parse the quantized ONNX network, I get the following error:

In node 1 (parseGraph): UNSUPPORTED_NODE: No importer registered for op: QLinearConv.

In the list of TensorRT-supported ONNX operators here, I can see that QLinearConv is not supported.

Is there any guideline on how to convert a quantized ONNX model to TensorRT?
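One way to catch this before handing the model to the TensorRT parser is to inspect the ONNX graph directly and list the op types it uses. A minimal sketch using the `onnx` package (the helper function name `list_op_types` is my own, not from any docs):

    # Pre-flight check: collect the op types used in an ONNX model so
    # unsupported ops (e.g. QLinearConv) can be spotted before conversion.
    import onnx

    def list_op_types(model_path):
        """Return the set of op types appearing in the ONNX graph."""
        model = onnx.load(model_path)
        return {node.op_type for node in model.graph.node}

Then compare the result against TensorRT's supported-operator list, e.g. `"QLinearConv" in list_op_types("model_quant.onnx")` tells you up front that the parse will fail.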


TensorRT Version:
GPU Type: T4
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7605
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.6.0

Steps To Reproduce

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(EXPLICIT_BATCH) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = max_workspace_size
        builder.max_batch_size = 1

        # Parse the serialized ONNX model; print any parser errors.
        if not parser.parse(model):
            for i in range(parser.num_errors):
                print(parser.get_error(i))

where `model` is the quantized ONNX model.

Hi @neda,
You will need to add a custom plugin for the unsupported layer.
Please refer to the example below.


Thank you @AakankshaS!
I am reading through the docs, and it is not clear to me whether it is possible to implement the custom layers entirely in Python, or whether some parts of the custom-layer creation necessarily need to happen in C++.

I am mainly referring to this sentence: "You can use the C++ API to create a custom layer, package the layer using pybind11 in Python, then load the plugin into a Python application", which is in section 4.2, Adding Custom Layers Using The Python API. So the custom layer should be defined in C++, and we need a Python wrapper to use it?
I would appreciate it if you could answer!
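If I understand the docs correctly, the Python side only needs to load the compiled C++ plugin library so its plugin creator registers itself with TensorRT's plugin registry. A rough sketch of that loading step (the function name and the `.so` filename in the usage comment are hypothetical, assuming the plugin was already built in C++):

    import ctypes
    import tensorrt as trt

    def load_plugin_library(path, logger):
        """Load a compiled C++ plugin .so so its plugin creators register
        with TensorRT's plugin registry, then initialize the registry."""
        handle = ctypes.CDLL(path)                 # registers the plugin creator(s)
        trt.init_libnvinfer_plugins(logger, "")    # initialize all registered plugins
        return handle

    # Usage (hypothetical library name), before creating the OnnxParser:
    # load_plugin_library("libqlinearconv_plugin.so", TRT_LOGGER)

So the layer implementation itself lives in C++, but no pybind11 code is strictly required just to make the parser find the plugin; the pybind11 packaging mentioned in the docs is for calling the layer from Python directly.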