Is there a way to get quantized weights after calibration?

Hello, I am studying TensorRT.
I want to make custom layers (using IPluginV2IOExt and a CUDA kernel) and do INT8 quantization on the model.
So, I implemented a convolution layer with a CUDA kernel that is equivalent to the built-in one.
However, when I added an IInt8EntropyCalibrator2 and ran INT8 quantization, I realized that there is no way to give the custom layers quantized weights; the calibrator only deals with input and output data.

To be brief:

  1. Make custom layers with a CUDA kernel and IPluginV2IOExt.
  2. Do INT8 quantization with IInt8EntropyCalibrator2.
    However, I think there is no way to pass the weight data to IPluginV2IOExt, so I cannot quantize my custom layers.
    Is it impossible?

Please help.

Thank you.

Hi @muger1031,
You have to manage the weights yourself if you are using a plugin to implement your custom layer.
Using TRT's IInt8EntropyCalibrator2 you can get the input/output scale factors, but you will have to compute the INT8 weights from a scale factor yourself.
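For illustration, here is a minimal sketch of symmetric per-tensor weight quantization; the helper names and the max(|w|)/127 scale choice are assumptions of mine, not TRT API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Choose a weight scale yourself, e.g. symmetric max-abs: scale = max(|w|) / 127.
float computeWeightScale(const std::vector<float>& weights)
{
    float maxAbs = 0.0f;
    for (float w : weights)
        maxAbs = std::max(maxAbs, std::fabs(w));
    return maxAbs > 0.0f ? maxAbs / 127.0f : 1.0f;
}

// Quantize each weight: q = clamp(round(w / scale), -127, 127).
std::vector<int8_t> quantizeWeights(const std::vector<float>& weights, float scale)
{
    std::vector<int8_t> q(weights.size());
    for (size_t i = 0; i < weights.size(); ++i)
    {
        long r = std::lround(weights[i] / scale);
        q[i] = static_cast<int8_t>(std::max(-127L, std::min(127L, r)));
    }
    return q;
}
```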
For a complete plugin that manages its own weights, you can refer to the sample below.
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/samplePlugin/fcPlugin.h
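Note that with IPluginV2IOExt, the calibrated input/output scales are handed to your plugin through the scale field of PluginTensorDesc in configurePlugin. A sketch of reading them (written as a free function for brevity; in your plugin this logic would live in the configurePlugin override):

```cpp
#include <NvInfer.h>

using namespace nvinfer1;

// configurePlugin(const PluginTensorDesc* in, int nbInput,
//                 const PluginTensorDesc* out, int nbOutput)
// passes one PluginTensorDesc per tensor; for INT8 tensors its `scale`
// field holds the scale determined by calibration.
void captureScales(const PluginTensorDesc* in, int nbInput,
                   const PluginTensorDesc* out, int nbOutput,
                   float& inputScale, float& outputScale)
{
    if (nbInput > 0 && in[0].type == DataType::kINT8)
        inputScale = in[0].scale;
    if (nbOutput > 0 && out[0].type == DataType::kINT8)
        outputScale = out[0].scale;
}
```

In enqueue, the INT32 accumulator of the INT8 convolution is then rescaled by roughly inputScale * weightScale / outputScale before being written back as INT8.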

Thanks!
