Is there no way to get quantized weights after calibration?

Hello, I am studying TensorRT.
I want to make custom layers (using IPluginV2IOExt and a CUDA kernel) and do INT8 quantization in the model.
So, I made a simple convolution layer, equivalent to the built-in one, using a CUDA kernel.
However, when I added an IInt8EntropyCalibrator2 and ran INT8 quantization, I realized that there is no way to give the custom layer quantized weights, only input and output data.

To be brief:

  1. Make custom layers with a CUDA kernel and IPluginV2IOExt.
  2. Do INT8 quantization with IInt8EntropyCalibrator2.
    But I think there is no way to pass weight data to IPluginV2IOExt, so I cannot quantize my custom layers.
    Is that impossible?

Please help.

Thank you.

Hi @muger1031,
You need to manage the weights yourself if you are using a plugin to implement your custom layer.
Using TRT's IInt8EntropyCalibrator2, you can get the input/output scale factors; you may have to compute the INT8 weights using the scale factors yourself.
You can refer to the below example for the same.

Thanks!
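
A minimal sketch of that last step, assuming the standard symmetric convention q = clamp(round(w / scale), -127, 127); the helper below is illustrative, not TensorRT API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative helper (not TensorRT API): symmetric per-tensor INT8
// quantization of a weight buffer. For weights, the scale is commonly
// derived from the weights themselves: scale = max(|w|) / 127.
std::vector<int8_t> quantizeWeights(const std::vector<float>& w, float scale)
{
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i)
    {
        // Round to nearest, then clamp to the symmetric range [-127, 127].
        float v = std::round(w[i] / scale);
        q[i] = static_cast<int8_t>(std::max(-127.0f, std::min(127.0f, v)));
    }
    return q;
}
```

The quantized buffer (together with its scale) can then be handed to the plugin, e.g. through its constructor, since TensorRT does not quantize a plugin's weights for you.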


Repo for custom_tensorrt

This is my repo for TensorRT study.

When I run PoolingPlugin+INT8Calibrator.cpp, I see the following logs and CalibrationTable:

[11/27/2020-14:34:46] [I] [TRT] Reading Calibration Cache for calibrator: EntropyCalibration2
[11/27/2020-14:34:46] [I] [TRT] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[11/27/2020-14:34:46] [I] [TRT] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[11/27/2020-14:34:46] [V] [TRT] INT8 Inference Tensor Scales: data range [-254.066,254.066]
[11/27/2020-14:34:46] [V] [TRT] INT8 Inference Tensor Scales: scale1 range [-0.996338,0.996338]
[11/27/2020-14:34:46] [V] [TRT] INT8 Inference Tensor Scales: conv1 range [-3.94256,3.94256]
[11/27/2020-14:34:46] [V] [TRT] INT8 Inference Tensor Scales: pool_pluginV2 range [-3.72873,3.72873]
[11/27/2020-14:34:46] [V] [TRT] INT8 Inference Tensor Scales: dense1 range [-55.9656,55.9656]
[11/27/2020-14:34:46] [V] [TRT] INT8 Inference Tensor Scales: relu_dense1 range [-48.4435,48.4435]
[11/27/2020-14:34:46] [V] [TRT] INT8 Inference Tensor Scales: dense2 range [-55.1735,55.1735]
[11/27/2020-14:34:46] [V] [TRT] INT8 Inference Tensor Scales: prob range [-1.00024,1.00024]

TRT-7000-EntropyCalibration2
data: 40000889
scale1: 3c008912
conv1: 3cfe4f7f
pool_pluginV2: 3cf08493
dense1: 3ee1a00b
relu_dense1: 3ec34cc8
dense2: 3ede6e7e
prob: 3c010a14
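
For reference on the format: each value in the table is the hexadecimal IEEE-754 bit pattern of the fp32 per-tensor scale, where scale = range_max / 127. For example, conv1's 3cfe4f7f decodes to about 0.0310438, and 0.0310438 × 127 ≈ 3.94256, which matches the conv1 range in the log above. A minimal sketch of the decoding, assuming this format:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Decode one CalibrationTable entry: the hex string is the raw IEEE-754
// bit pattern of the fp32 per-tensor scale (scale = range_max / 127).
float decodeCalibScale(const char* hex)
{
    uint32_t bits = static_cast<uint32_t>(std::strtoul(hex, nullptr, 16));
    float scale;
    std::memcpy(&scale, &bits, sizeof(scale));  // reinterpret the bits as float
    return scale;
}

int main()
{
    float s = decodeCalibScale("3cfe4f7f");  // conv1 entry from the table above
    std::printf("conv1: scale = %g, range_max = %g\n", s, 127.0f * s);
    // Prints approximately: conv1: scale = 0.0310438, range_max = 3.94256
    return 0;
}
```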

But I don’t know how to convert my fp32 weights to quantized INT8 values.
Could you explain the CalibrationTable in more detail: what the values mean, how to obtain the scale factors from them, and how to compute the INT8 weights using those scale factors?

Thanks for replying!