I’m exploring the TRT Quantization Toolkit and would like to work through a simple example to make sure I understand it.
I’ll have a single Conv2d layer network with pretrained weights.
As I understand it, to calibrate the network I need to swap my original Conv2d layer for a QuantConv2d layer, which has input and weight quantizers. After doing this, I noticed that the network’s named_modules now includes 3 entries: the QuantConv2d itself, an _input_quantizer, and a _weight_quantizer.
When collecting statistics, should I just feed my input through the QuantConv2d, or do something like what is described here?
TensorRT Version: 8
GPU Type: 2080 TI
Nvidia Driver Version: 470.57.02
CUDA Version: 11.3
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.9