How to obtain output for a given batch fed to IInt8Calibrator


I am running post-training quantization for a pre-trained recurrent network model. I followed the sampleINT8 example and implemented a calibrator based on the IInt8Calibrator class (IInt8EntropyCalibrator2, to be more specific). Since my network is recurrent, I'd like to obtain the recurrent output for a given batch and feed it to the next batch in getBatch(). It appears that there's no way to do this with the current C++ API. The work-around is to run the network as-is on the calibration dataset and store the recurrent output for each data instance; during calibration, I can then load the stored state for each batch. However, this workflow is cumbersome to maintain. Is there a better way to do this?
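For reference, the work-around's data side can be sketched roughly as follows. This is a minimal, hypothetical helper (the class name and file layout are my own, not part of TensorRT): recurrent outputs produced by a full-precision pass over the calibration set are written to disk, one file per batch, and then replayed in batch order during calibration. In a real calibrator, the buffer returned here would still have to be copied to device memory (e.g. with cudaMemcpy) inside getBatch() before the pointer is handed to TensorRT.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical cache for the work-around described above: recurrent outputs
// from a full-precision pass are stored one file per batch, then loaded back
// in order while calibrating.
class RecurrentStateCache {
public:
    RecurrentStateCache(std::string prefix, std::size_t numBatches,
                        std::size_t stateSize)
        : prefix_(std::move(prefix)), numBatches_(numBatches),
          state_(stateSize) {}

    // Save the recurrent output of batch `index` (full-precision pass).
    void save(std::size_t index, const std::vector<float>& state) const {
        std::ofstream out(fileName(index), std::ios::binary);
        out.write(reinterpret_cast<const char*>(state.data()),
                  static_cast<std::streamsize>(state.size() * sizeof(float)));
    }

    // Load the recurrent state feeding batch `index` during calibration.
    // Returns false once the calibration set is exhausted, mirroring the
    // contract of IInt8Calibrator::getBatch().
    bool next(std::size_t index) {
        if (index >= numBatches_) return false;
        std::ifstream in(fileName(index), std::ios::binary);
        if (!in) return false;
        in.read(reinterpret_cast<char*>(state_.data()),
                static_cast<std::streamsize>(state_.size() * sizeof(float)));
        return true;
    }

    // Host-side buffer holding the most recently loaded state; in getBatch()
    // this would be copied to a device buffer bound as the recurrent input.
    const std::vector<float>& state() const { return state_; }

private:
    std::string fileName(std::size_t index) const {
        return prefix_ + std::to_string(index) + ".bin";
    }

    std::string prefix_;
    std::size_t numBatches_;
    std::vector<float> state_;
};
```

The per-file layout keeps the full-precision pass and the calibration pass decoupled, which is exactly what makes the workflow cumbersome: any change to the model or calibration set means regenerating all the state files.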


TensorRT Version:
GPU Type: RTX 3090
Nvidia Driver Version: 515.65.01
CUDA Version: 11.6
CUDNN Version: 8.4
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

No model is needed as this question applies to any model.

Steps To Reproduce

No repro steps as this question is theoretical and applies to any model.

Hi, please refer to the links below on performing inference in INT8.


Hi, thanks for the quick reply. I've gone through the documentation and the C++ API docs, but I haven't found a solution to this question.


I think there is no better way than what you're already doing. Otherwise, we would have to modify the calibrator code to save the output of the recurrent nodes or output nodes, and then make it accessible in the getBatch() call.

Thank you.