How to obtain output for a given batch fed to IInt8Calibrator


I am running post-training quantization for a pre-trained recurrent network model. I followed the sampleINT8 example and implemented a calibrator based on the IInt8Calibrator class (IInt8EntropyCalibrator2, to be more specific). Since my network is recurrent, I'd like to obtain the recurrent output for a given batch and feed it to the next batch in getBatch(). It appears that there's no way to do this with the current C++ API. The work-around is to run the network as-is on the calibration dataset and store the recurrent output for each data instance; during calibration, I can then load the stored state for each batch. However, this workflow is cumbersome to maintain. Is there a better way to do this?
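For reference, the work-around's data side can be sketched roughly as follows. This is a minimal, hypothetical helper (the class name and file layout are my own, not part of TensorRT): recurrent outputs produced by a full-precision pass over the calibration set are written to disk, one file per batch, and then replayed in batch order during calibration. In a real calibrator, the buffer returned here would still have to be copied to device memory (e.g. with cudaMemcpy) inside getBatch() before the pointer is handed to TensorRT.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical cache for the work-around described above: recurrent outputs
// from a full-precision pass are stored one file per batch, then loaded back
// in order while calibrating.
class RecurrentStateCache {
public:
    RecurrentStateCache(std::string prefix, std::size_t numBatches,
                        std::size_t stateSize)
        : prefix_(std::move(prefix)), numBatches_(numBatches),
          state_(stateSize) {}

    // Save the recurrent output of batch `index` (full-precision pass).
    void save(std::size_t index, const std::vector<float>& state) const {
        std::ofstream out(fileName(index), std::ios::binary);
        out.write(reinterpret_cast<const char*>(state.data()),
                  static_cast<std::streamsize>(state.size() * sizeof(float)));
    }

    // Load the recurrent state feeding batch `index` during calibration.
    // Returns false once the calibration set is exhausted, mirroring the
    // contract of IInt8Calibrator::getBatch().
    bool next(std::size_t index) {
        if (index >= numBatches_) return false;
        std::ifstream in(fileName(index), std::ios::binary);
        if (!in) return false;
        in.read(reinterpret_cast<char*>(state_.data()),
                static_cast<std::streamsize>(state_.size() * sizeof(float)));
        return true;
    }

    // Host-side buffer holding the most recently loaded state; in getBatch()
    // this would be copied to a device buffer bound as the recurrent input.
    const std::vector<float>& state() const { return state_; }

private:
    std::string fileName(std::size_t index) const {
        return prefix_ + std::to_string(index) + ".bin";
    }

    std::string prefix_;
    std::size_t numBatches_;
    std::vector<float> state_;
};
```

The per-file layout keeps the full-precision pass and the calibration pass decoupled, which is exactly what makes the workflow cumbersome: any change to the model or calibration set means regenerating all the state files.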


TensorRT Version:
GPU Type: RTX 3090
Nvidia Driver Version: 515.65.01
CUDA Version: 11.6
CUDNN Version: 8.4
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

No model is needed as this question applies to any model.

Steps To Reproduce

No repro steps as this question is theoretical and applies to any model.

Hi, please refer to the links below on performing inference in INT8.


Hi, thanks for the quick reply. I've gone through the documentation and the C++ API docs, but I haven't found a solution to this question.


I think there is no better way than what you're already doing. Otherwise, we would have to modify the calibrator code to save the output of the recurrent nodes or output nodes, and then make it accessible in the getBatch() call.

Thank you.