I am writing INT8 quantization (calibration) code for a PyTorch model, CascadePSP, which takes two or three inputs. How should I write the stream part of the code for multiple inputs?
I have already successfully calibrated the model with a single input: the result is only 1/4 the size of the original ONNX model, with almost no loss of accuracy.
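To make the question concrete, here is a minimal sketch of what a multi-input stream might look like, assuming the calibration data fits in memory as NumPy arrays. The class name `MultiInputBatchStream` and the input names `"image"` and `"mask"` are hypothetical and not taken from CascadePSP. The sketch covers only the host-side batching. With the TensorRT Python API, the calibrator's `get_batch(self, names)` would then copy each array to its own device buffer and return the list of device pointers in the same order as `names`.

```python
import numpy as np


class MultiInputBatchStream:
    """Hypothetical helper: serves one batch per network input on every step."""

    def __init__(self, samples, batch_size):
        # samples: dict mapping input name -> array of shape (N, ...).
        counts = {arr.shape[0] for arr in samples.values()}
        if len(counts) != 1:
            raise ValueError("all inputs need the same number of samples")
        self.samples = samples
        self.batch_size = batch_size
        self.num_samples = counts.pop()
        self.index = 0

    def next_batch(self, names):
        # Return one array per input, ordered like `names`;
        # return None once there is no full batch left.
        end = self.index + self.batch_size
        if end > self.num_samples:
            return None
        batch = [
            np.ascontiguousarray(self.samples[n][self.index:end], dtype=np.float32)
            for n in names
        ]
        self.index = end
        return batch


# Example with dummy data; the names and shapes are assumptions.
stream = MultiInputBatchStream(
    {"image": np.zeros((4, 3, 8, 8)), "mask": np.zeros((4, 1, 8, 8))},
    batch_size=2,
)
first = stream.next_batch(["image", "mask"])
```

Keeping the inputs in a dict keyed by name means the order of the returned list follows whatever order `names` arrives in, rather than relying on a fixed position for each input.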
TensorRT Version: 188.8.131.52
GPU Type: RTX 3080
Nvidia Driver Version: 470.14
CUDA Version: 11.1
CUDNN Version: 8.0.5
Operating System + Version: Windows 10 21343
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/pytorch:20.10-py3