I am implementing INT8 quantization for a PyTorch model, CascadePSP, which has two or three inputs. How should I write the data stream part of the calibration code?
I have successfully calibrated a model with a single input. The result is only 1/4 the size of the original ONNX model, with almost no loss of accuracy.
Environment
TensorRT Version: 7.2.2.3
GPU Type: RTX 3080
Nvidia Driver Version: 470.14
CUDA Version: 11.1
CUDNN Version: 8.0.5
Operating System + Version: Windows 10 21343
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/pytorch20.10-py3
Thank you for your reply, but I think you may have missed my point.
I want to quantize the ONNX model, so I need to implement the calibration dataset interface of the calibrator class. The usual data stream handles a single input. I want to know how to process two inputs in the dataset/DataLoader part of the calibrator class and copy them to CUDA device memory so that they can be used for INT8 calibration.
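To make the question concrete, here is a rough sketch of what I think the multi-input part could look like. As far as I understand, the calibrator's get_batch(names) has to return one device pointer per network input, in the same order as names, and None once the data runs out. The helper class MultiInputBatchFeeder and the input names "image" and "seg" are hypothetical (my real input names may differ), and the allocation and copy functions are passed in so the batching logic does not depend on a GPU; with PyCUDA they would be pycuda.driver.mem_alloc and pycuda.driver.memcpy_htod. This is only a sketch under those assumptions, not a confirmed solution.

```python
class MultiInputBatchFeeder:
    """Hypothetical helper: serves one calibration batch per call for several named inputs."""

    def __init__(self, batches, alloc, copy_htod):
        # batches: iterable of dicts {input_name: host_buffer}. Each dict must
        # contain every network input, and each input must keep a fixed byte
        # size across batches. Buffers must be contiguous (for NumPy arrays,
        # np.ascontiguousarray can be used beforehand).
        self._batches = iter(batches)
        self._alloc = alloc          # e.g. pycuda.driver.mem_alloc
        self._copy_htod = copy_htod  # e.g. pycuda.driver.memcpy_htod
        self._device = {}            # input name -> device allocation, reused per batch

    def get_batch(self, names):
        batch = next(self._batches, None)
        if batch is None:
            return None  # signals that the calibration data is exhausted
        pointers = []
        for name in names:  # follow the order in which the inputs are requested
            host = batch[name]
            if name not in self._device:
                # Allocate the device buffer once, sized from the first batch.
                self._device[name] = self._alloc(memoryview(host).nbytes)
            self._copy_htod(self._device[name], host)
            pointers.append(int(self._device[name]))
        return pointers


# With TensorRT and PyCUDA installed, the feeder could back a calibrator
# roughly like this (not executed here):
#
#   import tensorrt as trt
#   import pycuda.autoinit
#   import pycuda.driver as cuda
#
#   class Calibrator(trt.IInt8EntropyCalibrator2):
#       def __init__(self, batches, batch_size):
#           trt.IInt8EntropyCalibrator2.__init__(self)
#           self.feeder = MultiInputBatchFeeder(batches, cuda.mem_alloc, cuda.memcpy_htod)
#           self.batch_size = batch_size
#       def get_batch_size(self):
#           return self.batch_size
#       def get_batch(self, names):
#           return self.feeder.get_batch(names)
#       def read_calibration_cache(self):
#           return None
#       def write_calibration_cache(self, cache):
#           pass
```

Each element of batches would be something like {"image": image_array, "seg": seg_array}, produced by the DataLoader and converted to contiguous host arrays. A third input would just be one more key in the same dict.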