TensorRT Python API, ONNX model calibration: how to write stream code with multiple inputs


I am writing INT8 quantization code for a PyTorch model, CascadePSP, which has two or three inputs. How should I write the stream part of the calibration code?
I have successfully calibrated the model with a single input, and the resulting engine is only 1/4 the size of the original ONNX model, with almost no loss of accuracy.


TensorRT Version:
GPU Type: RTX 3080
Nvidia Driver Version: 470.14
CUDA Version: 11.1
CUDNN Version: 8.0.5
Operating System + Version: Windows 10 21343
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/pytorch:20.10-py3

The below link might be useful for you.
For multi-threading/streaming, we suggest you use DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.


Thank you for your reply, but I think you may have missed my point.
I want to quantize the ONNX model, so I need to implement the calibration-dataset interface of the calibrator class. The usual data stream handles a single input; I want to know how to process two inputs in the Dataloader part of the calibrator class and copy them to the CUDA stream, so that they can be used for INT8 calibration.

Hi @851482801,

There is a sample that calibrates on 2 inputs. Hope this helps:
TensorRT/sampleFasterRCNN.cpp at master · NVIDIA/TensorRT · GitHub.
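In the Python API the same idea applies: keep one host batch and one device buffer per input, and have the calibrator's `get_batch()` return a list of device pointers, one per input name. A minimal sketch of the batch-stream side (the input names, shapes, and the `TwoInputBatchStream` helper are illustrative assumptions, not CascadePSP's actual interface):

```python
import numpy as np

class TwoInputBatchStream:
    """Batch stream for INT8 calibration of a model with two inputs.
    Shapes below are placeholders, not CascadePSP's real tensor shapes."""

    def __init__(self, images, masks, batch_size):
        assert len(images) == len(masks), "both inputs need the same sample count"
        self.images = images          # e.g. (N, 3, H, W) float32
        self.masks = masks            # e.g. (N, 1, H, W) float32
        self.batch_size = batch_size
        self.index = 0

    def next_batch(self):
        # Return None once the data is exhausted, mirroring get_batch()'s contract.
        if self.index + self.batch_size > len(self.images):
            return None
        i, b = self.index, self.batch_size
        batch = (
            np.ascontiguousarray(self.images[i:i + b], dtype=np.float32),
            np.ascontiguousarray(self.masks[i:i + b], dtype=np.float32),
        )
        self.index += b
        return batch

# Inside a tensorrt.IInt8EntropyCalibrator2 subclass, get_batch() would then
# copy each array to its own device buffer and return one pointer per input,
# in the same order as the `names` argument (pycuda sketch, not runnable here):
#
#   def get_batch(self, names):
#       batch = self.stream.next_batch()
#       if batch is None:
#           return None            # signals that calibration data is exhausted
#       cuda.memcpy_htod(self.d_image, batch[0])
#       cuda.memcpy_htod(self.d_mask, batch[1])
#       return [int(self.d_image), int(self.d_mask)]
```

The key point is that `get_batch()` is not limited to one binding: it returns a list with one device pointer per network input, so a two-input model simply needs two allocations and two host-to-device copies per batch.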

Thank you.