TensorRT Python API, ONNX model calibration: how to write the stream code for multiple inputs

Description

I am writing an INT8 quantization script for a PyTorch model, CascadePSP, which has two or three inputs, and I don't know how to write the stream part of the calibration code.
I have successfully calibrated a single-input model: the resulting engine is only 1/4 the size of the original ONNX model, with almost no loss of accuracy.

Environment

TensorRT Version: 7.2.2.3
GPU Type: RTX 3080
Nvidia Driver Version: 470.14
CUDA Version: 11.1
CUDNN Version: 8.0.5
Operating System + Version: Windows 10 21343
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/pytorch:20.10-py3

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.

Thanks!

Thank you for your reply, but I think you may have missed my point.
I want to quantize the ONNX model, so I need to implement the calibration-dataset interface of the calibrator class. The usual examples feed a single input. I want to know how to handle two inputs in the data-loader part of the calibrator and copy them to the CUDA stream, so that they can be used for INT8 calibration.

Hi @851482801,

There is a sample that calibrates on two inputs. Hope this helps:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleFasterRCNN/sampleFasterRCNN.cpp#L272.
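The linked C++ sample copies each input to its own device buffer and returns both pointers from the calibrator. The same pattern in the Python API might look roughly like the sketch below. This is a minimal, unverified sketch, not CascadePSP-specific code: the helper names `stack_pairs` and `make_calibrator` are made up for illustration, the input shapes are dummies, and it assumes the order of the `names` list passed to `get_batch` matches the order of the `(image, mask)` tuple (in real code you should map binding names to arrays explicitly).

```python
import numpy as np


def stack_pairs(images, masks, batch_size):
    """Yield contiguous float32 (image_batch, mask_batch) pairs,
    dropping any leftover samples that don't fill a batch."""
    for i in range(0, len(images) - batch_size + 1, batch_size):
        yield (np.ascontiguousarray(images[i:i + batch_size], dtype=np.float32),
               np.ascontiguousarray(masks[i:i + batch_size], dtype=np.float32))


def make_calibrator(batches, batch_size, cache_file="calib.cache"):
    """batches: iterable of (image, mask) numpy pairs, one tuple per batch.
    tensorrt/pycuda are imported lazily so the numpy helpers above
    stay usable without a GPU."""
    import tensorrt as trt
    import pycuda.driver as cuda
    import pycuda.autoinit  # noqa: F401 -- creates the CUDA context

    class TwoInputCalibrator(trt.IInt8EntropyCalibrator2):
        def __init__(self):
            super().__init__()
            self.iterator = iter(batches)
            self.buffers = {}  # binding name -> device allocation

        def get_batch_size(self):
            return batch_size

        def get_batch(self, names):
            # `names` lists the network's input bindings; assumed here
            # to arrive in the same order as the (image, mask) tuple.
            try:
                arrays = next(self.iterator)
            except StopIteration:
                return None  # tells TensorRT the calibration data is exhausted
            ptrs = []
            for name, arr in zip(names, arrays):
                if name not in self.buffers:
                    self.buffers[name] = cuda.mem_alloc(arr.nbytes)
                cuda.memcpy_htod(self.buffers[name], arr)
                ptrs.append(int(self.buffers[name]))
            return ptrs  # one device pointer per input binding

        def read_calibration_cache(self):
            try:
                with open(cache_file, "rb") as f:
                    return f.read()
            except FileNotFoundError:
                return None

        def write_calibration_cache(self, cache):
            with open(cache_file, "wb") as f:
                f.write(bytes(cache))

    return TwoInputCalibrator()
```

If this works for your model, you would attach it when building the engine via `builder_config.int8_calibrator = make_calibrator(...)` with the INT8 builder flag set; the only real difference from the single-input case is that `get_batch` copies and returns one pointer per binding instead of one.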

Thank you.