Can an engine have multiple inputs and outputs? (ICudaEngine::getNbIOTensors())

Sorry if this is a silly question. I’m scratching the surface of AI and have started exploring the TensorRT C++ API to get my hands dirty. When I deserialize an engine using nvinfer1::IRuntime::deserializeCudaEngine(), I can ask the deserialized engine for the number of IO tensors using getNbIOTensors(). I understand that the batch size can be larger than one, but that’s not what getNbIOTensors() returns. My current understanding is that you can feed multiple batches during inference.

What does it mean when the number of input tensors is larger than one?
What does it mean when the number of output tensors is larger than one?

Hi,

The links below might be useful for you.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html

For multi-threading/streaming, we suggest you use DeepStream or Triton.

For more details, we recommend raising the query on the DeepStream forum

or

in the issues section of the Triton Inference Server GitHub repository.

Thanks!

Hi @AakankshaS, was this answer meant for me? The links are good for getting a general understanding, but how do they answer my question? My question wasn’t that general, I hope :)

The number of input tensors in a TensorRT engine is the number of distinct tensors you must supply before inference can run. If the number of input tensors is larger than one, the network itself takes several separate inputs (for example, an image tensor plus a sequence-length tensor); this is not the same thing as batching. When performing inference, you need to provide data for each of these input tensors separately.

The number of output tensors is the number of tensors the engine produces once inference is performed. If it is larger than one, the engine generates several sets of outputs per execution, each corresponding to a different aspect of the model’s predictions or computations (for example, an object detector that emits both bounding boxes and class scores).
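You can see both counts directly from the engine. Here is a minimal sketch of enumerating the IO tensors, assuming the named-tensor API introduced in TensorRT 8.5 (getIOTensorName() and getTensorIOMode()); it is an illustration, not a complete inference program:

```cpp
#include <NvInfer.h>
#include <iostream>

// Print every IO tensor of a deserialized engine, labeled input/output.
// Note: getNbIOTensors() counts inputs AND outputs together.
void listIOTensors(const nvinfer1::ICudaEngine& engine)
{
    for (int i = 0; i < engine.getNbIOTensors(); ++i)
    {
        const char* name = engine.getIOTensorName(i);
        bool isInput =
            engine.getTensorIOMode(name) == nvinfer1::TensorIOMode::kINPUT;
        std::cout << (isInput ? "input:  " : "output: ") << name << "\n";
    }
}
```

A classifier typically prints one input and one output here; a two-input, two-output network would print four names, and you would bind a buffer for each one before calling enqueueV3().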

The batch size is a separate concept from the number of input or output tensors. It is typically the leading dimension of each tensor and counts how many samples the engine processes in a single inference call. For example, with a batch size of 2, each input tensor carries data for two samples and each output tensor returns two sets of results, while the number of IO tensors stays the same.