I am working on an inference application using NVIDIA TensorRT with a model that has multiple inputs and outputs (e.g., 3 inputs and 5 outputs). During implementation, I observed that the input tensors appear first in the bindings array, followed by the output tensors. However, I would like to confirm whether TensorRT actually guarantees this ordering.
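For context, this is roughly the pattern I use to inspect the binding order. It is a minimal, self-contained sketch: the `bindings` list of `(name, is_input)` pairs is a stand-in for a real `ICudaEngine`, where one would instead loop over `engine.num_bindings` and call `engine.get_binding_name(i)` and `engine.binding_is_input(i)`; the tensor names are placeholders.

```python
# Stand-in for what a real engine reports, reflecting what I observe:
# the 3 inputs come first, then the 5 outputs.
bindings = [
    ("input_0", True), ("input_1", True), ("input_2", True),
    ("output_0", False), ("output_1", False), ("output_2", False),
    ("output_3", False), ("output_4", False),
]

for index, (name, is_input) in enumerate(bindings):
    kind = "input" if is_input else "output"
    print(f"binding {index}: {name} ({kind})")
```

In my runs, this always prints the inputs before the outputs, which is the behavior I want to confirm is guaranteed rather than incidental.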
- Is it guaranteed by TensorRT that the engine's bindings are always ordered with inputs first, followed by outputs?
- If this behavior is documented, could you please provide a reference that confirms it?