In the TensorRT documentation, I found the description of how to set the number of aux streams.
Additionally, according to the description of setMaxAuxStreams, TensorRT automatically sets the optimal number of aux streams if I don't set it myself.
However, I built the engine without setting the number of aux streams and then tried calling getNbAuxStreams(), the result of getNbAuxStreams().
Can I get the default number of aux streams?
//!
//! \brief Set the maximum number of auxiliary streams that TRT is allowed to use.
//!
//! If the network contains operators that can run in parallel, TRT can execute them using auxiliary streams
//! in addition to the one provided to the IExecutionContext::enqueueV3() call.
//!
//! The default maximum number of auxiliary streams is determined by the heuristics in TensorRT on whether enabling
//! multi-stream would improve the performance. This behavior can be overridden by calling this API to set the
//! maximum number of auxiliary streams explicitly. Set this to 0 to enforce single-stream inference.
//!
//! The resulting engine may use fewer auxiliary streams than the maximum if the network does not contain enough
//! parallelism or if TensorRT determines that using more auxiliary streams does not help improve the performance.
//!
//! \note Allowing more auxiliary streams does not always give better performance since there will be
//! synchronization overhead between streams. Using CUDA graphs at runtime can help reduce the overhead caused by
//! cross-stream synchronizations.
//!
//! \note Using more auxiliary streams leads to more memory usage at runtime since some activation memory blocks will not
//! be able to be reused.
//!
//! \param nbStreams The maximum number of auxiliary streams that TRT is allowed to use.
//!
//! \see getMaxAuxStreams(), ICudaEngine::getNbAuxStreams(), IExecutionContext::setAuxStreams()
//!
void setMaxAuxStreams(int32_t nbStreams) noexcept
{
    mImpl->setMaxAuxStreams(nbStreams);
}
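
For reference, here is a minimal sketch of how I am reading the value back. The comment above names ICudaEngine::getNbAuxStreams() as the related query, so the helper simply calls getNbAuxStreams() on whatever engine object it is given. The helper name describeAuxStreams and the StubEngine type are hypothetical and are not part of TensorRT; the stub exists only so that the sketch compiles without the SDK. I am also assuming that a result of 0 means the engine runs on the enqueue stream alone, based on the "Set this to 0 to enforce single-stream inference" sentence.

```cpp
#include <cstdint>
#include <string>

// Describes how many auxiliary streams an engine reports.
// Works with any type exposing getNbAuxStreams(); with TensorRT this would be
// nvinfer1::ICudaEngine (the class named in the \see line of the quoted comment).
template <typename Engine>
std::string describeAuxStreams(Engine const& engine)
{
    std::int32_t const n = engine.getNbAuxStreams();
    if (n == 0)
    {
        // Assumption: 0 means only the stream passed to enqueueV3() is used.
        return "single-stream: no auxiliary streams in use";
    }
    return "engine uses " + std::to_string(n) + " auxiliary stream(s)";
}

// Hypothetical stand-in for a built engine, NOT a TensorRT type.
struct StubEngine
{
    std::int32_t nbAux;
    std::int32_t getNbAuxStreams() const noexcept { return nbAux; }
};
```

With a real engine, the call would presumably be describeAuxStreams(*engine) after deserialization or after the build finishes. Note that this reports the number the built engine actually uses, which, per the comment above, may be lower than the maximum the builder was allowed.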