Hello, the page https://developer.nvidia.com/tensorrt lists a few of TensorRT's optimizations and performance features. There are two I do not understand:
Kernel Auto-Tuning: "Selects best data layers and algorithms based on target GPU platform". What does this mean exactly?
Multi-Stream Execution: "Scalable design to process multiple input streams in parallel". Is this "stream" a cudaStream?
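To make my question concrete, here is how I imagine Multi-Stream Execution would look on the application side. This is only my guess (a sketch using the CUDA runtime's cudaStream_t and TensorRT's IExecutionContext::enqueueV2; engine building, buffer allocation, and error handling are omitted, and it needs a GPU plus a built engine to actually run):

```cpp
#include <cuda_runtime.h>
#include <NvInfer.h>

// Hypothetical sketch: one execution context and one cudaStream_t per
// input stream, so the two inferences can overlap on the GPU.
void inferTwoStreams(nvinfer1::ICudaEngine* engine,
                     void** bindingsA, void** bindingsB) {
    nvinfer1::IExecutionContext* ctxA = engine->createExecutionContext();
    nvinfer1::IExecutionContext* ctxB = engine->createExecutionContext();

    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);

    // Enqueue both inferences asynchronously on separate CUDA streams;
    // the GPU is free to execute them in parallel.
    ctxA->enqueueV2(bindingsA, streamA, nullptr);
    ctxB->enqueueV2(bindingsB, streamB, nullptr);

    // Wait for both inferences to finish.
    cudaStreamSynchronize(streamA);
    cudaStreamSynchronize(streamB);

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    ctxA->destroy();
    ctxB->destroy();
}
```

Is this the kind of multi-stream usage the page refers to, or does it mean something else?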
Could you give a detailed explanation, or a URL with detailed materials? I cannot find this information in https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide