Description
TensorRT inference shows very low GPU utilization with batch size 100; adding multiple streams and execution contexts raises utilization but does not reduce the running time.
Environment
**TensorRT Version**: 8.6.1
**GPU Type**: RTX 3070 Ti
**Nvidia Driver Version**: 3.28.0.417
**CUDA Version**: 11.1
**CUDNN Version**: 11.3
**Operating System + Version**: Windows 10
**C++ Version**: C++14
**Baremetal or Container (if container which image + tag)**:
Hi,
I am using TensorRT for inference. The network is ConvNeXt, FP32 precision, batch size 100, input shape [100, 3, 128, 128]. With a single stream, GPU utilization is around 5%. To speed this up, I tried using multiple streams and execution contexts, all created from the same engine. With 6 contexts, GPU utilization rises to around 80%, but the total running time stays the same.
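For reference, here is a minimal sketch of my multi-context setup (the tensor names "input"/"output", the buffer handling, and the helper name `runMultiContext` are placeholders, not my exact code):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <vector>

constexpr int kNumContexts = 6;

// Launch one enqueue per context on its own CUDA stream, all sharing one engine.
void runMultiContext(nvinfer1::ICudaEngine* engine,
                     const std::vector<void*>& inputBufs,   // one device input buffer per context
                     const std::vector<void*>& outputBufs)  // one device output buffer per context
{
    std::vector<nvinfer1::IExecutionContext*> contexts(kNumContexts);
    std::vector<cudaStream_t> streams(kNumContexts);

    for (int i = 0; i < kNumContexts; ++i)
    {
        contexts[i] = engine->createExecutionContext();  // all contexts share the one engine
        cudaStreamCreate(&streams[i]);
        contexts[i]->setTensorAddress("input", inputBufs[i]);   // placeholder tensor names
        contexts[i]->setTensorAddress("output", outputBufs[i]);
    }

    // Queue all launches back to back so the streams can overlap on the GPU.
    for (int i = 0; i < kNumContexts; ++i)
    {
        contexts[i]->enqueueV3(streams[i]);
    }

    // Synchronize only after all work has been queued.
    for (int i = 0; i < kNumContexts; ++i)
    {
        cudaStreamSynchronize(streams[i]);
    }

    for (int i = 0; i < kNumContexts; ++i)
    {
        cudaStreamDestroy(streams[i]);
        delete contexts[i];
    }
}
```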
The problem
- Why is the GPU utilization so low (~5%) for batch size 100?
- Why don't multiple streams and multiple contexts improve the total running time?
Thanks.