I have a question about building an engine with the TensorRT SDK. Is it better to set a larger batch size when my GPU has enough memory? In my situation, I receive real-time video data over the internet and then run inference with TensorRT, and I get different processing speeds with different batch sizes (set when building the engine file). So what is the principle for choosing the batch size?
I have a 2080 Ti GPU. My TensorRT-based application uses about 3-4 GB of GPU memory, but GPU utilization is around 80%. I wonder whether I could improve throughput by setting a larger batch size, which would also make fuller use of my GPU memory.
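To make the comparison concrete, here is a minimal sketch of the kind of batch-size sweep I mean. A dummy NumPy matrix multiply stands in for the actual TensorRT engine call, and all names and sizes are illustrative:

```python
import time
import numpy as np

WEIGHTS = np.random.rand(512, 512).astype(np.float32)

def infer(batch):
    # Dummy stand-in for engine execution; in a real setup this
    # would be a TensorRT execution-context call instead.
    return batch @ WEIGHTS

def throughput(batch_size, n_iters=20):
    """Return samples processed per second at the given batch size."""
    batch = np.random.rand(batch_size, 512).astype(np.float32)
    infer(batch)  # warm-up to exclude one-time setup cost
    start = time.perf_counter()
    for _ in range(n_iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    return batch_size * n_iters / elapsed

for bs in (1, 4, 16, 64):
    print(f"batch_size={bs}: {throughput(bs):.0f} samples/s")
```

Throughput typically rises with batch size until the GPU saturates, at the cost of higher per-frame latency, so a sweep like this shows where the curve flattens out.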
Note: for the purposes of this question, please ignore the latency of the data arriving from the internet.
Thanks for your replies!