Description
I want to run an image enhancement network on a very high-resolution image (tens of megapixels). The image has to be split into overlapping patches to fit into GPU memory. Which approach gives higher throughput (frame rate)?
1 - Use a small patch size (128x128 or 256x256) with a large batch size (32 or 64).
2 - Use a large patch size (1024x1024 or 2048x2048) with a small batch size (4 or 8).
I am using the TensorRT C++ library.
Environment
TensorRT Version: TensorRT-7.2.3.4
GPU Type: NVIDIA GeForce GTX 1660 Ti with Max-Q Design
Nvidia Driver Version: 27.21.14.6079
CUDA Version: 11
Operating System + Version: Windows 10