I want to run an image enhancement network on a very high-resolution image (tens of megapixels). I will definitely have to split the image into overlapping patches to fit it into GPU memory. Which approach allows higher throughput (frame rate)?
1 - Use a small patch size (128x128 or 256x256) and a large batch size (32 or 64)?
2 - Use a large patch size (1024x1024 or 2048x2048) and a small batch size (4 or 8)?
I am using the TensorRT C++ library.
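For context, the tiling I have in mind looks roughly like this (plain C++; the tile size and overlap values below are placeholders, not final choices):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One tile's position and size inside the full image.
struct Tile { int x, y, w, h; };

// Cover an imageW x imageH image with tiles of side tileSize that overlap by
// `overlap` pixels; tiles at the right/bottom edge are shifted inward so the
// whole image is covered without going out of bounds.
std::vector<Tile> makeTiles(int imageW, int imageH, int tileSize, int overlap)
{
    assert(overlap < tileSize);                      // otherwise the stride is not positive
    std::vector<Tile> tiles;
    const int stride = tileSize - overlap;           // step between tile origins
    for (int y = 0; y < imageH; y += stride) {
        for (int x = 0; x < imageW; x += stride) {
            Tile t;
            t.x = std::max(0, std::min(x, imageW - tileSize));  // clamp to right edge
            t.y = std::max(0, std::min(y, imageH - tileSize));  // clamp to bottom edge
            t.w = std::min(tileSize, imageW);
            t.h = std::min(tileSize, imageH);
            tiles.push_back(t);
            if (x + tileSize >= imageW) break;       // last column reached
        }
        if (y + tileSize >= imageH) break;           // last row reached
    }
    return tiles;
}
```

Each batch of tiles would then be copied into the network's input buffer, and the outputs stitched back into the full-resolution result.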
Environment
TensorRT Version: TensorRT-7.2.3.4
GPU Type: NVIDIA GeForce GTX 1660 Ti with Max-Q Design
Nvidia Driver Version: 27.21.14.6079
CUDA Version: 11
CUDNN Version:
Operating System + Version: Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Thanks for your fast reply! The model is a standard U-Net like the one in this link.
Fundamentally, which is better, choice 1 or choice 2?
1 - Use a small patch size (128x128 or 256x256) and a large batch size (32 or 64)?
2 - Use a large patch size (1024x1024 or 2048x2048) and a small batch size (4 or 8)?
Larger patches with smaller batches may help you, because the larger spatial dimensions give the kernels more scope to use larger (higher-efficiency) tiles.
We would strongly suggest benchmarking both configurations and selecting whichever works best; without knowledge of the network and the device, the question cannot be answered definitively.
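As a rough illustration, a timing loop along the following lines can be used to compare the two configurations. This is only a sketch: it assumes an explicit-batch engine whose optimization profile covers all of the tested shapes, and the engine file name (unet.engine), the binding indices (0 = input, 1 = output), and the 3-channel input layout are placeholders.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

// Times `iters` inferences of shape batch x 3 x size x size and returns the
// throughput in megapixels per second, so both configurations can be compared
// on equal terms. Binding 0 as input and binding 1 as output are assumptions.
float benchmark(nvinfer1::ICudaEngine* engine, int batch, int size, int iters)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    context->setBindingDimensions(0, nvinfer1::Dims4{batch, 3, size, size});

    size_t elems = static_cast<size_t>(batch) * 3 * size * size;
    void* buffers[2];
    cudaMalloc(&buffers[0], elems * sizeof(float));   // input
    cudaMalloc(&buffers[1], elems * sizeof(float));   // output (same shape for an enhancement net)

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up run so first-launch overhead is not measured.
    context->enqueueV2(buffers, stream, nullptr);
    cudaStreamSynchronize(stream);

    cudaEventRecord(start, stream);
    for (int i = 0; i < iters; ++i)
        context->enqueueV2(buffers, stream, nullptr);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    float megapixels = static_cast<float>(iters) * batch * size * size / 1e6f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(buffers[0]);
    cudaFree(buffers[1]);
    context->destroy();

    return megapixels / (ms / 1000.f);   // megapixels processed per second
}

int main()
{
    Logger logger;
    std::ifstream file("unet.engine", std::ios::binary);          // placeholder engine file
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);

    std::cout << "256x256,   batch 32: " << benchmark(engine, 32, 256, 50)  << " MP/s\n";
    std::cout << "1024x1024, batch 4:  " << benchmark(engine, 4, 1024, 50) << " MP/s\n";

    engine->destroy();
    runtime->destroy();
    return 0;
}
```

Reporting throughput in megapixels per second rather than batches per second makes the two patch/batch combinations directly comparable even though their batch sizes differ.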