Train on large image resolutions with multi-GPU

Hello,

I need to train a neural network on large images (>= 2000 x 2000). The problem is that with an A100 40 GB I can only train at 900 x 900 resolution. Even with 4x A100 40 GB, each GPU still has to hold at least one sample, so I am still limited to 900 x 900.
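
To make the setup concrete, here is a minimal sketch of the current data-parallel training loop (simplified; a small placeholder convnet stands in for the real model): data parallelism only splits the batch dimension, so every GPU still has to hold the activations of one full-resolution image.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU, launched with torchrun.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; the real network is much larger.
    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1),
    ).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Each process gets its own sample: batch size 1 per GPU.
    # 900 x 900 fits on an A100 40 GB, 2000 x 2000 does not.
    image = torch.randn(1, 3, 900, 900, device=local_rank)

    out = model(image)
    loss = out.mean()
    loss.backward()
    opt.step()

if __name__ == "__main__":
    main()
```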

How can I train on large image resolutions without cropping, resizing, or patching my dataset?

Could NVLink do this? Or Horovod? Or is the maximum resolution limited by the RAM of a single GPU, e.g. an A100 80 GB? Is there currently no way to make a batch of one sample fit in more than 80 GB?

Thanks for your response.

Training is often done at lower resolutions. Can you share a little more information about your project: the dataset you are using, what is driving the requirement for high resolution, and any other information you think may help us help you? If the reason is that your dataset is high-res, then a pre-processing transformation step may be an option.
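
For example, a downscaling step along these lines (a minimal sketch using torchvision; the 900 x 900 target size is only an illustration) would bring the images down to a resolution that fits on a single GPU, at the cost of detail:

```python
from torchvision import transforms

# Hypothetical pre-processing pipeline: downscale the raw
# >= 2000 x 2000 images before training so one sample fits in GPU memory.
preprocess = transforms.Compose([
    transforms.Resize((900, 900)),  # example target size, not a recommendation
    transforms.ToTensor(),
])
```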
Many Thanks,

We are using the SPADE framework from NVIDIA, and we need to generate high-resolution images. Pre-processing (downscaling) would reduce the quality of the generated images.

Patching the dataset is also a problem: the generator cannot produce a spatially consistent image from patches.

For that reason, we are looking for the best way to fit a batch of one sample at the maximum possible resolution. Are we limited to the RAM of a single GPU? Are we limited to 80 GB (A100 80 GB)?
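
One way to check where that single-GPU limit sits in practice is to probe the peak memory of a single sample at different resolutions, e.g. with a sketch like this (again with a small placeholder model instead of the real generator):

```python
import torch
import torch.nn as nn

# Run one forward/backward pass on a single sample at a given
# resolution and report the peak GPU memory it needed.
def peak_memory_gb(resolution: int) -> float:
    torch.cuda.reset_peak_memory_stats()
    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1),
    ).cuda()
    x = torch.randn(1, 3, resolution, resolution, device="cuda")
    model(x).mean().backward()
    return torch.cuda.max_memory_allocated() / 1024**3

for res in (900, 1500, 2000):
    print(res, f"{peak_memory_gb(res):.1f} GB")
```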

Thanks for your reply.

Best regards,