This is more of an R&D topic. I am looking for a way to calculate the maximum batch size for a GPU, given that we know the model size and the GPU memory size; the models of primary interest to me are CNNs.

From my research, the memory usage for a single sample is: the size of the input image * bytes per value (precision) + the size of the model + the size of the forward pass (the size of the output of each layer) + the size of the backward pass (the size of the gradients calculated for each weight in the model). Dividing the GPU memory by this sum should then give the maximum number of samples the GPU can support. We can calculate the size of the intermediate layer outputs given the model's layers, precision, and input size, and since gradients are calculated for each weight, I guess their size approximates the model size.
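To make that accounting concrete, here is a rough sketch in plain Python, assuming FP32 (4 bytes per value) everywhere; all concrete numbers are illustrative placeholders, not measurements:

```python
BYTES_PER_VALUE = 4  # FP32

def estimate_max_batch_size(gpu_mem_bytes, input_elems, param_count,
                            activation_elems_per_sample):
    """Approximate max batch size = GPU memory / per-sample memory."""
    per_sample_bytes = BYTES_PER_VALUE * (
        input_elems                    # the input image
        + param_count                  # the model weights
        + activation_elems_per_sample  # forward-pass outputs of every layer
        + param_count                  # gradients, roughly one value per weight
    )
    return gpu_mem_bytes // per_sample_bytes

# Illustrative figures: 24 GB card, 3x224x224 input, 25M-parameter CNN,
# ~30M activation elements per sample.
print(estimate_max_batch_size(24 * 1024**3,
                              3 * 224 * 224,
                              25_000_000,
                              30_000_000))
```

Note that in reality the weights and gradients are allocated once rather than per sample, and optimizer state, cuDNN workspaces, and the CUDA context add overhead this sketch ignores, so the real limit can differ in either direction.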
What I am looking for is scrutiny of my analysis, any approximations you use on your end (my calculations can get tedious, especially for deep models), or any tools that can help with this. Any help is appreciated, thanks.
More references can be found in How to determine the largest batch size of a given model saturating the GPU? - deployment - PyTorch Forums and A batch too large: Finding the batch size that fits on GPUs | by Bryan M. Li | Towards Data Science.
These articles are in the context of PyTorch; I am using TAO 3.22.05. Are there any calculations or approximations one can do in this context?
The articles imply that experiments are expected to determine the largest batch size; moreover, I think there is no settled conclusion for this R&D topic yet (a sketch of that trial-and-error approach follows the links below). You can also find some discussion in
machine learning - How to calculate optimal batch size? - Stack Overflow,
python - Tensorflow: on what does the batch_size depend? - Stack Overflow.
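For reference, a minimal sketch of the trial-and-error search in PyTorch: keep doubling the batch size and run one full training step until CUDA runs out of memory. The model, input shape, and class count here are placeholders, and TAO does not expose this loop directly, so treat it as illustrative only:

```python
import torch
import torchvision

def find_max_batch_size(model, input_shape, num_classes=1000,
                        start=2, limit=4096, device="cuda"):
    model = model.to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    batch_size, largest_ok = start, 0
    while batch_size <= limit:
        try:
            optimizer.zero_grad(set_to_none=True)
            x = torch.randn(batch_size, *input_shape, device=device)
            y = torch.randint(0, num_classes, (batch_size,), device=device)
            loss = criterion(model(x), y)
            loss.backward()   # the backward pass allocates the gradients
            optimizer.step()
            largest_ok = batch_size
            batch_size *= 2
        except RuntimeError as e:   # CUDA OOM surfaces as a RuntimeError
            if "out of memory" not in str(e):
                raise
            break
        finally:
            torch.cuda.empty_cache()
    return largest_ok

print(find_max_batch_size(torchvision.models.resnet18(), (3, 224, 224)))
```

A binary search between the last working size and the first failing size narrows the result further, but the doubling loop is usually enough to pick a safe training batch size.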