There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks
This is not specific to TAO. In general, the batch size has a large impact on the GPU memory required to train a neural network, because the activations stored for the backward pass grow with the batch size. Beyond that, the model's parameters (weights and biases), their gradients, the optimizer's state variables, and temporary buffers for intermediate calculations also consume GPU memory.
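As a rough illustration (a generic PyTorch sketch, not TAO-specific; the ResNet-50 model, float32 precision, Adam-style optimizer state, and the batch sizes used here are assumptions for the example), the static cost can be estimated from the parameter count, while the batch-size-dependent cost can be observed from peak memory during a forward/backward pass:

```python
# Minimal sketch of the main GPU-memory consumers during training.
# Model, dtypes, and batch sizes are hypothetical placeholders.
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
criterion = torch.nn.CrossEntropyLoss()

# Static cost estimate: parameters + gradients + two Adam state tensors
# per parameter, all assumed to be float32 (4 bytes each).
n_params = sum(p.numel() for p in model.parameters())
static_mb = n_params * 4 * (1 + 1 + 2) / 1024**2
print(f"params + grads + optimizer state: ~{static_mb:.0f} MB")

# Dynamic cost: activations kept for the backward pass grow roughly
# linearly with batch size, so peak memory rises with it.
for batch_size in (4, 16, 64):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    y = torch.randint(0, 1000, (batch_size,), device="cuda")
    loss = criterion(model(x), y)
    loss.backward()
    model.zero_grad(set_to_none=True)
    peak_mb = torch.cuda.max_memory_allocated() / 1024**2
    print(f"batch_size={batch_size:3d}  peak GPU memory: ~{peak_mb:.0f} MB")
```

If the peak exceeds your GPU's capacity, reducing the batch size (or the input resolution) is usually the first lever to pull.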