Memory allocation in TAO

I ran a series of experiments to measure how much GPU memory TAO models need at different batch sizes. I noticed that the models also consume a certain amount of memory that does not depend on the batch size.
Could you please explain how GPU memory is allocated in TAO?

This is not specific to TAO. In general, the batch size has a large impact on the GPU memory required to train a neural network. Besides that, the model's parameters (weights and biases), the optimizer's variables, the temporary memory for local variables, and intermediate calculations also consume GPU memory. The parameters and the optimizer's variables are the same size regardless of the batch size, which is why part of the memory usage you measured stays constant.
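
As a rough illustration of that split, here is a back-of-the-envelope sketch in Python. It is not how TAO itself accounts for memory. The function name, the parameter count, and the per-sample activation count are hypothetical, and it assumes 32-bit floats and an Adam-style optimizer that keeps two extra values per parameter. It also ignores framework overhead and temporary buffers, so real measurements will be higher.

```python
def estimate_training_memory_bytes(num_params, activations_per_sample,
                                   batch_size, bytes_per_value=4,
                                   optimizer_slots=2):
    """Return (fixed_bytes, batch_dependent_bytes) for one training step.

    Hypothetical helper for illustration only; all inputs are assumptions.
    """
    # Fixed part: weights + gradients + optimizer slots per parameter.
    # None of these grow with the batch size.
    fixed = num_params * bytes_per_value * (2 + optimizer_slots)
    # Batch-dependent part: intermediate values kept for each sample.
    variable = batch_size * activations_per_sample * bytes_per_value
    return fixed, variable


if __name__ == "__main__":
    # Made-up model: 25M parameters, 10M intermediate values per sample.
    for batch_size in (1, 8, 32):
        fixed, variable = estimate_training_memory_bytes(
            num_params=25_000_000,
            activations_per_sample=10_000_000,
            batch_size=batch_size,
        )
        print(f"batch={batch_size:>2}  fixed={fixed / 1e6:.0f} MB  "
              f"batch-dependent={variable / 1e6:.0f} MB")
```

In this sketch the fixed term is identical for every batch size, while the second term grows linearly, which matches the pattern of a constant baseline plus a per-batch cost.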

