Computer architecture for a heavy deep learning algorithm?

I’m trying to figure out the best hardware configuration for training a neural net with ~15 million parameters whose input is a 50×512×512 block of float32 values. I would like training to run in a reasonable time. I know that reading the data from disk slows down training, and I wonder how to deal with that as well.
I would appreciate advice on the following points:

  1. How do I determine the minimum GPU memory size and clock frequency needed for such a computation?
  2. What kind of SSD and CPU are preferable?
  3. Should I use one big GPU or several decent GPUs?

I’m using Keras with TensorFlow on Python 3.7.
The operating system is Windows.

Currently I run this on a machine with three 11 GiB GPUs (2080 Ti), 128 GiB of RAM, and an i9-9940X CPU at 3.3 GHz.
There we can only fit 3 input blocks per minibatch, one on each GPU, and each minibatch takes 2-3 seconds.
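
To put rough numbers on the sizes above, here is a back-of-the-envelope sketch. It is plain arithmetic, not a measurement: it assumes the weights are stored as float32 and that each block is read from disk uncompressed, and the variable names are only illustrative.

```python
# Back-of-the-envelope sizes for the setup described above.
# Assumptions: float32 weights, uncompressed float32 blocks on disk.
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2

block_shape = (50, 512, 512)     # one input block
n_params = 15_000_000            # approximate parameter count
blocks_per_minibatch = 3         # one block per GPU
step_seconds = (2.0, 3.0)        # observed time per minibatch

# Number of float32 values in one input block.
block_elems = 1
for dim in block_shape:
    block_elems *= dim

block_mib = block_elems * BYTES_PER_FLOAT32 / MIB
weights_mib = n_params * BYTES_PER_FLOAT32 / MIB
minibatch_mib = block_mib * blocks_per_minibatch

# Sustained read rate needed so the disk keeps up with the GPUs,
# from the slowest step time to the fastest.
read_rate = [minibatch_mib / s for s in sorted(step_seconds, reverse=True)]

print(f"one input block : {block_mib:.1f} MiB")
print(f"weights         : {weights_mib:.1f} MiB")
print(f"one minibatch   : {minibatch_mib:.1f} MiB")
print(f"read rate needed: {read_rate[0]:.0f}-{read_rate[1]:.0f} MiB/s")
```

So a single input block and the weights are each only tens of MiB, far below 11 GiB per GPU. That suggests, without proving it, that the intermediate activations are what limit us to one block per GPU.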